---
title: Katalyst Eviction Manager
authors:
  - "csfldf"
reviewers:
  - "waynepeking348"
  - "luomingmeng"
  - "caohe"
creation-date: 2022-04-24
last-updated: 2023-02-23
status: implemented
---

<!-- toc -->

## Table of Contents

- [Summary](#summary)
- [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals](#non-goals)
- [Proposal](#proposal)
    - [User Story](#user-story)
    - [Design Overview](#design-overview)
    - [API [Optional]](#api-optional)
    - [Design Details](#design-details)
- [Alternatives](#alternatives)

<!-- toc -->

## Summary

Eviction is commonly used as a fallback mechanism when QoS requirements cannot otherwise be satisfied. The eviction manager works as a general framework and is the single entry point for eviction logic. Different vendors can implement their own eviction strategies for their specific scenarios; the eviction manager gathers this information from all plugins, analyzes it with sorting and filtering algorithms, and then triggers eviction requests.

## Motivation

### Goals

- Make it easier for vendors or administrators to implement customized eviction strategies.
- Implement a common framework to aggregate eviction information from multiple eviction plugins.

### Non-Goals

- Replace the original eviction manager implementation in kubelet.
- Implement a fully functional eviction strategy that covers all scenarios.

## Proposal

### User Story

In a production environment running pods with multiple QoS levels, there may be several resource or device vendors, each focusing on its own scenarios. For instance, disk vendors mainly watch for contention at the disk level, such as disk IOPS exceeding a threshold, while NIC vendors usually care about contention at the network interface or in the protocol stack.

Compared with the static threshold strategy of the kubelet eviction manager, the katalyst eviction manager provides more flexible eviction interfaces. Vendors or administrators only need to implement customized eviction strategies in plugins, covering pressure detection and the selection of eviction candidates. They do not need to send eviction requests or mark pressure conditions in Node or CNR themselves; the katalyst eviction manager does this as a coordinator. Without a coordinator, each plugin would choose pods from its own perspective and might evict too many pods. The katalyst eviction manager instead analyzes candidates from all plugins with sorting and filtering algorithms and sends eviction requests under the control of a throttle algorithm, which reduces disturbance.

### Design Overview
<div align="center">
  <picture>
    <img src="/docs/imgs/eviction-manager-overview.png" width=80% title="Katalyst Overview" loading="eager" />
  </picture>
</div>

The architecture mainly contains two modules: the eviction manager and eviction plugins.
- Eviction Manager is a coordinator that communicates with multiple Eviction Plugins. It receives pressure conditions and eviction candidates from each plugin, and makes the final eviction decision based on sorting and filtering algorithms.
- Eviction Plugins are implemented for each individual vendor or scenario. Each plugin outputs only eviction candidates or resource pressure status based on its own knowledge, and reports this information to the Eviction Manager periodically.

### API [Optional]

Eviction Plugin communicates with Eviction Manager over gRPC; the protobuf messages and RPCs are summarized below as Go types.
```go
// Package name is illustrative.
package evictionplugin

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// ThresholdMetType describes whether, and how hard, a threshold is met.
type ThresholdMetType int

const (
	NotMet ThresholdMetType = iota
	SoftMet
	HardMet
)

// ConditionType tells the manager whether a condition belongs to Node or CNR.
type ConditionType int

const (
	NodeCondition ConditionType = iota
	CNRCondition
)

type Condition struct {
	ConditionType ConditionType
	ConditionName string
	MetCondition  bool
}

// ThresholdMetResponse is returned by ThresholdMet.
type ThresholdMetResponse struct {
	ThresholdValue    float64
	ObservedValue     float64
	ThresholdOperator string
	MetType           ThresholdMetType
	EvictionScope     string // e.g. resource name
	Condition         Condition
}

type GetTopEvictionPodsRequest struct {
	ActivePods []*v1.Pod
	TopN       uint64
}

type GetTopEvictionPodsResponse struct {
	TargetPods         []*v1.Pod // length is less than or equal to TopN in GetTopEvictionPodsRequest
	GracePeriodSeconds uint64
}

// EvictPod is a single eviction candidate returned by GetEvictPods.
type EvictPod struct {
	Pod         *v1.Pod
	Reason      string
	GracePeriod time.Duration
	ForceEvict  bool // true: evict immediately; false: soft candidate
}

type GetEvictPodsResponse struct {
	EvictPods []*EvictPod
	Condition Condition
}

// Empty is the request type for RPCs that take no arguments.
type Empty struct{}

// EvictionPlugin (illustrative name) lists the RPCs each plugin serves.
type EvictionPlugin interface {
	ThresholdMet(Empty) (ThresholdMetResponse, error)
	GetTopEvictionPods(GetTopEvictionPodsRequest) (GetTopEvictionPodsResponse, error)
	GetEvictPods(Empty) (GetEvictPodsResponse, error)
}
```

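To make the `MetType` and `ThresholdOperator` fields more concrete, here is a sketch of how a plugin might classify an observed value and how a pressure series might be smoothed. It assumes the plugin keeps separate soft and hard thresholds and that smoothing is an exponentially weighted moving average; the helpers `exceeds`, `classify`, and `ewma` are hypothetical, and neither the operator set nor the smoothing method is specified by this proposal.

```go
package main

import "fmt"

// ThresholdMetType mirrors the enum in the API above.
type ThresholdMetType int

const (
	NotMet ThresholdMetType = iota
	SoftMet
	HardMet
)

// exceeds applies a ThresholdOperator to an observed value. Only ">" and
// "<" are handled; the operator set is an assumption.
func exceeds(observed, threshold float64, op string) bool {
	switch op {
	case ">":
		return observed > threshold
	case "<":
		return observed < threshold
	default:
		return false
	}
}

// classify is a hypothetical helper a plugin might use to fill MetType,
// assuming separate soft and hard thresholds for one resource.
func classify(observed, soft, hard float64, op string) ThresholdMetType {
	switch {
	case exceeds(observed, hard, op):
		return HardMet
	case exceeds(observed, soft, op):
		return SoftMet
	default:
		return NotMet
	}
}

// ewma smooths samples with factor alpha in (0, 1]. This is one possible
// reading of "smoothed pressure", not the documented katalyst method.
func ewma(samples []float64, alpha float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := samples[0]
	for _, v := range samples[1:] {
		s = alpha*v + (1-alpha)*s
	}
	return s
}

func main() {
	// A single spike to 0.95 is damped, so the smoothed value only meets the soft threshold.
	smoothed := ewma([]float64{0.5, 0.6, 0.95}, 0.5)
	fmt.Printf("%.3f %d\n", smoothed, classify(smoothed, 0.7, 0.9, ">"))
}
```

Smoothing before comparison keeps a single noisy sample from flipping a Node or CNR condition, while a sustained high value still reaches the hard threshold.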
Based on the API, the workflow is as follows.
- Eviction Manager periodically calls the ThresholdMet function of each Eviction Plugin through its endpoint to get the pressure status, and discards responses whose MetType is NotMet. After comparing the smoothed pressure with the target threshold, the manager updates pressure conditions for both Node and CNR. If a hard threshold is met, the manager calls the GetTopEvictionPods function of the corresponding plugin to get eviction candidates.
- Eviction Manager also periodically calls the GetEvictPods function of each Eviction Plugin to get explicit eviction candidates. These include forced candidates and soft candidates: the former must be evicted immediately, while from the latter the manager chooses a subset to evict.
- Eviction Manager then aggregates all candidates, applies filtering, sorting, and rate-limiting logic, and finally sends eviction requests for all selected pods.

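The aggregation step in the last bullet can be sketched as follows, assuming candidates from GetEvictPods and GetTopEvictionPods have already been collected into a simplified `candidate` type carrying a hypothetical priority score. The merge rules (a forced nomination wins over a soft one, otherwise the higher priority wins) and the `softLimit` cap are illustrative choices, not the documented katalyst behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a simplified stand-in for EvictPod: pod name, nominating
// plugin, forced flag, and a hypothetical priority (higher evicts first).
type candidate struct {
	Pod      string
	Plugin   string
	Force    bool
	Priority int
}

// selectEvictions merges candidates from all plugins, keeps one entry per
// pod, always keeps forced evictions, and admits at most softLimit soft
// candidates in priority order.
func selectEvictions(all [][]candidate, softLimit int) []candidate {
	byPod := map[string]candidate{}
	for _, fromPlugin := range all {
		for _, c := range fromPlugin {
			prev, seen := byPod[c.Pod]
			// A forced nomination wins; otherwise the higher priority wins.
			if !seen || (c.Force && !prev.Force) || (c.Force == prev.Force && c.Priority > prev.Priority) {
				byPod[c.Pod] = c
			}
		}
	}

	var forced, soft []candidate
	for _, c := range byPod {
		if c.Force {
			forced = append(forced, c)
		} else {
			soft = append(soft, c)
		}
	}
	// Deterministic order: priority descending, then pod name.
	byPriority := func(s []candidate) func(i, j int) bool {
		return func(i, j int) bool {
			if s[i].Priority != s[j].Priority {
				return s[i].Priority > s[j].Priority
			}
			return s[i].Pod < s[j].Pod
		}
	}
	sort.Slice(forced, byPriority(forced))
	sort.Slice(soft, byPriority(soft))
	if len(soft) > softLimit {
		soft = soft[:softLimit]
	}
	return append(forced, soft...)
}

func main() {
	cpu := []candidate{{"pod-a", "cpu", false, 3}, {"pod-b", "cpu", false, 5}}
	mem := []candidate{{"pod-b", "memory", false, 7}, {"pod-c", "memory", true, 1}, {"pod-d", "memory", false, 2}}
	for _, c := range selectEvictions([][]candidate{cpu, mem}, 2) {
		fmt.Println(c.Pod, c.Plugin, c.Force)
	}
}
```

Deduplicating by pod before applying the limit is what keeps independent plugins from over-evicting: `pod-b`, nominated by both plugins, is counted once.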
### Design Details

This section introduces the detailed responsibilities of the Eviction Manager, along with the eviction plugins embedded in katalyst.

#### Eviction Manager

- Plugin Manager is responsible for the registration process, and constructs an endpoint for each plugin.
- Plugin Endpoints maintain the endpoint information for each plugin, including its client, descriptions, and so on.
- Launcher is the core calculation module. It periodically communicates with each plugin over gRPC and runs eviction strategies to select the pods to evict.
- Evictor is the utility module that communicates with the APIServer. Once candidates are confirmed, the Launcher calls the Evictor to trigger eviction.
- Condition Reporter updates pressure conditions for Node and CNR, preventing more pods from being scheduled onto a node that is already under resource pressure.

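The split between Plugin Manager, Plugin Endpoints, and Launcher can be illustrated with a small registry. In this sketch, assuming one record per plugin keyed by name, the hypothetical `registry` type handles registration and deregistration, and `snapshot` gives the Launcher a stable, sorted view for each polling round. A real endpoint would wrap a gRPC client; the socket paths here are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// endpoint is a hypothetical record of what Plugin Endpoints keep per plugin.
type endpoint struct {
	Name        string
	SocketPath  string
	Description string
}

// registry sketches the Plugin Manager: plugins register and deregister,
// and the Launcher reads a snapshot of endpoints on every round.
type registry struct {
	mu        sync.RWMutex
	endpoints map[string]endpoint
}

func newRegistry() *registry {
	return &registry{endpoints: make(map[string]endpoint)}
}

// register adds a plugin, rejecting empty or duplicate names.
func (r *registry) register(e endpoint) error {
	if e.Name == "" {
		return errors.New("plugin name must not be empty")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.endpoints[e.Name]; ok {
		return fmt.Errorf("plugin %q already registered", e.Name)
	}
	r.endpoints[e.Name] = e
	return nil
}

// deregister removes a plugin, e.g. when its socket disappears.
func (r *registry) deregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.endpoints, name)
}

// snapshot returns the current endpoints sorted by name, so one Launcher
// round sees a consistent set even if plugins register concurrently.
func (r *registry) snapshot() []endpoint {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]endpoint, 0, len(r.endpoints))
	for _, e := range r.endpoints {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	r := newRegistry()
	_ = r.register(endpoint{Name: "memory-pressure", SocketPath: "/tmp/memory.sock"})
	_ = r.register(endpoint{Name: "cpu-suppression", SocketPath: "/tmp/cpu.sock"})
	fmt.Println(r.register(endpoint{Name: "cpu-suppression"}))
	r.deregister("memory-pressure")
	for _, e := range r.snapshot() {
		fmt.Println(e.Name, e.SocketPath)
	}
}
```

Copying endpoints out under a read lock keeps plugin registration from blocking a polling round for longer than the copy takes.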
#### Plugins

- Inner eviction plugins depend only on raw metrics/data, and are implemented and deployed along with the eviction manager. For instance, the cpu suppression eviction plugin and the reclaimed resource over-commit eviction plugin both belong to this type.
- Outer eviction plugins depend on the calculation results of other modules. For instance, the load eviction plugin depends on the allocation results of QRM, and the memory bandwidth eviction plugin depends on the suppression strategy in SysAdvisor, so these eviction plugins should be implemented out-of-tree.

## Alternatives

- Implement pod eviction in the native kubelet eviction manager. However, this is too intrusive to the kubelet source code. Besides, we would also have to implement the metric collection and analysis logic in kubelet, which would bind us to a specific metric source. Finally, upgrading kubelet is much heavier than upgrading a DaemonSet, so frequently adding a new eviction strategy or adjusting an existing one would be inconvenient.
- Implement pod eviction in each plugin without a centralized coordinator. When resource contention happens, this may cause a thundering herd, meaning that more than one plugin decides to trigger pod eviction at the same time. This problem cannot be solved without a coordinator.