# Concepts - katalyst core concepts
Katalyst contains many components, which can make it difficult to dive deep. This documentation introduces the basic concepts of katalyst to help developers understand how the system works, how it abstracts the QoS model, and how you can dynamically configure the system.

## Architecture
As shown in the architecture below, katalyst mainly contains three layers. For the user-side API, katalyst defines a suite of QoS models along with multiple enhancements to match the QoS requirements of different kinds of workloads. Users can deploy their workloads with different QoS requirements, and the katalyst daemon will try to allocate proper resources and devices for those pods to satisfy their QoS requirements. This allocation process works both at the pod admission phase and at runtime, taking into consideration the resource usage and QoS classes of the pods running on the same node. In addition, centralized components cooperate with the daemons to provide better resource adjustments for each workload from a cluster-level perspective.
<div align="center">
  <picture>
    <img src="./imgs/katalyst-overview.jpg" width=80% title="Katalyst Overview" loading="eager" />
  </picture>
</div>

## Components
Katalyst contains centralized components that are deployed as deployments, and agents that run as daemonsets on every node.

### Centralized Components

#### Katalyst Controllers/Webhooks

Katalyst controllers provide cluster-level abilities, including service profiling, elastic resource recommendation, lifecycle management for core Custom Resources, and centralized eviction strategies that run as a backstop. Katalyst webhooks are responsible for validating QoS configurations and mutating resource requests according to service profiling.

#### Katalyst Scheduler

Katalyst scheduler is developed based on the scheduler v2 framework to provide the scheduling functionality for hybrid deployment and topology-aware scheduling scenarios.

#### Custom Metrics API

Custom metrics API implements the standard custom-metrics-apiserver interface, and is responsible for collecting, storing, and querying metrics. It is mainly used by elastic resource recommendation and re-scheduling in the katalyst system.
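
Because the component exposes the standard `custom.metrics.k8s.io` API group, its metrics can be queried like any other custom metrics. The Go sketch below issues a raw request with client-go, assuming a local kubeconfig; the metric name `pod_cpu_usage` is a hypothetical placeholder, not a metric defined by katalyst.

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumes ~/.kube/config is valid).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Query the custom metrics API; "pod_cpu_usage" is used for illustration only.
	raw, err := clientset.CoreV1().RESTClient().Get().
		AbsPath("/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/pod_cpu_usage").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```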

### Daemon Components

#### QoS Resource Manager

QoS Resource Manager (QRM for short) is designed as an extended framework in kubelet, and it works as a new hint provider similar to the Device Manager. But unlike the Device Manager, QRM aims at allocating non-discrete resources (e.g. cpu and memory) rather than discrete devices, and it can adjust allocation results dynamically and periodically based on container running status. QRM is implemented in the kubewharf enhanced kubernetes, and if you want to get more information about QRM, please refer to [qos-resource-manager](./proposals/qos-management/qos-resource-manager/20221018-qos-resource-manager.md).
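
To give a feel for what a hint provider contributes, the snippet below sketches a simplified, hypothetical interface in the spirit of kubelet's Topology Manager hints. It is not the actual QRM plugin API, which is defined in the proposal linked above.

```go
package sketch

// TopologyHint mirrors the idea behind kubelet's Topology Manager hints: a
// candidate set of NUMA nodes and whether that placement is preferred.
type TopologyHint struct {
	NUMANodes []int
	Preferred bool
}

// ResourceHintProvider is a simplified, hypothetical view of what a QRM-style
// plugin contributes: topology hints at pod admission, an allocation step, and
// a periodic adjustment hook driven by container running status.
type ResourceHintProvider interface {
	// GetTopologyHints returns candidate NUMA placements for a container's resource request.
	GetTopologyHints(podUID, containerName string, request map[string]int64) []TopologyHint

	// Allocate commits an allocation (e.g. a cpuset or a memory limit) based on the merged hint.
	Allocate(podUID, containerName string, hint TopologyHint) error

	// Adjust is called periodically so the allocation can track actual usage.
	Adjust(podUID, containerName string) error
}
```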

#### Katalyst Agent

Katalyst Agent is designed as the core daemon component that implements resource management according to QoS requirements and container running status. Katalyst Agent contains several individual modules that are responsible for different functionalities. These modules can either be deployed as a monolithic container or as separate ones.
- Eviction Manager is a framework for eviction strategies. Users can implement their own eviction plugins to handle contention for each resource type (see the sketch after this list). For more information about the eviction manager, please refer to [eviction-manager](./proposals/qos-management/eviction-manager/20220424-eviction-manager.md).
- Resource Reporter is a framework for reporting different CRDs, or different fields in the same CRD. For instance, different fields in CNR may be collected through different sources, and this framework makes it possible for users to implement each resource reporter with a plugin. For more information about the reporter manager, please refer to [reporter-manager](./proposals/qos-management/reporter-manager/20220515-reporter-manager.md).
- SysAdvisor is the core node-level resource recommendation module, and it uses statistics-based, indicator-based, and ML-based algorithms for different scenarios. For more information about SysAdvisor, please refer to [sys-advisor](proposals/qos-management/wip-20220615-sys-advisor.md).
- QRM Plugin works as a plugin for each resource with static or dynamic policies. Generally, QRM plugins receive resource recommendations from SysAdvisor, and export control configs through the CRI interface embedded in the QRM framework.
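
As an illustration of the plugin pattern shared by these modules, here is a minimal, hypothetical eviction plugin contract; the real plugin API is specified in the eviction-manager proposal linked above.

```go
package sketch

// EvictionCandidate names a pod that a plugin proposes to evict, with a reason.
type EvictionCandidate struct {
	PodUID string
	Reason string
}

// EvictionPlugin is a hypothetical, simplified contract: the Eviction Manager
// periodically asks each plugin whether the resource it watches is under
// contention and, if so, which pods it would evict, then arbitrates between
// plugins before taking action.
type EvictionPlugin interface {
	// Name identifies the plugin, e.g. a hypothetical "memory-pressure" plugin.
	Name() string

	// ThresholdMet reports whether the watched resource is under contention.
	ThresholdMet() (bool, error)

	// GetEvictionCandidates returns the pods this plugin would evict to relieve pressure.
	GetEvictionCandidates() ([]EvictionCandidate, error)
}
```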

#### Malachite

Malachite is a unified metrics-collecting component. It is implemented out-of-tree, and serves node-, NUMA-, pod- and container-level metrics through an HTTP endpoint, from which katalyst queries real-time metrics data. In a real-world production environment, you can replace malachite with your own metrics implementation.
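
The sketch below shows how such an HTTP endpoint might be consumed; the address and path are placeholders, since malachite's actual listen address and URL layout depend on your deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder address and path; malachite's real endpoint depends on how it is deployed.
	resp, err := http.Get("http://127.0.0.1:8000/api/v1/system/compute")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode the JSON payload generically, since the concrete schema is defined
	// by malachite rather than by this document.
	var payload map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", payload)
}
```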

## QoS

To extend the ability of kubernetes' original QoS levels, katalyst defines its own QoS levels with CPU as the dominant resource. Unlike memory, CPU is considered a divisible resource and is easier to isolate; and for cloud-native workloads, CPU is usually the dominant resource that causes performance problems. So katalyst uses CPU to name the different QoS classes, and other resources implicitly accompany it.

### Definition
<br>
<table>
  <tbody>
    <tr>
      <th align="center">QoS level</th>
      <th align="center">Feature</th>
      <th align="center">Target Workload</th>
      <th align="center">Mapped k8s QoS</th>
    </tr>
    <tr>
      <td>dedicated_cores</td>
      <td>
        <ul>
          <li>Bind with a quantity of dedicated cpu cores</li>
          <li>Without sharing with any other pod</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Workload that's very sensitive to latency</li>
          <li>such as online advertising, recommendation.</li>
        </ul>
      </td>
      <td>Guaranteed</td>
    </tr>
    <tr>
      <td>shared_cores</td>
      <td>
        <ul>
          <li>Share a set of dedicated cpu cores with other shared_cores pods</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Workload that can tolerate a little cpu throttle or neighbor spikes</li>
          <li>such as microservices for webservice.</li>
        </ul>
      </td>
      <td>Guaranteed/Burstable</td>
    </tr>
    <tr>
      <td>reclaimed_cores</td>
      <td>
        <ul>
          <li>Over-committed resources that are squeezed from dedicated_cores or shared_cores</li>
          <li>Whenever dedicated_cores or shared_cores need to claim their resources back, reclaimed_cores will be suppressed or evicted</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Workload that mainly cares about throughput rather than latency</li>
          <li>such as batch bigdata, offline training.</li>
        </ul>
      </td>
      <td>BestEffort</td>
    </tr>
    <tr>
      <td>system_cores</td>
      <td>
        <ul>
          <li>Reserved for core system agents to ensure performance</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Core system agents.</li>
        </ul>
      </td>
      <td>Burstable</td>
    </tr>
  </tbody>
</table>
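
As a concrete example of how a workload declares its QoS level, pods carry a QoS annotation that katalyst reads at admission time and at runtime. The sketch below generates such a manifest with client-go types; the annotation key follows the `katalyst.kubewharf.io/qos_level` convention, but verify the exact key and accepted values against the katalyst version you deploy.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A shared_cores pod: the QoS level is declared through a pod annotation.
	// The annotation key below follows the katalyst convention; double-check it
	// against the version you are running.
	pod := corev1.Pod{
		TypeMeta: metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{
			Name: "demo-webservice",
			Annotations: map[string]string{
				"katalyst.kubewharf.io/qos_level": "shared_cores",
			},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx:1.25"},
			},
		},
	}

	// Print the manifest so it can be applied with kubectl.
	out, err := yaml.Marshal(&pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```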

#### Pool

As introduced above, katalyst uses the term `pool` to indicate a combination of resources that a batch of pods share with each other. For instance, pods with shared_cores may share a shared pool, meaning that they share the same cpusets, memory limits and so on; in the meantime, if the `cpuset_pool` enhancement is enabled, the single shared pool will be separated into several pools based on the configurations.

### Enhancement

Besides the core QoS levels, katalyst also provides a mechanism to enhance the abilities of the standard QoS levels. Enhancements work as flexible extensions and may be added continuously.

<br>
<table>
  <tbody>
    <tr>
      <th align="center">Enhancement</th>
      <th align="center">Feature</th>
    </tr>
    <tr>
      <td>numa_binding</td>
      <td>
        <ul>
          <li>Indicates that the pod should be bound to one (or several) NUMA node(s) to gain further performance improvements</li>
          <li>Only supported by dedicated_cores</li>
        </ul>
      </td>
    </tr>
    <tr>
      <td>cpuset_pool</td>
      <td>
        <ul>
          <li>Allocates a separate cpuset within the shared_cores pool to isolate the scheduling domain for the specified pods.</li>
          <li>Only supported by shared_cores</li>
        </ul>
      </td>
    </tr>
    <tr>
      <td>...</td>
      <td>
      </td>
    </tr>
  </tbody>
</table>
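
Enhancements are declared alongside the QoS level through pod annotations. The sketch below shows what requesting `numa_binding` for a dedicated_cores pod might look like; the enhancement annotation key and its JSON payload are illustrative of the convention described in the proposals, so confirm them against your version.

```go
package main

import "fmt"

func main() {
	// Annotations a dedicated_cores pod might carry to request the numa_binding
	// enhancement. The memory_enhancement key and its JSON payload are taken from
	// the katalyst enhancement conventions; confirm them against your version.
	annotations := map[string]string{
		"katalyst.kubewharf.io/qos_level":          "dedicated_cores",
		"katalyst.kubewharf.io/memory_enhancement": `{"numa_binding": "true"}`,
	}
	for k, v := range annotations {
		fmt.Printf("%s: %s\n", k, v)
	}
}
```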

## Configurations

To make configuration more flexible, katalyst designs a new mechanism to set configs on the fly, and it works as a supplement to the static configs defined via command-line flags. In katalyst, the implementation of this mechanism is called `KatalystCustomConfig` (`KCC` for short). It enables each daemon component to dynamically adjust its working status without restarting or re-deploying.
For more information about KCC, please refer to [dynamic-configuration](proposals/qos-management/wip-20220706-dynamic-configuration.md).
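
To illustrate the general idea of adjusting a daemon's behavior from a custom resource without restarting it, the sketch below watches a CRD with client-go's dynamic client. The group/version/resource here is a hypothetical placeholder rather than the actual KCC API, which is described in the dynamic-configuration proposal above.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Hypothetical GVR used purely for illustration; the concrete KCC resources
	// are defined by the katalyst CRDs installed in your cluster.
	gvr := schema.GroupVersionResource{
		Group:    "config.katalyst.kubewharf.io",
		Version:  "v1alpha1",
		Resource: "exampleconfigurations",
	}

	// Watch the resource and react to config changes without restarting the daemon.
	watcher, err := client.Resource(gvr).Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for event := range watcher.ResultChan() {
		fmt.Printf("config %s: %v\n", event.Type, event.Object)
	}
}
```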