# Clusterloader2 - tracking OOMs

To track pod OOMs (out-of-memory events) during clusterloader2 test execution,
`TestMetrics` has been augmented with the `ClusterOOMsTracker` measurement.
It outputs a summary containing basic information about OOMs inside the
cluster, such as the PID of the OOMing process and the name of the node where
the OOM happened.

Because `ClusterOOMsTracker` is based on Kubernetes events, and events are
best-effort by nature, the reported summary is not guaranteed to accurately
describe what really happened: some OOMs may be missed.

In Kubemark tests, cluster nodes are hollow and `node-problem-detector` is
faked, so only OOMs occurring in master components are tracked.

## Enabling OOMs tracking

First, ensure that the `TestMetrics` measurement is added to both the
`Starting measurements` and `Collecting measurements` steps of your test
config. Next, add the `clusterOOMsTrackerEnabled` parameter and set it to
`true` in the configuration of both steps.

Sample configuration in the `Starting measurements` step:

```yaml
- name: Starting measurements
  measurements:
    ...
    - Identifier: TestMetrics
      Method: TestMetrics
      Params:
        action: start
        clusterOOMsTrackerEnabled: true
        clusterOOMsIgnoredProcesses: ""
```

Sample configuration in the `Collecting measurements` step:

```yaml
- name: Collecting measurements
  measurements:
    ...
    - Identifier: TestMetrics
      Method: TestMetrics
      Params:
        action: gather
        clusterOOMsTrackerEnabled: true
```

To prevent certain OOMs from failing a clusterloader2 test, you can ignore
specific processes reported by `node-problem-detector`. To do so, set the
`clusterOOMsIgnoredProcesses` parameter of `TestMetrics` to a comma-separated
list of process names. OOMs from the listed processes will still be included
in the measurement summary.
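
As a minimal sketch of the ignore list described above, the
`Starting measurements` step might look like the following. The process names
`example-proc-a` and `example-proc-b` are hypothetical placeholders, not names
taken from this document. The value is written without spaces around the
comma, on the assumption that names are compared as written.

```yaml
- name: Starting measurements
  measurements:
    - Identifier: TestMetrics
      Method: TestMetrics
      Params:
        action: start
        clusterOOMsTrackerEnabled: true
        # Hypothetical process names; replace them with the names reported
        # by node-problem-detector for the OOMs you want to ignore.
        clusterOOMsIgnoredProcesses: "example-proc-a,example-proc-b"
```

With a configuration along these lines, OOMs of the listed processes would
not fail the test but would still appear in the measurement summary.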

## Further debugging steps

`ClusterOOMsTracker` watches for events emitted by `node-problem-detector` when
an OOM occurs. Such events contain only a fraction of the information that may
be useful for debugging. For more details, such as the name of the OOMing
pod/container or the container's memory limit, check the `systemd.log` files
of the affected node.