# Clusterloader2 - tracking OOMs

To track pod OOMs (out-of-memory events) during clusterloader2 test
execution, TestMetrics has been augmented with the `ClusterOOMsTracker`
measurement. It outputs a summary containing basic information about OOMs
inside the cluster, such as the PID of the OOMing process or the name of the
node where the OOM happened.

Because `ClusterOOMsTracker` is based on Kubernetes events, and events are
best effort by nature, the reported summary is not guaranteed to accurately
describe what really happened: some OOMs may be missed.

In Kubemark tests, cluster nodes are hollow and `node-problem-detector` is
faked, so only OOMs occurring in master components are tracked.

## Enabling OOMs tracking

First, ensure that the `TestMetrics` measurement is added to the `Starting
measurements` and `Collecting measurements` steps of your test config. Next,
add the `clusterOOMsTrackerEnabled` parameter and set it to `true` in the
configuration of both steps.

Sample configuration in the `Starting measurements` step:

```yaml
- name: Starting measurements
  measurements:
  ...
  - Identifier: TestMetrics
    Method: TestMetrics
    Params:
      action: start
      clusterOOMsTrackerEnabled: true
      clusterOOMsIgnoredProcesses: ""
```

Sample configuration in the `Collecting measurements` step:

```yaml
- name: Collecting measurements
  measurements:
  ...
  - Identifier: TestMetrics
    Method: TestMetrics
    Params:
      action: gather
      clusterOOMsTrackerEnabled: true
```

To prevent certain OOMs from failing a clusterloader2 test, you can ignore
specific processes reported by `node-problem-detector`. To do so, set the
`clusterOOMsIgnoredProcesses` TestMetrics parameter to a comma-separated list
of process names.
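
As a rough sketch of what a non-empty ignore list might look like in the `Starting measurements` step: the process names `example-proc-a` and `example-proc-b` below are hypothetical placeholders, and writing the list without spaces around the comma is an assumption rather than a documented requirement.

```yaml
- name: Starting measurements
  measurements:
  ...
  - Identifier: TestMetrics
    Method: TestMetrics
    Params:
      action: start
      clusterOOMsTrackerEnabled: true
      # Hypothetical process names, for illustration only. Replace them with
      # the names reported by node-problem-detector in your cluster.
      clusterOOMsIgnoredProcesses: "example-proc-a,example-proc-b"
```
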
OOMs from ignored processes are still included in the measurement summary.

## Further debugging steps

`ClusterOOMsTracker` watches for events emitted by `node-problem-detector` when
an OOM occurs. Such events contain only a fraction of the information that may
be useful for debugging. For more detail, such as the name of the OOMing
pod or container, or the container's memory limit, check the `systemd.log`
files of the affected node.