# Resource Health

## Overview
Argo CD provides built-in health assessment for several standard Kubernetes types, which is then
surfaced in the overall Application health status. The following checks are made for
specific types of Kubernetes resources:

### Deployment, ReplicaSet, StatefulSet, DaemonSet
* Observed generation is equal to desired generation.
* Number of **updated** replicas equals the number of desired replicas.

### Service
* If the service is of type `LoadBalancer`, the `status.loadBalancer.ingress` list is non-empty,
  with at least one value for `hostname` or `IP`.

### Ingress
* The `status.loadBalancer.ingress` list is non-empty, with at least one value for `hostname` or `IP`.

### Job
* If the job's `.spec.suspend` field is set to `true`, then the job and app health will be marked as suspended.

### PersistentVolumeClaim
* The `status.phase` is `Bound`.

### Argocd App

The health assessment of the `argoproj.io/Application` CRD was removed in Argo CD 1.8 (see [#3781](https://github.com/argoproj/argo-cd/issues/3781) for more information).
You might need to restore it if you are using the app-of-apps pattern and orchestrating synchronization using sync waves.
Add the following resource customization in the `argocd-cm` ConfigMap:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
```

## Custom Health Checks

Argo CD supports custom health checks written in [Lua](https://www.lua.org/). This is useful if you:

* Are affected by known issues where your `Ingress` or `StatefulSet` resources are stuck in the `Progressing` state because of a bug in your resource controller.
* Have a custom resource for which Argo CD does not have a built-in health check.

There are two ways to configure a custom health check, described in the next two sections.

### Way 1. Define a Custom Health Check in `argocd-cm` ConfigMap

Custom health checks can be defined in the following field of `argocd-cm`:

```yaml
resource.customizations.health.<group>_<kind>: |
```

If you are using argocd-operator, this is overridden by [the argocd-operator resourceCustomizations](https://argocd-operator.readthedocs.io/en/latest/reference/argocd/#resource-customizations).

The following example demonstrates a health check for `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for i, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "False" then
            hs.status = "Degraded"
            hs.message = condition.message
            return hs
          end
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
            hs.message = condition.message
            return hs
          end
        end
      end
    end

    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs
```

To avoid duplicating custom health checks across multiple resources, you can also specify a wildcard in the
resource kind and anywhere in the resource group, like this:

```yaml
resource.customizations: |
  ec2.aws.crossplane.io/*:
    health.lua: |
      ...
```

```yaml
# If a key _begins_ with a wildcard, please ensure that the GVK key is quoted.
resource.customizations: |
  "*.aws.crossplane.io/*":
    health.lua: |
      ...
```

!!!important
    Please note that wildcards are only supported when using the `resource.customizations` key. The `resource.customizations.health.<group>_<kind>`
    style keys do not work, since wildcards (`*`) are not supported in Kubernetes ConfigMap keys.

`obj` is a global variable that contains the resource. The script must return an object with a `status` field and an optional `message` field.
The custom health check can return one of the following health statuses:

* `Healthy` - the resource is healthy
* `Progressing` - the resource is not healthy yet but is still making progress and might be healthy soon
* `Degraded` - the resource is degraded
* `Suspended` - the resource is suspended and waiting for some external event to resume (e.g. a suspended CronJob or a paused Deployment)

By default, health checks typically return a `Progressing` status, as the examples above do when no condition matches.

NOTE: As a security measure, access to the standard Lua libraries is disabled by default. Admins can control access by
setting `resource.customizations.useOpenLibs.<group>_<kind>`. In the following example, standard libraries are enabled for the health check of `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations.useOpenLibs.cert-manager.io_Certificate: "true"
  resource.customizations.health.cert-manager.io_Certificate: |
    -- Lua standard libraries are enabled for this script
```

### Way 2. Contribute a Custom Health Check

A health check can be bundled into Argo CD. Custom health check scripts are located in the `resource_customizations` directory of [https://github.com/argoproj/argo-cd](https://github.com/argoproj/argo-cd). The directory must have the following structure:

```
argo-cd
|-- resource_customizations
|    |-- your.crd.group.io               # CRD group
|    |    |-- MyKind                     # Resource kind
|    |    |    |-- health.lua            # Health check
|    |    |    |-- health_test.yaml      # Test inputs and expected results
|    |    |    +-- testdata              # Directory with test resource YAML definitions
```

Each health check must have tests defined in a `health_test.yaml` file. The `health_test.yaml` file has the following structure:

```yaml
tests:
- healthStatus:
    status: ExpectedStatus
    message: Expected message
  inputPath: testdata/test-resource-definition.yaml
```

To test the implemented custom health checks, run `go test -v ./util/lua/`.

[PR#1139](https://github.com/argoproj/argo-cd/pull/1139) is an example of a custom health check for Cert Manager CRDs.

#### Wildcard Support for Built-in Health Checks

You can use a single health check for multiple resources by using a wildcard in the group or kind directory names.

The `_` character behaves like a `*` wildcard. For example, consider the following directory structure:

```
argo-cd
|-- resource_customizations
|    |-- _.group.io               # CRD group
|    |    |-- _                   # Resource kind
|    |    |    |-- health.lua     # Health check
```

Any resource, of any kind, whose group ends with `.group.io` will use the health check in `health.lua`.

Wildcard checks are only evaluated if there is no specific check for the resource.

If multiple wildcard checks match, the first one in the directory structure is used.

We use the [doublestar](https://github.com/bmatcuk/doublestar) glob library to match the wildcard checks. We currently
only treat a path as a wildcard if it contains a `_` character, but this may change in the future.

!!!important "Avoid Massive Scripts"

    Avoid writing massive scripts to handle multiple resources. They'll get hard to read and maintain. Instead, just
    duplicate the relevant parts in resource-specific scripts.

## Overriding Go-Based Health Checks

Health checks for some resources were [hardcoded as Go code](https://github.com/argoproj/gitops-engine/tree/master/pkg/health)
because Lua support was introduced later. Also, the health check logic for some resources was too complex, so it
was easier to implement it in Go.

It is possible to override health checks for built-in resources. Argo CD prefers the configured health check over the
Go-based built-in check.
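
For example, if your ingress controller never populates `status.loadBalancer.ingress`, a minimal sketch of an override for `networking.k8s.io/Ingress` might look like the following. It uses the `resource.customizations.health.<group>_<kind>` key described in Way 1 and unconditionally reports `Healthy`, which also means genuine Ingress problems will no longer be surfaced. The `message` text is an arbitrary placeholder, not a value Argo CD expects.

```yaml
data:
  # Replaces the Go-based check for every networking.k8s.io/Ingress resource.
  resource.customizations.health.networking.k8s.io_Ingress: |
    hs = {}
    -- Always report Healthy, regardless of status.loadBalancer.ingress.
    hs.status = "Healthy"
    hs.message = "Health check overridden in argocd-cm"
    return hs
```

The same pattern applies to any resource in the list below: define a health check under its `<group>_<kind>` key and it takes precedence over the Go implementation.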

The following resources have Go-based health checks:

* PersistentVolumeClaim
* Pod
* Service
* apiregistration.k8s.io/APIService
* apps/DaemonSet
* apps/Deployment
* apps/ReplicaSet
* apps/StatefulSet
* argoproj.io/Workflow
* autoscaling/HorizontalPodAutoscaler
* batch/Job
* extensions/Ingress
* networking.k8s.io/Ingress

## Health Checks

An Argo CD App's health is inferred from the health of its immediate child resources (the resources represented in
source control). The App health will be the worst health of its immediate child resources. The priority of most to least
healthy statuses is: `Healthy`, `Suspended`, `Progressing`, `Missing`, `Degraded`, `Unknown`. So, for example, if an App
has a `Missing` resource and a `Degraded` resource, the App's health will be `Degraded`.

But the health of a resource is not inherited from child resources; it is calculated using only information about the
resource itself. A resource's status field may or may not contain information about the health of a child resource, and
the resource's health check may or may not take that information into account.

The lack of inheritance is by design. A resource's health can't be inferred from its children because the health of a
child resource may not be relevant to the health of the parent resource. For example, a Deployment's health is not
necessarily affected by the health of its Pods.

```
App (healthy)
└── Deployment (healthy)
    ├── ReplicaSet (healthy)
    │   └── Pod (healthy)
    └── ReplicaSet (unhealthy)
        └── Pod (unhealthy)
```

If you want the health of a child resource to affect the health of its parent, you need to configure the parent's health
check to take the child's health into account.
Since only the parent resource's state is available to the health check,
the parent resource's controller needs to make the child resource's health available in the parent resource's status
field.

```
App (healthy)
└── CustomResource (healthy) <- This resource's health check needs to be fixed to mark the App as unhealthy
    └── CustomChildResource (unhealthy)
```

## Ignoring Child Resource Health Check in Applications

To ignore the health check of an immediate child resource within an Application, set the annotation `argocd.argoproj.io/ignore-healthcheck` to `true`. For example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    argocd.argoproj.io/ignore-healthcheck: "true"
```

By doing this, the health status of the Deployment will not affect the health of its parent Application.
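
The annotation is set on the child resource itself, so it is not limited to Deployments. As a sketch, assuming the app-of-apps pattern with the `argoproj.io/Application` health check restored as described earlier, a child Application could be excluded from its parent's health in the same way. The name `example-child-app` is a placeholder, and the `spec` is omitted for brevity.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-child-app   # placeholder name
  namespace: argocd
  annotations:
    # The parent Application ignores this child's health status.
    argocd.argoproj.io/ignore-healthcheck: "true"
# spec omitted for brevity
```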