# Resource Health

## Overview
Argo CD provides built-in health assessment for several standard Kubernetes types. The health of each resource is
then surfaced in the overall Application health status. The following checks are made for
specific types of Kubernetes resources:

### Deployment, ReplicaSet, StatefulSet, DaemonSet
* Observed generation is equal to desired generation.
* Number of **updated** replicas equals the number of desired replicas.

### Service
* If the service is of type `LoadBalancer`, the `status.loadBalancer.ingress` list is non-empty,
  with at least one value for `hostname` or `ip`.

### Ingress
* The `status.loadBalancer.ingress` list is non-empty, with at least one value for `hostname` or `ip`.

### Job
* If the job's `.spec.suspend` is set to `true`, the job and app health will be marked as suspended.

### PersistentVolumeClaim
* The `status.phase` is `Bound`.

### Argocd App

The health assessment of the `argoproj.io/Application` CRD was removed in Argo CD 1.8 (see [#3781](https://github.com/argoproj/argo-cd/issues/3781) for more information).
You might need to restore it if you are using the app-of-apps pattern and orchestrating synchronization using sync waves. Add the following resource customization to the
`argocd-cm` ConfigMap:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
```
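To illustrate where this customization matters, here is a minimal sketch of two child Applications in an app-of-apps setup, ordered with the `argocd.argoproj.io/sync-wave` annotation. The Application names, repository URL, paths, and namespaces are hypothetical. With the health check above restored, the parent app is expected to wait for `infra` to report `Healthy` before it syncs `workloads`.

```yaml
# Hypothetical child Applications managed by a parent app-of-apps Application.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infra
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # synced first
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder repository
    path: apps/infra
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: infra
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: workloads
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # synced after wave 0 is healthy
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder repository
    path: apps/workloads
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: workloads
```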

## Custom Health Checks

Argo CD supports custom health checks written in [Lua](https://www.lua.org/). This is useful if you:

* Are affected by known issues where your `Ingress` or `StatefulSet` resources are stuck in the `Progressing` state because of a bug in your resource controller.
* Have a custom resource for which Argo CD does not have a built-in health check.

There are two ways to configure a custom health check. The next two sections describe them.

### Way 1. Define a Custom Health Check in `argocd-cm` ConfigMap

Custom health checks can be defined in the
```yaml
  resource.customizations.health.<group>_<kind>: |
```
field of `argocd-cm`. If you are using argocd-operator, this is overridden by [the argocd-operator resourceCustomizations](https://argocd-operator.readthedocs.io/en/latest/reference/argocd/#resource-customizations).

The following example demonstrates a health check for `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for i, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "False" then
            hs.status = "Degraded"
            hs.message = condition.message
            return hs
          end
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
            hs.message = condition.message
            return hs
          end
        end
      end
    end

    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs
```

To avoid duplicating a custom health check across multiple resources, you can
specify a wildcard in the resource kind, and anywhere in the resource group, like this:

```yaml
  resource.customizations: |
    ec2.aws.crossplane.io/*:
      health.lua: |
        ...
```

```yaml
  # If a key _begins_ with a wildcard, please ensure that the GVK key is quoted.
  resource.customizations: |
    "*.aws.crossplane.io/*":
      health.lua: |
        ...
```

!!!important
    Wildcards are only supported when using the `resource.customizations` key. The `resource.customizations.health.<group>_<kind>`
    style keys do not work because wildcards (`*`) are not supported in Kubernetes ConfigMap keys.

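As a hedged illustration of a filled-in wildcard entry, the sketch below applies one script to every kind in groups ending with `.aws.crossplane.io`. It assumes that each matched resource reports a `Ready` condition in `status.conditions`, and that `False` should map to `Degraded`. Both are assumptions to verify against the resources you actually manage.

```yaml
data:
  resource.customizations: |
    "*.aws.crossplane.io/*":
      health.lua: |
        -- Assumption: every matched kind exposes a `Ready` condition.
        hs = {}
        hs.status = "Progressing"
        hs.message = "Waiting for the Ready condition"
        if obj.status ~= nil and obj.status.conditions ~= nil then
          for i, condition in ipairs(obj.status.conditions) do
            if condition.type == "Ready" then
              if condition.status == "True" then
                hs.status = "Healthy"
              elseif condition.status == "False" then
                hs.status = "Degraded"
              end
              -- Any other value (e.g. "Unknown") keeps the Progressing default.
              hs.message = condition.message or ""
              return hs
            end
          end
        end
        return hs
```
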
`obj` is a global variable that contains the resource. The script must return an object with a `status` field and an optional `message` field.
The custom health check can return one of the following health statuses:

  * `Healthy` - the resource is healthy
  * `Progressing` - the resource is not healthy yet but still making progress and might be healthy soon
  * `Degraded` - the resource is degraded
  * `Suspended` - the resource is suspended and waiting for some external event to resume (e.g. suspended CronJob or paused Deployment)

By default, a health check typically returns a `Progressing` status.

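The sketch below shows how a single script might return each of the four statuses. The `example.com/Widget` kind and its `spec.paused` and `status.phase` fields are hypothetical and exist only for illustration.

```yaml
data:
  resource.customizations.health.example.com_Widget: |
    -- Hypothetical CRD: spec.paused and status.phase are illustrative fields.
    hs = {}
    -- Default while the controller has not reported a phase yet.
    hs.status = "Progressing"
    hs.message = "Waiting for status"
    if obj.spec ~= nil and obj.spec.paused == true then
      hs.status = "Suspended"
      hs.message = "Widget is paused"
      return hs
    end
    if obj.status ~= nil and obj.status.phase ~= nil then
      if obj.status.phase == "Ready" then
        hs.status = "Healthy"
        hs.message = "Widget is ready"
      elseif obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = "Widget failed"
      end
    end
    return hs
```
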
NOTE: As a security measure, access to the standard Lua libraries is disabled by default. Admins can control access by
setting `resource.customizations.useOpenLibs.<group>_<kind>`. In the following example, standard libraries are enabled for the health check of `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations.useOpenLibs.cert-manager.io_Certificate: "true"
  resource.customizations.health.cert-manager.io_Certificate: |
    -- Lua standard libraries are enabled for this script
```
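For a slightly fuller picture, the following sketch is a variant of the earlier `cert-manager.io/Certificate` check that calls `string.format` from the Lua standard `string` library to build the message. Prefixing the message with `condition.reason` is an illustrative choice, not part of the original check.

```yaml
data:
  resource.customizations.useOpenLibs.cert-manager.io_Certificate: "true"
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for i, condition in ipairs(obj.status.conditions) do
        if condition.type == "Ready" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = condition.message
          return hs
        end
        if condition.type == "Ready" and condition.status == "False" then
          hs.status = "Degraded"
          -- string.format comes from the Lua standard `string` library.
          hs.message = string.format("%s: %s", condition.reason or "NotReady", condition.message or "")
          return hs
        end
      end
    end
    return hs
```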

### Way 2. Contribute a Custom Health Check

A health check can be bundled into Argo CD. Custom health check scripts are located in the `resource_customizations` directory of [https://github.com/argoproj/argo-cd](https://github.com/argoproj/argo-cd). This directory must have the following structure:

```
argo-cd
|-- resource_customizations
|    |-- your.crd.group.io               # CRD group
|    |    |-- MyKind                     # Resource kind
|    |    |    |-- health.lua            # Health check
|    |    |    |-- health_test.yaml      # Test inputs and expected results
|    |    |    +-- testdata              # Directory with test resource YAML definitions
```

Each health check must have tests defined in a `health_test.yaml` file. The `health_test.yaml` is a YAML file with the following structure:

```yaml
tests:
- healthStatus:
    status: ExpectedStatus
    message: Expected message
  inputPath: testdata/test-resource-definition.yaml
```
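As a hedged illustration of that structure, a test file usually lists one entry per status the script can return. The file names and messages below are hypothetical. Each `message` has to match what your `health.lua` actually returns for the corresponding input file.

```yaml
tests:
- healthStatus:
    status: Progressing
    message: Waiting for certificate
  inputPath: testdata/progressing.yaml   # hypothetical: resource with no Ready condition yet
- healthStatus:
    status: Healthy
    message: Certificate is up to date and has not expired
  inputPath: testdata/healthy.yaml       # hypothetical: Ready condition is "True"
- healthStatus:
    status: Degraded
    message: Issuer not found
  inputPath: testdata/degraded.yaml      # hypothetical: Ready condition is "False"
```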

To test the implemented custom health checks, run `go test -v ./util/lua/`.

[PR#1139](https://github.com/argoproj/argo-cd/pull/1139) is an example of a custom health check for Cert Manager CRDs.

#### Wildcard Support for Built-in Health Checks

You can use a single health check for multiple resources by using a wildcard in the group or kind directory names.

The `_` character behaves like a `*` wildcard. For example, consider the following directory structure:

```
argo-cd
|-- resource_customizations
|    |-- _.group.io               # CRD group
|    |    |-- _                   # Resource kind
|    |    |    |-- health.lua     # Health check
```

Any resource with a group that ends with `.group.io` will use the health check in `health.lua`.

Wildcard checks are only evaluated if there is no specific check for the resource.

If multiple wildcard checks match, the first one in the directory structure is used.

We use the [doublestar](https://github.com/bmatcuk/doublestar) glob library to match the wildcard checks. We currently
only treat a path as a wildcard if it contains a `_` character, but this may change in the future.

!!!important "Avoid Massive Scripts"

    Avoid writing massive scripts to handle multiple resources. They'll get hard to read and maintain. Instead, just
    duplicate the relevant parts in resource-specific scripts.

## Overriding Go-Based Health Checks

Health checks for some resources were [hardcoded as Go code](https://github.com/argoproj/gitops-engine/tree/master/pkg/health)
because Lua support was introduced later. Also, the logic of health checks for some resources was too complex, so it
was easier to implement it in Go.

It is possible to override health checks for built-in resources. Argo CD will prefer the configured health check over the
Go-based built-in check.

The following resources have Go-based health checks:

* PersistentVolumeClaim
* Pod
* Service
* apiregistration.k8s.io/APIService
* apps/DaemonSet
* apps/Deployment
* apps/ReplicaSet
* apps/StatefulSet
* argoproj.io/Workflow
* autoscaling/HorizontalPodAutoscaler
* batch/Job
* extensions/Ingress
* networking.k8s.io/Ingress

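As a rough sketch of such an override, the following entry replaces the Go-based check for `apps/Deployment` with a Lua script. It only mirrors the two conditions listed in the overview (observed generation and updated replicas) and is deliberately simpler than the built-in Go check, so treat it as a starting point rather than a drop-in replacement.

```yaml
data:
  resource.customizations.health.apps_Deployment: |
    -- Simplified sketch: the built-in Go check covers more cases than this.
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for rollout to finish"
    if obj.status == nil or obj.status.observedGeneration == nil then
      return hs
    end
    -- The controller has not yet observed the latest spec.
    if obj.metadata.generation ~= obj.status.observedGeneration then
      hs.message = "Waiting for the controller to observe the latest generation"
      return hs
    end
    -- spec.replicas defaults to 1 when unset.
    local desired = 1
    if obj.spec ~= nil and obj.spec.replicas ~= nil then
      desired = obj.spec.replicas
    end
    local updated = obj.status.updatedReplicas or 0
    if updated == desired then
      hs.status = "Healthy"
      hs.message = "All replicas are updated"
    else
      hs.message = "Updated replicas: " .. updated .. " of " .. desired
    end
    return hs
```
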
## Health Checks

An Argo CD App's health is inferred from the health of its immediate child resources (the resources represented in
source control). The App health will be the worst health of its immediate child resources. The priority of most to least
healthy statuses is: `Healthy`, `Suspended`, `Progressing`, `Missing`, `Degraded`, `Unknown`. So, for example, if an App
has a `Missing` resource and a `Degraded` resource, the App's health will be `Degraded`.

However, the health of a resource is not inherited from its child resources; it is calculated using only information about the
resource itself. A resource's status field may or may not contain information about the health of a child resource, and
the resource's health check may or may not take that information into account.

The lack of inheritance is by design. A resource's health can't be inferred from its children because the health of a
child resource may not be relevant to the health of the parent resource. For example, a Deployment's health is not
necessarily affected by the health of its Pods.

```
App (healthy)
└── Deployment (healthy)
    └── ReplicaSet (healthy)
        └── Pod (healthy)
    └── ReplicaSet (unhealthy)
        └── Pod (unhealthy)
```

If you want the health of a child resource to affect the health of its parent, you need to configure the parent's health
check to take the child's health into account. Since only the parent resource's state is available to the health check,
the parent resource's controller needs to make the child resource's health available in the parent resource's status
field.

```
App (healthy)
└── CustomResource (healthy) <- This resource's health check needs to be fixed to mark the App as unhealthy
    └── CustomChildResource (unhealthy)
```
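One way to close that gap is sketched below, under the assumption that the parent's controller publishes a condition summarizing its children. The `example.com/CustomResource` group and kind and the `ChildrenHealthy` condition type are hypothetical. The health check still reads only the parent's own status.

```yaml
data:
  resource.customizations.health.example.com_CustomResource: |
    -- Hypothetical CRD: assumes the parent's controller publishes a
    -- `ChildrenHealthy` condition that summarizes its child resources.
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for child status"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for i, condition in ipairs(obj.status.conditions) do
        if condition.type == "ChildrenHealthy" then
          if condition.status == "True" then
            hs.status = "Healthy"
          else
            hs.status = "Degraded"
          end
          hs.message = condition.message or ""
          return hs
        end
      end
    end
    return hs
```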

## Ignoring Child Resource Health Check in Applications

To ignore the health check of an immediate child resource within an Application, set the annotation `argocd.argoproj.io/ignore-healthcheck` to `true`. For example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    argocd.argoproj.io/ignore-healthcheck: "true"
```

By doing this, the health status of the Deployment will not affect the health of its parent Application.