# Resource Health

## Overview
Argo CD provides built-in health assessment for several standard Kubernetes types. The health of these
resources is then surfaced in the overall Application health status. The following checks are made for
specific types of Kubernetes resources:

### Deployment, ReplicaSet, StatefulSet, DaemonSet
* Observed generation is equal to desired generation.
* Number of **updated** replicas equals the number of desired replicas.

### Service
* If the service type is `LoadBalancer`, the `status.loadBalancer.ingress` list is non-empty,
with at least one value for `hostname` or `ip`.

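As an illustration of this check, a `LoadBalancer` Service would pass once its status looks roughly like the fragment below. The hostname is a placeholder, not taken from a real cluster.

```yaml
# Status fragment of a Service with spec.type: LoadBalancer (illustrative values).
status:
  loadBalancer:
    ingress:
      - hostname: lb.example.com   # placeholder; an `ip` entry would satisfy the check as well
```
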
### Ingress
* The `status.loadBalancer.ingress` list is non-empty, with at least one value for `hostname` or `ip`.

### Job
* If the job's `.spec.suspend` is set to `true`, the job and app health will be marked as suspended.
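
For reference, a suspended Job might look like the sketch below. The name, image, and command are assumptions chosen only to make the manifest complete.

```yaml
# Hypothetical Job used only to illustrate the `suspend` field.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # assumed name
spec:
  suspend: true                # this is the field the health check looks at
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox:1.36  # assumed image
          command: ["sh", "-c", "echo done"]
```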

### PersistentVolumeClaim
* The `status.phase` is `Bound`.

### Argocd App

The health assessment of the `argoproj.io/Application` CRD was removed in Argo CD 1.8 (see [#3781](https://github.com/argoproj/argo-cd/issues/3781) for more information).
You might need to restore it if you are using the app-of-apps pattern and orchestrating synchronization with sync waves. To do so, add the following resource customization to the
`argocd-cm` ConfigMap:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations: |
    argoproj.io/Application:
      health.lua: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            if obj.status.health.message ~= nil then
              hs.message = obj.status.health.message
            end
          end
        end
        return hs
```

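To illustrate why this matters for app-of-apps, the sketch below shows a child Application placed in an early sync wave. With the customization above restored, the parent's sync is expected to wait for this child to report `Healthy` before moving on to later waves. The application name, repository URL, path, and destination namespace are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: child-app-a                      # placeholder name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"    # synced before resources in higher waves
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder repository
    path: apps/child-a                          # placeholder path
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: child-a                          # placeholder namespace
```
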
## Custom Health Checks

Argo CD supports custom health checks written in [Lua](https://www.lua.org/). This is useful if you:

* Are affected by known issues where your `Ingress` or `StatefulSet` resources are stuck in the `Progressing` state because of a bug in your resource controller.
* Have a custom resource for which Argo CD does not have a built-in health check.

There are two ways to configure a custom health check. The next two sections describe them.

### Way 1. Define a Custom Health Check in `argocd-cm` ConfigMap

Custom health checks can be defined in the
```yaml
  resource.customizations: |
    <group/kind>:
      health.lua: |
```
field of `argocd-cm`. If you are using argocd-operator, this is overridden by [the argocd-operator resourceCustomizations](https://argocd-operator.readthedocs.io/en/latest/reference/argocd/#resource-customizations).

The following example demonstrates a health check for `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations: |
    cert-manager.io/Certificate:
      health.lua: |
        hs = {}
        if obj.status ~= nil then
          if obj.status.conditions ~= nil then
            for i, condition in ipairs(obj.status.conditions) do
              if condition.type == "Ready" and condition.status == "False" then
                hs.status = "Degraded"
                hs.message = condition.message
                return hs
              end
              if condition.type == "Ready" and condition.status == "True" then
                hs.status = "Healthy"
                hs.message = condition.message
                return hs
              end
            end
          end
        end

        hs.status = "Progressing"
        hs.message = "Waiting for certificate"
        return hs
```

To avoid duplicating a custom health check across multiple resources, you can use a wildcard in the resource kind, and anywhere in the resource group, like this:

```yaml
  resource.customizations: |
    ec2.aws.crossplane.io/*:
      health.lua: |
        ...
```

```yaml
  resource.customizations: |
    "*.aws.crossplane.io/*":
      health.lua: |
        ...
```

!!!important
    Please note that quotes are required around the key in the resource customization health section if the wildcard starts with `*`.

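As a rough sketch of what a wildcard check could contain, the fragment below applies one script to every kind in the matching groups. It assumes that each matched resource reports a `Ready` condition in `status.conditions`; that assumption, and the choice to treat `Ready=False` as `Degraded`, should be checked against the resources you actually manage.

```yaml
data:
  resource.customizations: |
    "*.aws.crossplane.io/*":
      health.lua: |
        -- Sketch only: assumes every matched kind reports a `Ready` condition.
        local hs = { status = "Progressing", message = "Waiting for Ready condition" }
        if obj.status ~= nil and obj.status.conditions ~= nil then
          for _, condition in ipairs(obj.status.conditions) do
            if condition.type == "Ready" then
              if condition.status == "True" then
                hs.status = "Healthy"
              elseif condition.status == "False" then
                hs.status = "Degraded"
              end
              -- Any other value (for example "Unknown") stays Progressing.
              hs.message = condition.message or ""
              return hs
            end
          end
        end
        return hs
```
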
`obj` is a global variable that contains the resource. The script must return an object with a `status` field and an optional `message` field.
The custom health check can return one of the following health statuses:

  * `Healthy` - the resource is healthy
  * `Progressing` - the resource is not healthy yet but is still making progress and might be healthy soon
  * `Degraded` - the resource is degraded
  * `Suspended` - the resource is suspended and waiting for some external event to resume (e.g. suspended CronJob or paused Deployment)

By default, health typically returns the `Progressing` status.

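The four statuses can be seen together in a single script. The sketch below targets a hypothetical `example.com/Widget` resource; the `spec.paused`, `status.ready`, and `status.failed` fields are invented for illustration and do not belong to any real CRD.

```yaml
data:
  resource.customizations: |
    example.com/Widget:
      health.lua: |
        -- Hypothetical CRD: `spec.paused`, `status.ready` and `status.failed` are assumed fields.
        local hs = { status = "Progressing", message = "Waiting for status" }
        if obj.spec ~= nil and obj.spec.paused == true then
          hs.status = "Suspended"
          hs.message = "Widget is paused"
          return hs
        end
        if obj.status ~= nil then
          if obj.status.ready == true then
            hs.status = "Healthy"
            hs.message = "Widget is ready"
          elseif obj.status.failed == true then
            hs.status = "Degraded"
            hs.message = "Widget reported a failure"
          end
        end
        return hs
```

Starting from `Progressing` and overriding it only when the resource gives a clear signal keeps the script aligned with the default described above.
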
NOTE: As a security measure, access to the standard Lua libraries is disabled by default. Admins can control access by
setting `resource.customizations.useOpenLibs.<group_kind>`. In the following example, standard libraries are enabled for the health check of `cert-manager.io/Certificate`.

```yaml
data:
  resource.customizations: |
    cert-manager.io/Certificate:
      health.lua.useOpenLibs: true
      health.lua: |
        -- Lua standard libraries are enabled for this script
```

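The `resource.customizations.useOpenLibs.<group_kind>` key named above is a separate top-level entry in `argocd-cm`. The following is a minimal sketch under two assumptions: that `<group_kind>` is written as the group and kind joined by an underscore (`cert-manager.io_Certificate`), and that the script itself lives under a matching `resource.customizations.health.<group_kind>` key. Verify both against the `argocd-cm` reference for your version.

```yaml
data:
  resource.customizations.useOpenLibs.cert-manager.io_Certificate: "true"
  resource.customizations.health.cert-manager.io_Certificate: |
    -- With open libs enabled, standard libraries such as `string` are available.
    local hs = { status = "Progressing", message = "Waiting for certificate" }
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        if condition.type == "Ready" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = string.format("Ready: %s", condition.message or "")
          return hs
        end
      end
    end
    return hs
```
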
### Way 2. Contribute a Custom Health Check

A health check can be bundled into Argo CD. Custom health check scripts are located in the `resource_customizations` directory of [https://github.com/argoproj/argo-cd](https://github.com/argoproj/argo-cd). The directory must have the following structure:

```
argo-cd
|-- resource_customizations
|    |-- your.crd.group.io               # CRD group
|    |    |-- MyKind                     # Resource kind
|    |    |    |-- health.lua            # Health check
|    |    |    |-- health_test.yaml      # Test inputs and expected results
|    |    |    +-- testdata              # Directory with test resource YAML definitions
```

Each health check must have tests defined in a `health_test.yaml` file. The `health_test.yaml` is a YAML file with the following structure:

```yaml
tests:
- healthStatus:
    status: ExpectedStatus
    message: Expected message
  inputPath: testdata/test-resource-definition.yaml
```

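As a sketch of how this might look for the `your.crd.group.io/MyKind` layout above, the file below lists one case per expected status. The `testdata` file names and the messages are hypothetical; each message has to match what your `health.lua` returns for the corresponding input.

```yaml
tests:
- healthStatus:
    status: Progressing
    message: Waiting for MyKind to become ready
  inputPath: testdata/progressing.yaml
- healthStatus:
    status: Healthy
    message: MyKind is ready
  inputPath: testdata/healthy.yaml
- healthStatus:
    status: Degraded
    message: MyKind failed to reconcile
  inputPath: testdata/degraded.yaml
```
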
To test the implemented custom health checks, run `go test -v ./util/lua/`.

[PR#1139](https://github.com/argoproj/argo-cd/pull/1139) is an example of a custom health check for Cert Manager CRDs.

Please note that bundled health checks with wildcards are not supported.

## Health Checks

An Argo CD App's health is inferred from the health of its immediate child resources (the resources represented in
source control).

However, the health of a resource is not inherited from its child resources. It is calculated using only information about the
resource itself. A resource's status field may or may not contain information about the health of a child resource, and
the resource's health check may or may not take that information into account.

The lack of inheritance is by design. A resource's health can't be inferred from its children because the health of a
child resource may not be relevant to the health of the parent resource. For example, a Deployment's health is not
necessarily affected by the health of its Pods.

```
App (healthy)
└── Deployment (healthy)
    ├── ReplicaSet (healthy)
    │   └── Pod (healthy)
    └── ReplicaSet (unhealthy)
        └── Pod (unhealthy)
```

If you want the health of a child resource to affect the health of its parent, you need to configure the parent's health
check to take the child's health into account. Since only the parent resource's state is available to the health check,
the parent resource's controller needs to make the child resource's health available in the parent resource's status
field.

```
App (healthy)
└── CustomResource (healthy) <- This resource's health check needs to be fixed to mark the App as unhealthy
    └── CustomChildResource (unhealthy)
```
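
One way to fix such a health check is sketched below. It assumes a hypothetical `example.com/CustomResource` whose controller copies child state into a `status.children` list, where each entry has `name` and `ready` fields. None of these names come from a real CRD.

```yaml
data:
  resource.customizations: |
    example.com/CustomResource:
      health.lua: |
        -- Hypothetical schema: the controller is assumed to publish child state
        -- in `status.children`, a list of entries with `name` and `ready` fields.
        local hs = { status = "Progressing", message = "Waiting for child status" }
        if obj.status ~= nil and obj.status.children ~= nil then
          for _, child in ipairs(obj.status.children) do
            if child.ready ~= true then
              hs.status = "Degraded"
              hs.message = "Child " .. tostring(child.name) .. " is not ready"
              return hs
            end
          end
          hs.status = "Healthy"
          hs.message = "All children are ready"
        end
        return hs
```

The script still reads only the parent object, so it works only if the controller keeps `status.children` up to date.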