---
title: "Prometheus Metrics"
linkTitle: "Prometheus Metrics"
date: 2022-02-14
description: >
  Prometheus metrics exported by Kueue
---

Kueue exposes [Prometheus](https://prometheus.io) metrics to monitor the health
of the system and the status of [ClusterQueues](/docs/concepts/cluster_queue).

## Kueue health

Use the following metrics to monitor the health of the Kueue controllers:

| Metric name | Type | Description | Labels |
| ----------- | ---- | ----------- | ------ |
| `kueue_admission_attempts_total` | Counter | The total number of attempts to [admit](/docs/concepts#admission) workloads. Each admission attempt might try to admit more than one workload. | `result`: possible values are `success` or `inadmissible` |
| `kueue_admission_attempt_duration_seconds` | Histogram | The latency of an admission attempt. | `result`: possible values are `success` or `inadmissible` |

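As a rough illustration of how the `result` label can be used, the following Python sketch computes the fraction of successful admission attempts from a scrape in the Prometheus text exposition format. The sample values and the helper names (`parse_samples`, `admission_success_ratio`) are invented for this example and are not part of Kueue.

```python
import re

# Hypothetical scrape output; the sample values are invented for illustration.
SAMPLE = """\
# TYPE kueue_admission_attempts_total counter
kueue_admission_attempts_total{result="success"} 120
kueue_admission_attempts_total{result="inadmissible"} 30
"""

# Matches a labelled sample line: name{labels} value
SAMPLE_LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric name, labels dict, float value) for each labelled sample."""
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match:  # comment lines (# HELP, # TYPE) and blank lines do not match
            yield match[1], dict(LABEL.findall(match[2])), float(match[3])


def admission_success_ratio(text):
    """Return the fraction of admission attempts with result="success"."""
    counts = {}
    for name, labels, value in parse_samples(text):
        if name == "kueue_admission_attempts_total":
            result = labels["result"]
            counts[result] = counts.get(result, 0.0) + value
    total = sum(counts.values())
    # Return None rather than dividing by zero when no attempts were recorded.
    return counts.get("success", 0.0) / total if total else None


print(admission_success_ratio(SAMPLE))  # prints 0.8
```

Because the metric is a counter, this ratio covers the whole lifetime of the process. To look at a recent window instead, compare the difference between two scrapes.
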
## ClusterQueue status

Use the following metrics to monitor the status of your ClusterQueues:

| Metric name | Type | Description | Labels |
| ----------- | ---- | ----------- | ------ |
| `kueue_pending_workloads` | Gauge | The number of pending workloads. | `cluster_queue`: the name of the ClusterQueue<br> `status`: possible values are `active` or `inadmissible` |
| `kueue_admitted_workloads_total` | Counter | The total number of admitted workloads. | `cluster_queue`: the name of the ClusterQueue |
| `kueue_admission_wait_time_seconds` | Histogram | The time from when a Workload was created until it was admitted. | `cluster_queue`: the name of the ClusterQueue |
| `kueue_admitted_active_workloads` | Gauge | The number of admitted Workloads that are active (unsuspended and not finished). | `cluster_queue`: the name of the ClusterQueue |
| `kueue_cluster_queue_status` | Gauge | Reports the status of the ClusterQueue. | `cluster_queue`: the name of the ClusterQueue<br> `status`: possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses. |

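The `status` label on `kueue_cluster_queue_status` can be collapsed into a single status per ClusterQueue by picking the series whose value is 1. The Python sketch below does that and also sums `kueue_pending_workloads` over its `status` label. The queue names, sample values, and the `summarize_queues` helper are hypothetical, and the sample assumes the non-matching statuses are exported with a value of 0.

```python
import re

# Hypothetical scrape output; queue names and values are invented for illustration.
SAMPLE = """\
kueue_cluster_queue_status{cluster_queue="team-a",status="pending"} 0
kueue_cluster_queue_status{cluster_queue="team-a",status="active"} 1
kueue_cluster_queue_status{cluster_queue="team-a",status="terminated"} 0
kueue_cluster_queue_status{cluster_queue="team-b",status="pending"} 1
kueue_cluster_queue_status{cluster_queue="team-b",status="active"} 0
kueue_cluster_queue_status{cluster_queue="team-b",status="terminated"} 0
kueue_pending_workloads{cluster_queue="team-a",status="active"} 4
kueue_pending_workloads{cluster_queue="team-a",status="inadmissible"} 2
kueue_pending_workloads{cluster_queue="team-b",status="active"} 0
kueue_pending_workloads{cluster_queue="team-b",status="inadmissible"} 7
"""

# Matches a labelled sample line: name{labels} value
SAMPLE_LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def summarize_queues(text):
    """Return {queue name: {"status": str or None, "pending": float}}."""
    queues = {}

    def entry(queue_name):
        return queues.setdefault(queue_name, {"status": None, "pending": 0.0})

    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match:
            continue
        name, labels, value = match[1], dict(LABEL.findall(match[2])), float(match[3])
        if name == "kueue_cluster_queue_status" and value == 1:
            # Only one status series per ClusterQueue carries the value 1.
            entry(labels["cluster_queue"])["status"] = labels["status"]
        elif name == "kueue_pending_workloads":
            # Add up the active and inadmissible pending workloads.
            entry(labels["cluster_queue"])["pending"] += value
    return queues


for queue_name, summary in summarize_queues(SAMPLE).items():
    print(queue_name, summary["status"], summary["pending"])
```

With the sample above, `team-a` is reported as `active` with 6 pending workloads and `team-b` as `pending` with 7.
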
### Optional metrics

The following metrics are available only if `metrics.enableClusterQueueResources` is enabled in the [manager's configuration](/docs/installation/#install-a-custom-configured-released-version).

| Metric name | Type | Description | Labels |
| ----------- | ---- | ----------- | ------ |
| `kueue_cluster_queue_resource_usage` | Gauge | Reports the ClusterQueue's total resource usage. | `cohort`: the cohort to which the queue belongs<br> `cluster_queue`: the name of the ClusterQueue<br> `flavor`: the referenced flavor<br> `resource`: the resource name |
| `kueue_cluster_queue_nominal_quota` | Gauge | Reports the ClusterQueue's resource quota. | `cohort`: the cohort to which the queue belongs<br> `cluster_queue`: the name of the ClusterQueue<br> `flavor`: the referenced flavor<br> `resource`: the resource name |
| `kueue_cluster_queue_borrowing_limit` | Gauge | Reports the ClusterQueue's resource borrowing limit. | `cohort`: the cohort to which the queue belongs<br> `cluster_queue`: the name of the ClusterQueue<br> `flavor`: the referenced flavor<br> `resource`: the resource name |
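
One possible use of the optional metrics is to compare usage against nominal quota for each combination of ClusterQueue, flavor, and resource. The Python sketch below is a minimal example under assumed data: the cohort, queue, and flavor names, the sample values, and the `quota_utilization` helper are all hypothetical.

```python
import re

# Hypothetical scrape output; label values and numbers are invented for illustration.
SAMPLE = """\
kueue_cluster_queue_resource_usage{cohort="research",cluster_queue="team-a",flavor="default",resource="cpu"} 6
kueue_cluster_queue_nominal_quota{cohort="research",cluster_queue="team-a",flavor="default",resource="cpu"} 8
kueue_cluster_queue_resource_usage{cohort="research",cluster_queue="team-a",flavor="default",resource="memory"} 48
kueue_cluster_queue_nominal_quota{cohort="research",cluster_queue="team-a",flavor="default",resource="memory"} 32
"""

# Matches a labelled sample line: name{labels} value
SAMPLE_LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def quota_utilization(text):
    """Return {(cluster_queue, flavor, resource): usage / nominal quota}."""
    usage, quota = {}, {}
    targets = {
        "kueue_cluster_queue_resource_usage": usage,
        "kueue_cluster_queue_nominal_quota": quota,
    }
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match or match[1] not in targets:
            continue
        labels = dict(LABEL.findall(match[2]))
        key = (labels["cluster_queue"], labels["flavor"], labels["resource"])
        targets[match[1]][key] = float(match[3])
    # Skip series with a missing or zero quota to avoid dividing by zero.
    return {key: usage[key] / quota[key] for key in usage if quota.get(key)}


for key, ratio in quota_utilization(SAMPLE).items():
    print(key, ratio)
```

A ratio above 1, such as the `memory` entry in this sample, means that usage exceeds the nominal quota. That can happen when the ClusterQueue borrows from its cohort.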