github.com/Ilhicas/nomad@v1.0.4-0.20210304152020-e86851182bc3/website/content/docs/autoscaling/plugins/apm.mdx

---
layout: docs
page_title: APM
sidebar_title: APM
description: APM plugins provide metric data points describing the resource's current state.
---

# APM Plugins

APMs are used to store metrics about an application's performance and current
state. The APM (Application Performance Management) plugin is responsible for
querying the APM and returning a value that is used to determine whether
scaling should occur.

## Prometheus APM Plugin

Use [Prometheus][prometheus_io] metrics to scale your Nomad job task groups or
cluster.

### Agent Configuration Options

```hcl
apm "prometheus" {
  driver = "prometheus"

  config = {
    address = "http://prometheus.my.endpoint.io:9090"
  }
}
```

- `address` `(string: "http://127.0.0.1:9090")` - The address of the Prometheus
  endpoint used to perform queries.

### Policy Configuration Options

```hcl
check {
  source = "prometheus"
  query  = "avg((haproxy_server_current_sessions{backend=\"http_back\"}) and (haproxy_server_up{backend=\"http_back\"} == 1))"
  ...
}
```

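As a minimal sketch of how the elided part of the check might be completed, the
example below adds a check label and a `target-value` strategy. The label
`active_sessions` and the target of `20` are illustrative assumptions that do
not come from this page; adjust them to match your workload.

```hcl
# Hypothetical check label; pick a name that describes the metric.
check "active_sessions" {
  source = "prometheus"
  query  = "avg((haproxy_server_current_sessions{backend=\"http_back\"}) and (haproxy_server_up{backend=\"http_back\"} == 1))"

  # Assumed strategy and value: aim for roughly 20 sessions per healthy server.
  strategy "target-value" {
    target = 20
  }
}
```
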
## Datadog APM Plugin

The [Datadog][datadog_homepage] APM plugin allows using
[time series][datadog_timeseries] data to make scaling decisions.

### Agent Configuration Options

```hcl
apm "datadog" {
  driver = "datadog"

  config = {
    dd_api_key = "<api key>"
    dd_app_key = "<app key>"
  }
}
```

- `dd_api_key` `(string: "")` - The Datadog API key to use for authentication.
- `dd_app_key` `(string: "")` - The Datadog application key to use for
  authentication.

The Datadog plugin can also read its configuration options from environment
variables. The accepted keys are `DD_API_KEY` and `DD_APP_KEY`. The agent
configuration parameters take precedence over the environment variables.

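The environment-variable approach described above might look like the
following sketch. It assumes that `DD_API_KEY` and `DD_APP_KEY` are exported in
the environment of the Autoscaler agent process and that the `config` block can
then be left out entirely; verify this against your Autoscaler version.

```hcl
# Assumption: DD_API_KEY and DD_APP_KEY are set in the agent's environment,
# so no keys are written to the configuration file.
apm "datadog" {
  driver = "datadog"
}
```

Keeping the keys out of the file avoids committing secrets to version control.
Any value set in `config` would still override the environment.
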
### Policy Configuration Options

```hcl
check {
  source = "datadog"
  query  = "avg:proxy.backend.response.time{proxy-service:web-app}"
  ...
}
```

## Nomad APM Plugin

The Nomad APM plugin allows querying the Nomad API for metric data. This provides
an immediate starting point without additional applications, but comes at the
price of efficiency. When using this APM, monitor Nomad carefully to ensure it
is not put under excessive load.

### Agent Configuration Options

```hcl
apm "nomad-apm" {
  driver = "nomad-apm"
}
```

When using a Nomad cluster with ACLs enabled, the following ACL policy provides
the appropriate permissions for obtaining task group metrics:

```hcl
namespace "default" {
  policy       = "read"
  capabilities = ["read-job"]
}
```

To obtain cluster-level metrics, the following ACL policy is required:

```hcl
node {
  policy = "read"
}

namespace "default" {
  policy       = "read"
  capabilities = ["read-job"]
}
```

### Policy Configuration Options - Task Groups

The Nomad APM allows querying Nomad to understand the current resource usage of
a task group.

```hcl
check {
  source = "nomad-apm"
  query  = "avg_cpu"
  ...
}
```

Querying Nomad task group metrics is done using the `operation_metric` syntax,
where valid operations are:

- `avg` - returns the average of the metric value across allocations in the task
  group.

- `min` - returns the lowest metric value among the allocations in the task group.

- `max` - returns the highest metric value among the allocations in the task
  group.

- `sum` - returns the sum of all the metric values for the allocations in the
  task group.

The metric value can be:

- `cpu` - CPU usage as reported by the `nomad.client.allocs.cpu.total_percent`
  metric.

- `memory` - Memory usage as reported by the `nomad.client.allocs.memory.usage`
  metric.

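To illustrate how an operation and a metric combine into a query string, the
sketch below joins `max` and `memory` with an underscore. As in the example
above, the rest of the check is elided.

```hcl
# "max" + "_" + "memory": the highest memory usage among the
# allocations in the task group.
check {
  source = "nomad-apm"
  query  = "max_memory"
  ...
}
```
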
### Policy Configuration Options - Client Nodes

The Nomad APM allows querying Nomad to understand how much of a resource is
currently allocated, as a percentage of the total available.

```hcl
check {
  source = "nomad-apm"
  query  = "percentage-allocated_cpu"
  ...
}
```

Querying Nomad client node metrics is done using the `operation_metric` syntax,
where valid operations are:

- `percentage-allocated` - returns the allocated percentage of the desired
  resource.

The metric value can be:

- `cpu` - allocated CPU, calculated by comparing the total allocated by the
  scheduler against the total allocatable.

- `memory` - allocated memory, calculated by comparing the total allocated by
  the scheduler against the total allocatable.

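A minimal sketch of a memory-based cluster check follows. Note that the
operation itself contains a hyphen, while the underscore separates it from the
metric. The check label, the `target-value` strategy, and the target of `70`
are assumptions chosen for illustration, not values taken from this page.

```hcl
# Hypothetical check label.
check "allocated_memory" {
  source = "nomad-apm"
  query  = "percentage-allocated_memory"

  # Assumed strategy and value: aim for about 70% of allocatable
  # memory being allocated across the client nodes.
  strategy "target-value" {
    target = 70
  }
}
```
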
[prometheus_io]: https://prometheus.io/
[prometheus_scaler_function]: https://prometheus.io/docs/prometheus/latest/querying/functions/#scalar
[nomad_telemetry_stanza]: /docs/configuration/telemetry#inlinecode-publish_allocation_metrics
[datadog_homepage]: https://www.datadoghq.com/
[datadog_timeseries]: https://docs.datadoghq.com/api/v1/metrics/#query-timeseries-points