---
layout: docs
page_title: Metrics
sidebar_title: Metrics
description: Learn about the different metrics available in Nomad.
---

# Metrics

The Nomad agent collects various runtime metrics about the performance of
different libraries and subsystems. These metrics are aggregated on a
ten-second interval and are retained for one minute.

This data can be accessed via an HTTP endpoint or by sending a signal to the
Nomad process.

As of Nomad version 0.7, this data is available via HTTP at `/metrics`. See
[Metrics](/api-docs/metrics) for more information.

To view this data by sending a signal to the Nomad process: on Unix the
signal is `USR1`, while on Windows it is `BREAK`. Once Nomad receives the
signal, it will dump the current telemetry information to the agent's
`stderr`.

This telemetry information can be used for debugging or otherwise getting a
better view of what Nomad is doing.

Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
and statsd by providing the appropriate configuration options.

To configure the telemetry output, please see the [agent
configuration](/docs/configuration/telemetry).
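For example, a minimal `telemetry` stanza that streams metrics to a statsd
daemon might look like the following sketch (the address is a placeholder for
your own collector):

```hcl
telemetry {
  # Stream aggregated metrics to a statsd daemon (placeholder address).
  statsd_address = "127.0.0.1:8125"

  # Or stream to a statsite instance instead:
  # statsite_address = "127.0.0.1:8125"
}
```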
Below is sample output of a telemetry dump:

```text
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
```

## Key Metrics

When telemetry is being streamed to statsite or statsd, `interval` is defined
to be their flush interval. Otherwise, the interval can be assumed to be 10
seconds when retrieving metrics via the signal described above.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.runtime.num_goroutines</code></td>
      <td>Number of goroutines and general load pressure indicator</td>
      <td># of goroutines</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.runtime.alloc_bytes</code></td>
      <td>Memory utilization</td>
      <td># of bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.runtime.heap_objects</code></td>
      <td>Number of objects on the heap. General memory pressure indicator</td>
      <td># of heap objects</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.raft.apply</code></td>
      <td>Number of Raft transactions</td>
      <td>Raft transactions / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.raft.replication.appendEntries</code></td>
      <td>Raft transaction commit time</td>
      <td>ms / Raft Log Append</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.raft.leader.lastContact</code></td>
      <td>
        Time since last contact with the leader. General indicator of Raft
        latency
      </td>
      <td>ms / Leader Contact</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_ready</code></td>
      <td>Number of evaluations ready to be processed</td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_unacked</code></td>
      <td>Evaluations dispatched for processing but incomplete</td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_blocked</code></td>
      <td>
        Evaluations that are blocked until an existing evaluation for the same
        job completes
      </td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.plan.queue_depth</code></td>
      <td>Number of scheduler Plans waiting to be evaluated</td>
      <td># of plans</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.plan.submit</code></td>
      <td>
        Time to submit a scheduler Plan. Higher values cause lower scheduling
        throughput
      </td>
      <td>ms / Plan Submit</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.plan.evaluate</code></td>
      <td>
        Time to validate a scheduler Plan. Higher values cause lower scheduling
        throughput. Similar to <code>nomad.plan.submit</code> but does not
        include RPC time or time in the Plan Queue
      </td>
      <td>ms / Plan Evaluation</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.worker.invoke_scheduler.<type></code></td>
      <td>Time to run the scheduler of the given type</td>
      <td>ms / Scheduler Run</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.worker.wait_for_index</code></td>
      <td>
        Time waiting for Raft log replication from the leader. High delays
        result in lower scheduling throughput
      </td>
      <td>ms / Raft Index Wait</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.heartbeat.active</code></td>
      <td>
        Number of active heartbeat timers. Each timer represents a Nomad
        Client connection
      </td>
      <td># of heartbeat timers</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.heartbeat.invalidate</code></td>
      <td>
        The length of time it takes to invalidate a Nomad Client due to failed
        heartbeats
      </td>
      <td>ms / Heartbeat Invalidation</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.query</code></td>
      <td>Number of RPC queries</td>
      <td>RPC Queries / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.request</code></td>
      <td>Number of RPC requests being handled</td>
      <td>RPC Requests / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.request_error</code></td>
      <td>Number of RPC requests being handled that result in an error</td>
      <td>RPC Errors / `interval`</td>
      <td>Counter</td>
    </tr>
  </tbody>
</table>

## Client Metrics

The Nomad client emits metrics related to the resource usage of the
allocations and tasks running on it, as well as the node itself. Operators
must explicitly turn on publishing of host and allocation metrics by setting
`publish_allocation_metrics` and `publish_node_metrics` to `true`.

By default the collection interval is 1 second, but it can be changed by
setting the value of the `collection_interval` key in the `telemetry`
configuration block.
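For illustration, a `telemetry` block combining these settings might look
like the following sketch (the values are examples; `collection_interval` is
shown at its default of one second):

```hcl
telemetry {
  # Publish resource usage metrics for allocations and tasks.
  publish_allocation_metrics = true

  # Publish host resource usage metrics (CPU, memory, disk, network).
  publish_node_metrics = true

  # How often the client collects resource usage data.
  collection_interval = "1s"
}
```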
Please see the [agent configuration](/docs/configuration/telemetry)
page for more details.

As of Nomad 0.9, Nomad emits additional labels for
[parameterized](/docs/job-specification/parameterized) and
[periodic](/docs/job-specification/periodic) jobs. Nomad emits the parent job
id as a new label `parent_id`. Also, the labels `dispatch_id` and
`periodic_id` are emitted, containing the ID of the specific invocation of the
parameterized or periodic job respectively. For example, a dispatch job with
the id `myjob/dispatch-1312323423423` will have the following labels:

<table>
  <thead>
    <tr>
      <th>Label</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>job</td>
      <td><code>myjob/dispatch-1312323423423</code></td>
    </tr>
    <tr>
      <td>parent_id</td>
      <td>myjob</td>
    </tr>
    <tr>
      <td>dispatch_id</td>
      <td>1312323423423</td>
    </tr>
  </tbody>
</table>
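For context, IDs like the one above are produced by dispatching a
parameterized job with `nomad job dispatch`. A minimal sketch of such a job
follows; the job name matches the example above, and everything else is
illustrative:

```hcl
job "myjob" {
  datacenters = ["dc1"]
  type        = "batch"

  # Each `nomad job dispatch myjob` invocation creates a child job named
  # "myjob/dispatch-<id>", whose metrics carry the labels shown above.
  parameterized {
    payload = "optional"
  }

  group "group" {
    task "task" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "echo"
        args    = ["dispatched"]
      }
    }
  }
}
```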
## Host Metrics (post Nomad version 0.7)

Starting in version 0.7, Nomad emits [tagged metrics][tagged-metrics] in the
format below:

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
      <th>Labels</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocated.cpu</code></td>
      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.cpu</code></td>
      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.memory</code></td>
      <td>Total amount of memory the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.memory</code></td>
      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.disk</code></td>
      <td>Total amount of disk space the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.disk</code></td>
      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.network</code></td>
      <td>
        Total amount of bandwidth the scheduler has allocated to tasks on the
        given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
      <td>node_id, datacenter, device</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.network</code></td>
      <td>
        Total amount of bandwidth free for the scheduler to allocate to tasks
        on the given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
      <td>node_id, datacenter, device</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.total</code></td>
      <td>Total amount of physical memory on the node</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.available</code></td>
      <td>
        Total amount of memory available to processes, which includes free and
        cached memory
      </td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.used</code></td>
      <td>Amount of memory used by processes</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.free</code></td>
      <td>Amount of memory which is free</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.uptime</code></td>
      <td>Uptime of the host running the Nomad client</td>
      <td>Seconds</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.total</code></td>
      <td>Total CPU utilization</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.user</code></td>
      <td>CPU utilization in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.system</code></td>
      <td>CPU utilization in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.idle</code></td>
      <td>Idle time spent by the CPU</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.size</code></td>
      <td>Total size of the device</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.used</code></td>
      <td>Amount of space which has been used</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.available</code></td>
      <td>Amount of space which is available</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.used_percent</code></td>
      <td>Percentage of disk space used</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.inodes_percent</code></td>
      <td>Disk space consumed by the inodes</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.start</code></td>
      <td>Number of allocations starting</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.running</code></td>
      <td>Number of allocations starting to run</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.failed</code></td>
      <td>Number of allocations failing</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.restart</code></td>
      <td>Number of allocations restarting</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.complete</code></td>
      <td>Number of allocations completing</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.destroy</code></td>
      <td>Number of allocations being destroyed</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
  </tbody>
</table>

Nomad 0.9 adds an additional `node_class` label from the client's
`NodeClass` attribute. This label is set to the string "none" if empty.

## Host Metrics (deprecated post Nomad 0.7)

The metrics below are emitted by Nomad in versions prior to 0.7. Post-0.7,
these metrics can still be emitted in this format (as well as in the new
format detailed above), but any new metrics are only available in the new
format.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocated.cpu.<HostID></code></td>
      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.cpu.<HostID></code></td>
      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.memory.<HostID></code></td>
      <td>Total amount of memory the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.memory.<HostID></code></td>
      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.disk.<HostID></code></td>
      <td>Total amount of disk space the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.disk.<HostID></code></td>
      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.network.<Device-Name>.<HostID></code></td>
      <td>
        Total amount of bandwidth the scheduler has allocated to tasks on the
        given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.network.<Device-Name>.<HostID></code></td>
      <td>
        Total amount of bandwidth free for the scheduler to allocate to tasks
        on the given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.total</code></td>
      <td>Total amount of physical memory on the node</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.available</code></td>
      <td>
        Total amount of memory available to processes, which includes free and
        cached memory
      </td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.used</code></td>
      <td>Amount of memory used by processes</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.free</code></td>
      <td>Amount of memory which is free</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.uptime.<HostID></code></td>
      <td>Uptime of the host running the Nomad client</td>
      <td>Seconds</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.total</code></td>
      <td>Total CPU utilization</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.user</code></td>
      <td>CPU utilization in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.system</code></td>
      <td>CPU utilization in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.idle</code></td>
      <td>Idle time spent by the CPU</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.size</code></td>
      <td>Total size of the device</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.used</code></td>
      <td>Amount of space which has been used</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.available</code></td>
      <td>Amount of space which is available</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.used_percent</code></td>
      <td>Percentage of disk space used</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.inodes_percent</code></td>
      <td>Disk space consumed by the inodes</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Allocation Metrics

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.rss</code></td>
      <td>Amount of RSS memory consumed by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.cache</code></td>
      <td>Amount of memory cached by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.swap</code></td>
      <td>Amount of memory swapped by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.max_usage</code></td>
      <td>Maximum amount of memory ever used by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_usage</code></td>
      <td>Amount of memory used by the kernel for this task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_max_usage</code></td>
      <td>Maximum amount of memory ever used by the kernel for this task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_percent</code></td>
      <td>Total CPU resources consumed by the task across all cores</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.system</code></td>
      <td>Total CPU resources consumed by the task in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.user</code></td>
      <td>Total CPU resources consumed by the task in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.throttled_time</code></td>
      <td>Total time that the task was throttled</td>
      <td>Nanoseconds</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_ticks</code></td>
      <td>CPU ticks consumed by the process in the last collection interval</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Job Summary Metrics

Job summary metrics are emitted by the Nomad leader server.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
      <th>Labels</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.job_summary.queued</code></td>
      <td>Number of queued allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.complete</code></td>
      <td>Number of complete allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.failed</code></td>
      <td>Number of failed allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.running</code></td>
      <td>Number of running allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.starting</code></td>
      <td>Number of starting allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.lost</code></td>
      <td>Number of lost allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
  </tbody>
</table>

## Job Status Metrics

Job status metrics are emitted by the Nomad leader server.
<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.job_status.pending</code></td>
      <td>Number of jobs pending</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.job_status.running</code></td>
      <td>Number of jobs running</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.job_status.dead</code></td>
      <td>Number of dead jobs</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Metric Types

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>Description</th>
      <th>Quantiles</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Gauge</td>
      <td>
        Gauge types report an absolute number at the end of the aggregation
        interval
      </td>
      <td>false</td>
    </tr>
    <tr>
      <td>Counter</td>
      <td>
        Counts are incremented and flushed at the end of the aggregation
        interval and then are reset to zero
      </td>
      <td>true</td>
    </tr>
    <tr>
      <td>Timer</td>
      <td>
        Timers measure the time to complete a task and will include quantiles,
        means, standard deviation, etc., per interval
      </td>
      <td>true</td>
    </tr>
  </tbody>
</table>

## Tagged Metrics

As of version 0.7, Nomad emits metrics in a tagged format. Each metric can
support more than one tag, meaning that it is possible to match over metrics
for datapoints such as a particular datacenter, and return all metrics with
this tag. Nomad supports labels for namespaces as well.

[tagged-metrics]: /docs/telemetry/metrics#tagged-metrics
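Tagged metrics are easiest to consume through a sink that understands labels.
As a minimal sketch, enabling Nomad's built-in Prometheus support via the
`prometheus_metrics` telemetry option exposes each tag as a Prometheus label:

```hcl
telemetry {
  # Serve metrics in Prometheus exposition format from the agent's
  # metrics endpoint (format=prometheus); tags such as node_id,
  # datacenter, and job are emitted as Prometheus labels.
  prometheus_metrics = true
}
```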