github.com/smintz/nomad@v0.8.3/website/source/docs/agent/telemetry.html.md

     1  ---
     2  layout: "docs"
     3  page_title: "Telemetry"
     4  sidebar_current: "docs-agent-telemetry"
     5  description: |-
     6    Learn about the telemetry data available in Nomad.
     7  ---
     8  
     9  # Telemetry
    10  
    11  The Nomad agent collects various runtime metrics about the performance of
    12  different libraries and subsystems. These metrics are aggregated on a
    13  ten-second interval and are retained for one minute.
    14  
    15  This data can be accessed via an HTTP endpoint or by sending a signal to the
    16  Nomad process.
    17  
    18  Via HTTP, as of Nomad version 0.7, this data is available at `/metrics`. See
    19  [Metrics](/api/metrics.html) for more information.
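
As a rough sketch of reading this endpoint from a script, the Python below builds the request URL and summarizes the gauges in a response. The full `/v1/metrics` path, the default port `4646`, and the `Gauges`, `Name`, and `Value` JSON field names are assumptions that are not stated on this page; `metrics_url`, `gauge_values`, and `fetch_gauges` are hypothetical helper names.

```python
import json
import urllib.request


def metrics_url(agent_addr="http://127.0.0.1:4646"):
    """Return the assumed metrics URL for an agent address."""
    return agent_addr.rstrip("/") + "/v1/metrics"


def gauge_values(payload):
    """Map gauge name -> value from a decoded metrics payload.

    Assumes the payload holds a "Gauges" list of {"Name": ..., "Value": ...}.
    Entries sharing a name (for example, differing only by labels) overwrite
    each other in this simplified view.
    """
    return {g["Name"]: g["Value"] for g in payload.get("Gauges", [])}


def fetch_gauges(agent_addr="http://127.0.0.1:4646", timeout=5):
    """Fetch and decode metrics from a running agent (needs network access)."""
    with urllib.request.urlopen(metrics_url(agent_addr), timeout=timeout) as resp:
        return gauge_values(json.load(resp))


if __name__ == "__main__":
    # Offline demonstration with a hand-written payload instead of a live agent.
    sample = json.loads(
        '{"Gauges": [{"Name": "nomad.runtime.num_goroutines", "Value": 56}]}'
    )
    print(gauge_values(sample))
```

Keeping the parsing separate from the network call makes it possible to exercise the parsing against a saved response without a running agent.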
    20  
    21  
    22  To view this data by sending a signal to the Nomad process, use `USR1` on
    23  Unix or `BREAK` on Windows. Once Nomad receives the signal, it dumps the
    24  current telemetry information to the agent's `stderr`.
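
On Unix, the signal can also be sent from a script. The following is a minimal sketch that covers only the `USR1` case; `request_telemetry_dump` is a hypothetical helper, and the `kill` parameter exists only so the function can be exercised without a real process.

```python
import os
import signal


def request_telemetry_dump(pid, kill=os.kill):
    """Ask the process with the given PID to dump telemetry (Unix only).

    The dump is written to that process's stderr, not returned to the caller.
    """
    if not hasattr(signal, "SIGUSR1"):
        raise OSError("SIGUSR1 is not available on this platform")
    kill(pid, signal.SIGUSR1)
```

Calling `request_telemetry_dump(agent_pid)` with the PID of a running agent should produce the dump in whatever captures the agent's `stderr`, such as a terminal or a log file.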
    25  
    26  This telemetry information can be used for debugging or otherwise
    27  getting a better view of what Nomad is doing.
    28  
    29  Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
    30  and statsd by providing the appropriate configuration options.
    31  
    32  To configure the telemetry output, see the [agent
    33  configuration](/docs/agent/configuration/telemetry.html).
    34  
    35  Below is sample output of a telemetry dump:
    36  
    37  ```text
    38  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
    39  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
    40  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
    41  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
    42  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
    43  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
    44  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
    45  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
    46  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
    47  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
    48  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
    49  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
    50  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
    51  [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
    52  [2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
    53  [2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
    54  [2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
    55  [2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
    56  [2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
    57  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
    58  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
    59  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
    60  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
    61  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
    62  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
    63  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
    64  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
    65  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
    66  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
    67  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
    68  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
    69  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
    70  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
    71  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
    72  [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
    73  ```
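
The dump format above is regular enough to parse with a short script. The sketch below assumes the bracketed letter marks the metric kind (`G` for gauge, `C` for counter, `S` for timer sample), which is inferred from the sample rather than stated on this page; `parse_dump_line` is a hypothetical helper.

```python
import re

# One dump line looks like: [timestamp][KIND] 'metric.name': fields
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<kind>[GCS])\] '(?P<name>[^']+)': (?P<rest>.*)$"
)
# Counter and sample lines carry "Key: number" pairs such as "Count: 3".
FIELD_RE = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)")

KINDS = {"G": "gauge", "C": "counter", "S": "sample"}


def parse_dump_line(line):
    """Parse one telemetry dump line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    if m.group("kind") == "G":
        # Gauge lines carry a single bare number.
        fields = {"Value": float(rest)}
    else:
        fields = {key: float(val) for key, val in FIELD_RE.findall(rest)}
    return {
        "timestamp": m.group("ts"),
        "kind": KINDS[m.group("kind")],
        "name": m.group("name"),
        "fields": fields,
    }
```

Lines that do not match the pattern return `None`, so the function can be mapped over all of an agent's `stderr` output and the unrelated log lines filtered out.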
    74  
    75  # Key Metrics
    76  
    77  When telemetry is streamed to statsite or statsd, `interval` is their flush
    78  interval. Otherwise, when metrics are retrieved using the signals described
    79  above, the interval can be assumed to be 10 seconds.
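
Because counters are reported per `interval`, comparing values across different sinks usually means normalizing them to a per-second rate. A minimal sketch, assuming the counter's `Sum` is the total of all increments in the interval; `per_second_rate` is a hypothetical helper.

```python
def per_second_rate(counter_sum, interval_seconds=10.0):
    """Convert a counter total for one interval into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_sum / interval_seconds
```

For example, the sample dump reports `Sum: 2.000` for `nomad.nomad.rpc.query`; with the 10 second signal interval this works out to 0.2 queries per second.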
    80  
    81  <table class="table table-bordered table-striped">
    82    <tr>
    83      <th>Metric</th>
    84      <th>Description</th>
    85      <th>Unit</th>
    86      <th>Type</th>
    87    </tr>
    88    <tr>
    89      <td>`nomad.runtime.num_goroutines`</td>
    90      <td>Number of goroutines and general load pressure indicator</td>
    91      <td># of goroutines</td>
    92      <td>Gauge</td>
    93    </tr>
    94    <tr>
    95      <td>`nomad.runtime.alloc_bytes`</td>
    96      <td>Memory utilization</td>
    97      <td># of bytes</td>
    98      <td>Gauge</td>
    99    </tr>
   100    <tr>
   101      <td>`nomad.runtime.heap_objects`</td>
   102      <td>Number of objects on the heap. General memory pressure indicator</td>
   103      <td># of heap objects</td>
   104      <td>Gauge</td>
   105    </tr>
   106    <tr>
   107      <td>`nomad.raft.apply`</td>
   108      <td>Number of Raft transactions</td>
   109      <td>Raft transactions / `interval`</td>
   110      <td>Counter</td>
   111    </tr>
   112    <tr>
   113      <td>`nomad.raft.replication.appendEntries`</td>
   114      <td>Raft transaction commit time</td>
   115      <td>ms / Raft Log Append</td>
   116      <td>Timer</td>
   117    </tr>
   118    <tr>
   119      <td>`nomad.raft.leader.lastContact`</td>
   120      <td>Time since last contact to leader. General indicator of Raft latency</td>
   121      <td>ms / Leader Contact</td>
   122      <td>Timer</td>
   123    </tr>
   124    <tr>
   125      <td>`nomad.nomad.broker.total_ready`</td>
   126      <td>Number of evaluations ready to be processed</td>
   127      <td># of evaluations</td>
   128      <td>Gauge</td>
   129    </tr>
   130    <tr>
   131      <td>`nomad.nomad.broker.total_unacked`</td>
   132      <td>Evaluations dispatched for processing but incomplete</td>
   133      <td># of evaluations</td>
   134      <td>Gauge</td>
   135    </tr>
   136    <tr>
   137      <td>`nomad.nomad.broker.total_blocked`</td>
   138      <td>
   139          Evaluations that are blocked until an existing evaluation for the same job
   140          completes
   141      </td>
   142      <td># of evaluations</td>
   143      <td>Gauge</td>
   144    </tr>
   145    <tr>
   146      <td>`nomad.nomad.plan.queue_depth`</td>
   147      <td>Number of scheduler Plans waiting to be evaluated</td>
   148      <td># of plans</td>
   149      <td>Gauge</td>
   150    </tr>
   151    <tr>
   152      <td>`nomad.nomad.plan.submit`</td>
   153      <td>
   154          Time to submit a scheduler Plan. Higher values cause lower scheduling
   155          throughput
   156      </td>
   157      <td>ms / Plan Submit</td>
   158      <td>Timer</td>
   159    </tr>
   160    <tr>
   161      <td>`nomad.nomad.plan.evaluate`</td>
   162      <td>
   163          Time to validate a scheduler Plan. Higher values cause lower scheduling
   164          throughput. Similar to `nomad.nomad.plan.submit` but does not include RPC time
   165          or time in the Plan Queue
   166      </td>
   167      <td>ms / Plan Evaluation</td>
   168      <td>Timer</td>
   169    </tr>
   170    <tr>
   171      <td>`nomad.nomad.worker.invoke_scheduler.<type>`</td>
   172      <td>Time to run the scheduler of the given type</td>
   173      <td>ms / Scheduler Run</td>
   174      <td>Timer</td>
   175    </tr>
   176    <tr>
   177      <td>`nomad.nomad.worker.wait_for_index`</td>
   178      <td>
   179          Time waiting for Raft log replication from leader. High delays result in
   180          lower scheduling throughput
   181      </td>
   182      <td>ms / Raft Index Wait</td>
   183      <td>Timer</td>
   184    </tr>
   185    <tr>
   186      <td>`nomad.nomad.heartbeat.active`</td>
   187      <td>
   188          Number of active heartbeat timers. Each timer represents a Nomad Client
   189          connection
   190      </td>
   191      <td># of heartbeat timers</td>
   192      <td>Gauge</td>
   193    </tr>
   194    <tr>
   195      <td>`nomad.nomad.heartbeat.invalidate`</td>
   196      <td>
   197          The length of time it takes to invalidate a Nomad Client due to failed
   198          heartbeats
   199      </td>
   200      <td>ms / Heartbeat Invalidation</td>
   201      <td>Timer</td>
   202    </tr>
   203    <tr>
   204      <td>`nomad.nomad.rpc.query`</td>
   205      <td>Number of RPC queries</td>
   206      <td>RPC Queries / `interval`</td>
   207      <td>Counter</td>
   208    </tr>
   209    <tr>
   210      <td>`nomad.nomad.rpc.request`</td>
   211      <td>Number of RPC requests being handled</td>
   212      <td>RPC Requests / `interval`</td>
   213      <td>Counter</td>
   214    </tr>
   215    <tr>
   216      <td>`nomad.nomad.rpc.request_error`</td>
   217      <td>Number of RPC requests being handled that result in an error</td>
   218      <td>RPC Errors / `interval`</td>
   219      <td>Counter</td>
   220    </tr>
   221  </table>
   222  
   223  # Client Metrics
   224  
   225  The Nomad client emits metrics related to the resource usage of the allocations
   226  and tasks running on it and of the node itself. Operators have to explicitly
   227  turn on publishing of host and allocation metrics by setting
   228  `publish_allocation_metrics` and `publish_node_metrics` to `true` in the
   229  `telemetry` configuration block.
   230  
   231  
   232  By default the collection interval is 1 second, but it can be changed by
   233  setting the `collection_interval` key in the `telemetry` configuration
   234  block.
   235  
   236  Please see the [agent configuration](/docs/agent/configuration/telemetry.html)
   237  page for more details.
   238  
   239  ## Host Metrics (post Nomad version 0.7)
   240  
   241  Starting in version 0.7, Nomad emits tagged metrics in the format below:
   242  
   243  <table class="table table-bordered table-striped">
   244    <tr>
   245      <th>Metric</th>
   246      <th>Description</th>
   247      <th>Unit</th>
   248      <th>Type</th>
   249      <th>Labels</th>
   250    </tr>
   251    <tr>
   252      <td>`nomad.client.allocated.cpu`</td>
   253      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
   254      <td>MHz</td>
   255      <td>Gauge</td>
   256      <td>node_id, datacenter</td>
   257    </tr>
   258    <tr>
   259      <td>`nomad.client.unallocated.cpu`</td>
   260      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
   261      <td>MHz</td>
   262      <td>Gauge</td>
   263      <td>node_id, datacenter</td>
   264    </tr>
   265    <tr>
   266      <td>`nomad.client.allocated.memory`</td>
   267      <td>Total amount of memory the scheduler has allocated to tasks</td>
   268      <td>Megabytes</td>
   269      <td>Gauge</td>
   270      <td>node_id, datacenter</td>
   271    </tr>
   272    <tr>
   273      <td>`nomad.client.unallocated.memory`</td>
   274      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
   275      <td>Megabytes</td>
   276      <td>Gauge</td>
   277      <td>node_id, datacenter</td>
   278    </tr>
   279    <tr>
   280      <td>`nomad.client.allocated.disk`</td>
   281      <td>Total amount of disk space the scheduler has allocated to tasks</td>
   282      <td>Megabytes</td>
   283      <td>Gauge</td>
   284      <td>node_id, datacenter</td>
   285    </tr>
   286    <tr>
   287      <td>`nomad.client.unallocated.disk`</td>
   288      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
   289      <td>Megabytes</td>
   290      <td>Gauge</td>
   291      <td>node_id, datacenter</td>
   292    </tr>
   293    <tr>
   294      <td>`nomad.client.allocated.iops`</td>
   295      <td>Total amount of IOPS the scheduler has allocated to tasks</td>
   296      <td>IOPS</td>
   297      <td>Gauge</td>
   298      <td>node_id, datacenter</td>
   299    </tr>
   300    <tr>
   301      <td>`nomad.client.unallocated.iops`</td>
   302      <td>Total amount of IOPS free for the scheduler to allocate to tasks</td>
   303      <td>IOPS</td>
   304      <td>Gauge</td>
   305      <td>node_id, datacenter</td>
   306    </tr>
   307    <tr>
   308      <td>`nomad.client.allocated.network`</td>
   309      <td>Total amount of bandwidth the scheduler has allocated to tasks on the
   310      given device</td>
   311      <td>Megabits</td>
   312      <td>Gauge</td>
   313      <td>node_id, datacenter, device</td>
   314    </tr>
   315    <tr>
   316      <td>`nomad.client.unallocated.network`</td>
   317      <td>Total amount of bandwidth free for the scheduler to allocate to tasks on
   318      the given device</td>
   319      <td>Megabits</td>
   320      <td>Gauge</td>
   321      <td>node_id, datacenter, device</td>
   322    </tr>
   323    <tr>
   324      <td>`nomad.client.host.memory.total`</td>
   325      <td>Total amount of physical memory on the node</td>
   326      <td>Bytes</td>
   327      <td>Gauge</td>
   328      <td>node_id, datacenter</td>
   329    </tr>
   330    <tr>
   331      <td>`nomad.client.host.memory.available`</td>
   332      <td>Total amount of memory available to processes which includes free and
   333      cached memory</td>
   334      <td>Bytes</td>
   335      <td>Gauge</td>
   336      <td>node_id, datacenter</td>
   337    </tr>
   338    <tr>
   339      <td>`nomad.client.host.memory.used`</td>
   340      <td>Amount of memory used by processes</td>
   341      <td>Bytes</td>
   342      <td>Gauge</td>
   343      <td>node_id, datacenter</td>
   344    </tr>
   345    <tr>
   346      <td>`nomad.client.host.memory.free`</td>
   347      <td>Amount of memory which is free</td>
   348      <td>Bytes</td>
   349      <td>Gauge</td>
   350      <td>node_id, datacenter</td>
   351    </tr>
   352    <tr>
   353      <td>`nomad.client.uptime`</td>
   354      <td>Uptime of the host running the Nomad client</td>
   355      <td>Seconds</td>
   356      <td>Gauge</td>
   357      <td>node_id, datacenter</td>
   358    </tr>
   359    <tr>
   360      <td>`nomad.client.host.cpu.total`</td>
   361      <td>Total CPU utilization</td>
   362      <td>Percentage</td>
   363      <td>Gauge</td>
   364      <td>node_id, datacenter, cpu</td>
   365    </tr>
   366    <tr>
   367      <td>`nomad.client.host.cpu.user`</td>
   368      <td>CPU utilization in the user space</td>
   369      <td>Percentage</td>
   370      <td>Gauge</td>
   371      <td>node_id, datacenter, cpu</td>
   372    </tr>
   373    <tr>
   374      <td>`nomad.client.host.cpu.system`</td>
   375      <td>CPU utilization in the system space</td>
   376      <td>Percentage</td>
   377      <td>Gauge</td>
   378      <td>node_id, datacenter, cpu</td>
   379    </tr>
   380    <tr>
   381      <td>`nomad.client.host.cpu.idle`</td>
   382      <td>Idle time spent by the CPU</td>
   383      <td>Percentage</td>
   384      <td>Gauge</td>
   385      <td>node_id, datacenter, cpu</td>
   386    </tr>
   387    <tr>
   388      <td>`nomad.client.host.disk.size`</td>
   389      <td>Total size of the device</td>
   390      <td>Bytes</td>
   391      <td>Gauge</td>
   392      <td>node_id, datacenter, disk</td>
   393    </tr>
   394    <tr>
   395      <td>`nomad.client.host.disk.used`</td>
   396      <td>Amount of space which has been used</td>
   397      <td>Bytes</td>
   398      <td>Gauge</td>
   399      <td>node_id, datacenter, disk</td>
   400    </tr>
   401    <tr>
   402      <td>`nomad.client.host.disk.available`</td>
   403      <td>Amount of space which is available</td>
   404      <td>Bytes</td>
   405      <td>Gauge</td>
   406      <td>node_id, datacenter, disk</td>
   407    </tr>
   408    <tr>
   409      <td>`nomad.client.host.disk.used_percent`</td>
   410      <td>Percentage of disk space used</td>
   411      <td>Percentage</td>
   412      <td>Gauge</td>
   413      <td>node_id, datacenter, disk</td>
   414    </tr>
   415    <tr>
   416      <td>`nomad.client.host.disk.inodes_percent`</td>
   417      <td>Disk space consumed by the inodes</td>
   418      <td>Percentage</td>
   419      <td>Gauge</td>
   420      <td>node_id, datacenter, disk</td>
   421    </tr>
   422    <tr>
   423      <td>`nomad.client.allocs.start`</td>
   424      <td>Number of allocations starting</td>
   425      <td>Integer</td>
   426      <td>Counter</td>
   427      <td>node_id, job, task_group</td>
   428    </tr>
   429    <tr>
   430      <td>`nomad.client.allocs.running`</td>
   431      <td>Number of allocations starting to run</td>
   432      <td>Integer</td>
   433      <td>Counter</td>
   434      <td>node_id, job, task_group</td>
   435    </tr>
   436    <tr>
   437      <td>`nomad.client.allocs.failed`</td>
   438      <td>Number of allocations failing</td>
   439      <td>Integer</td>
   440      <td>Counter</td>
   441      <td>node_id, job, task_group</td>
   442    </tr>
   443    <tr>
   444      <td>`nomad.client.allocs.restart`</td>
   445      <td>Number of allocations restarting</td>
   446      <td>Integer</td>
   447      <td>Counter</td>
   448      <td>node_id, job, task_group</td>
   449    </tr>
   450    <tr>
   451      <td>`nomad.client.allocs.complete`</td>
   452      <td>Number of allocations completing</td>
   453      <td>Integer</td>
   454      <td>Counter</td>
   455      <td>node_id, job, task_group</td>
   456    </tr>
   457    <tr>
   458      <td>`nomad.client.allocs.destroy`</td>
   459      <td>Number of allocations being destroyed</td>
   460      <td>Integer</td>
   461      <td>Counter</td>
   462      <td>node_id, job, task_group</td>
   463    </tr>
   464  </table>
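
Because the tagged metrics carry `node_id` as a label, per-node figures can be derived by grouping on that label. The sketch below computes the fraction of CPU the scheduler has allocated on each node. It assumes each gauge entry is a dict with `Name`, `Value`, and `Labels` keys, which this page does not specify; `cpu_allocation_by_node` is a hypothetical helper.

```python
def cpu_allocation_by_node(gauges):
    """Return {node_id: fraction of schedulable CPU already allocated}.

    Each entry is assumed to look like:
    {"Name": "nomad.client.allocated.cpu", "Value": 500,
     "Labels": {"node_id": "...", "datacenter": "..."}}
    """
    allocated, unallocated = {}, {}
    for g in gauges:
        node = (g.get("Labels") or {}).get("node_id")
        if node is None:
            continue  # untagged entry, cannot be attributed to a node
        if g["Name"] == "nomad.client.allocated.cpu":
            allocated[node] = g["Value"]
        elif g["Name"] == "nomad.client.unallocated.cpu":
            unallocated[node] = g["Value"]

    result = {}
    for node, used in allocated.items():
        total = used + unallocated.get(node, 0)
        if total > 0:
            result[node] = used / total
    return result
```

The same grouping works for the memory, disk, and IOPS pairs by swapping the metric names.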
   465  
   466  ## Host Metrics (deprecated post Nomad 0.7)
   467  
   468  The metrics below are emitted by Nomad in versions prior to 0.7. From 0.7
   469  onward they can still be emitted in this format alongside the new format
   470  detailed above, but any new metrics are only available in the new format.
   471  
   472  <table class="table table-bordered table-striped">
   473    <tr>
   474      <th>Metric</th>
   475      <th>Description</th>
   476      <th>Unit</th>
   477      <th>Type</th>
   478    </tr>
   479    <tr>
   480      <td>`nomad.client.allocated.cpu.<HostID>`</td>
   481      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
   482      <td>MHz</td>
   483      <td>Gauge</td>
   484    </tr>
   485    <tr>
   486      <td>`nomad.client.unallocated.cpu.<HostID>`</td>
   487      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
   488      <td>MHz</td>
   489      <td>Gauge</td>
   490    </tr>
   491    <tr>
   492      <td>`nomad.client.allocated.memory.<HostID>`</td>
   493      <td>Total amount of memory the scheduler has allocated to tasks</td>
   494      <td>Megabytes</td>
   495      <td>Gauge</td>
   496    </tr>
   497    <tr>
   498      <td>`nomad.client.unallocated.memory.<HostID>`</td>
   499      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
   500      <td>Megabytes</td>
   501      <td>Gauge</td>
   502    </tr>
   503    <tr>
   504      <td>`nomad.client.allocated.disk.<HostID>`</td>
   505      <td>Total amount of disk space the scheduler has allocated to tasks</td>
   506      <td>Megabytes</td>
   507      <td>Gauge</td>
   508    </tr>
   509    <tr>
   510      <td>`nomad.client.unallocated.disk.<HostID>`</td>
   511      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
   512      <td>Megabytes</td>
   513      <td>Gauge</td>
   514    </tr>
   515    <tr>
   516      <td>`nomad.client.allocated.iops.<HostID>`</td>
   517      <td>Total amount of IOPS the scheduler has allocated to tasks</td>
   518      <td>IOPS</td>
   519      <td>Gauge</td>
   520    </tr>
   521    <tr>
   522      <td>`nomad.client.unallocated.iops.<HostID>`</td>
   523      <td>Total amount of IOPS free for the scheduler to allocate to tasks</td>
   524      <td>IOPS</td>
   525      <td>Gauge</td>
   526    </tr>
   527    <tr>
   528      <td>`nomad.client.allocated.network.<Device-Name>.<HostID>`</td>
   529      <td>Total amount of bandwidth the scheduler has allocated to tasks on the
   530      given device</td>
   531      <td>Megabits</td>
   532      <td>Gauge</td>
   533    </tr>
   534    <tr>
   535      <td>`nomad.client.unallocated.network.<Device-Name>.<HostID>`</td>
   536      <td>Total amount of bandwidth free for the scheduler to allocate to tasks on
   537      the given device</td>
   538      <td>Megabits</td>
   539      <td>Gauge</td>
   540    </tr>
   541    <tr>
   542      <td>`nomad.client.host.memory.<HostID>.total`</td>
   543      <td>Total amount of physical memory on the node</td>
   544      <td>Bytes</td>
   545      <td>Gauge</td>
   546    </tr>
   547    <tr>
   548      <td>`nomad.client.host.memory.<HostID>.available`</td>
   549      <td>Total amount of memory available to processes which includes free and
   550      cached memory</td>
   551      <td>Bytes</td>
   552      <td>Gauge</td>
   553    </tr>
   554    <tr>
   555      <td>`nomad.client.host.memory.<HostID>.used`</td>
   556      <td>Amount of memory used by processes</td>
   557      <td>Bytes</td>
   558      <td>Gauge</td>
   559    </tr>
   560    <tr>
   561      <td>`nomad.client.host.memory.<HostID>.free`</td>
   562      <td>Amount of memory which is free</td>
   563      <td>Bytes</td>
   564      <td>Gauge</td>
   565    </tr>
   566    <tr>
   567      <td>`nomad.client.uptime.<HostID>`</td>
   568      <td>Uptime of the host running the Nomad client</td>
   569      <td>Seconds</td>
   570      <td>Gauge</td>
   571    </tr>
   572    <tr>
   573      <td>`nomad.client.host.cpu.<HostID>.<CPU-Core>.total`</td>
   574      <td>Total CPU utilization</td>
   575      <td>Percentage</td>
   576      <td>Gauge</td>
   577    </tr>
   578    <tr>
   579      <td>`nomad.client.host.cpu.<HostID>.<CPU-Core>.user`</td>
   580      <td>CPU utilization in the user space</td>
   581      <td>Percentage</td>
   582      <td>Gauge</td>
   583    </tr>
   584    <tr>
   585      <td>`nomad.client.host.cpu.<HostID>.<CPU-Core>.system`</td>
   586      <td>CPU utilization in the system space</td>
   587      <td>Percentage</td>
   588      <td>Gauge</td>
   589    </tr>
   590    <tr>
   591      <td>`nomad.client.host.cpu.<HostID>.<CPU-Core>.idle`</td>
   592      <td>Idle time spent by the CPU</td>
   593      <td>Percentage</td>
   594      <td>Gauge</td>
   595    </tr>
   596    <tr>
   597      <td>`nomad.client.host.disk.<HostID>.<Device-Name>.size`</td>
   598      <td>Total size of the device</td>
   599      <td>Bytes</td>
   600      <td>Gauge</td>
   601    </tr>
   602    <tr>
   603      <td>`nomad.client.host.disk.<HostID>.<Device-Name>.used`</td>
   604      <td>Amount of space which has been used</td>
   605      <td>Bytes</td>
   606      <td>Gauge</td>
   607    </tr>
   608    <tr>
   609      <td>`nomad.client.host.disk.<HostID>.<Device-Name>.available`</td>
   610      <td>Amount of space which is available</td>
   611      <td>Bytes</td>
   612      <td>Gauge</td>
   613    </tr>
   614    <tr>
   615      <td>`nomad.client.host.disk.<HostID>.<Device-Name>.used_percent`</td>
   616      <td>Percentage of disk space used</td>
   617      <td>Percentage</td>
   618      <td>Gauge</td>
   619    </tr>
   620    <tr>
   621      <td>`nomad.client.host.disk.<HostID>.<Device-Name>.inodes_percent`</td>
   622      <td>Disk space consumed by the inodes</td>
   623      <td>Percentage</td>
   624      <td>Gauge</td>
   625    </tr>
   626  </table>
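
The two formats carry the same information in different places: the deprecated names embed the host and device in the metric name, while the tagged format moves them into labels. As an illustration, the sketch below rewrites a few deprecated host metric names into a tagged name plus labels. It assumes `<HostID>` corresponds to the `node_id` label and that no name segment contains a dot; `legacy_to_tagged` is a hypothetical helper, and the `datacenter` label cannot be recovered from the old name.

```python
def legacy_to_tagged(name):
    """Split a deprecated host metric name into (tagged_name, labels).

    Handles only the memory, cpu, and disk host metrics; returns None for
    anything else.
    """
    parts = name.split(".")
    if parts[:3] != ["nomad", "client", "host"] or len(parts) < 6:
        return None
    kind = parts[3]
    if kind == "memory" and len(parts) == 6:
        # nomad.client.host.memory.<HostID>.<field>
        return "nomad.client.host.memory." + parts[5], {"node_id": parts[4]}
    if kind in ("cpu", "disk") and len(parts) == 7:
        # nomad.client.host.cpu.<HostID>.<CPU-Core>.<field>
        # nomad.client.host.disk.<HostID>.<Device-Name>.<field>
        labels = {"node_id": parts[4], kind: parts[5]}
        return "nomad.client.host.%s.%s" % (kind, parts[6]), labels
    return None
```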
   627  
   628  ## Allocation Metrics
   629  
   630  <table class="table table-bordered table-striped">
   631    <tr>
   632      <th>Metric</th>
   633      <th>Description</th>
   634      <th>Unit</th>
   635      <th>Type</th>
   636    </tr>
   637    <tr>
   638      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.rss`</td>
   639      <td>Amount of RSS memory consumed by the task</td>
   640      <td>Bytes</td>
   641      <td>Gauge</td>
   642    </tr>
   643    <tr>
   644      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.cache`</td>
   645      <td>Amount of memory cached by the task</td>
   646      <td>Bytes</td>
   647      <td>Gauge</td>
   648    </tr>
   649    <tr>
   650      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.swap`</td>
   651      <td>Amount of memory swapped by the task</td>
   652      <td>Bytes</td>
   653      <td>Gauge</td>
   654    </tr>
   655    <tr>
   656      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.max_usage`</td>
   657      <td>Maximum amount of memory ever used by the task</td>
   658      <td>Bytes</td>
   659      <td>Gauge</td>
   660    </tr>
   661    <tr>
   662      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_usage`</td>
   663      <td>Amount of memory used by the kernel for this task</td>
   664      <td>Bytes</td>
   665      <td>Gauge</td>
   666    </tr>
   667    <tr>
   668      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_max_usage`</td>
   669      <td>Maximum amount of memory ever used by the kernel for this task</td>
   670      <td>Bytes</td>
   671      <td>Gauge</td>
   672    </tr>
   673    <tr>
   674      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_percent`</td>
   675      <td>Total CPU resources consumed by the task across all cores</td>
   676      <td>Percentage</td>
   677      <td>Gauge</td>
   678    </tr>
   679    <tr>
   680      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.system`</td>
   681      <td>Total CPU resources consumed by the task in the system space</td>
   682      <td>Percentage</td>
   683      <td>Gauge</td>
   684    </tr>
   685    <tr>
   686      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.user`</td>
   687      <td>Total CPU resources consumed by the task in the user space</td>
   688      <td>Percentage</td>
   689      <td>Gauge</td>
   690    </tr>
   691    <tr>
   692      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.throttled_time`</td>
   693      <td>Total time that the task was throttled</td>
   694      <td>Nanoseconds</td>
   695      <td>Gauge</td>
   696    </tr>
   697    <tr>
   698      <td>`nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_ticks`</td>
   699      <td>CPU ticks consumed by the process in the last collection interval</td>
   700      <td>Integer</td>
   701      <td>Gauge</td>
   702    </tr>
   703  </table>
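
Since the job, task group, allocation ID, and task are all embedded in the metric name, a consumer typically has to split the name to aggregate by any of them. A minimal sketch, assuming none of these segments contains a dot; `AllocMetric` and `parse_alloc_metric` are hypothetical names.

```python
from collections import namedtuple

AllocMetric = namedtuple("AllocMetric", "job task_group alloc_id task metric")

PREFIX = "nomad.client.allocs."


def parse_alloc_metric(name):
    """Split an allocation metric name into its components, or return None."""
    if not name.startswith(PREFIX):
        return None
    parts = name[len(PREFIX):].split(".")
    # Expect <Job>.<TaskGroup>.<AllocID>.<Task>.<resource>.<field>
    if len(parts) != 6:
        return None
    job, task_group, alloc_id, task, resource, field = parts
    return AllocMetric(job, task_group, alloc_id, task, resource + "." + field)
```

Names with a different number of segments, such as the tagged `nomad.client.allocs.start` counter, return `None` rather than being misparsed.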
   704  
   705  # Job Metrics
   706  
   707  Job metrics are emitted by the Nomad leader server.
   708  
   709  <table class="table table-bordered table-striped">
   710    <tr>
   711      <th>Metric</th>
   712      <th>Description</th>
   713      <th>Unit</th>
   714      <th>Type</th>
   715      <th>Labels</th>
   716    </tr>
   717    <tr>
   718      <td>`nomad.job_summary.queued`</td>
   719      <td>Number of queued allocations for a job</td>
   720      <td>Integer</td>
   721      <td>Gauge</td>
   722      <td>job, task_group</td>
   723    </tr>
   724    <tr>
   725      <td>`nomad.job_summary.complete`</td>
   726      <td>Number of complete allocations for a job</td>
   727      <td>Integer</td>
   728      <td>Gauge</td>
   729      <td>job, task_group</td>
   730    </tr>
   731    <tr>
   732      <td>`nomad.job_summary.failed`</td>
   733      <td>Number of failed allocations for a job</td>
   734      <td>Integer</td>
   735      <td>Gauge</td>
   736      <td>job, task_group</td>
   737    </tr>
   738    <tr>
   739      <td>`nomad.job_summary.running`</td>
   740      <td>Number of running allocations for a job</td>
   741      <td>Integer</td>
   742      <td>Gauge</td>
   743      <td>job, task_group</td>
   744    </tr>
   745    <tr>
   746      <td>`nomad.job_summary.starting`</td>
   747      <td>Number of starting allocations for a job</td>
   748      <td>Integer</td>
   749      <td>Gauge</td>
   750      <td>job, task_group</td>
   751    </tr>
   752    <tr>
   753      <td>`nomad.job_summary.lost`</td>
   754      <td>Number of lost allocations for a job</td>
   755      <td>Integer</td>
   756      <td>Gauge</td>
   757      <td>job, task_group</td>
   758    </tr>
   759  </table>
   760  
   761  # Metric Types
   762  
   763  <table class="table table-bordered table-striped">
   764    <tr>
   765      <th>Type</th>
   766      <th>Description</th>
   767      <th>Quantiles</th>
   768    </tr>
   769    <tr>
   770      <td>Gauge</td>
   771      <td>
   772          Gauge types report an absolute number at the end of the aggregation
   773          interval
   774      </td>
   775      <td>false</td>
   776    </tr>
   777    <tr>
   778      <td>Counter</td>
   779      <td>
   780          Counts are incremented and flushed at the end of the aggregation
   781          interval and then are reset to zero
   782      </td>
   783      <td>true</td>
   784    </tr>
   785    <tr>
   786      <td>Timer</td>
   787      <td>
   788          Timers measure the time to complete a task and will include quantiles,
   789          means, standard deviation, etc. per interval.
   790      </td>
   791      <td>true</td>
   792    </tr>
   793  </table>
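
To make the difference between the three types concrete, the following toy aggregator mimics the behavior described in the table. It is an illustrative sketch, not Nomad's implementation: `IntervalAggregator` is a hypothetical class, quantiles are omitted, and keeping gauge values across flushes is an assumption.

```python
import statistics


class IntervalAggregator:
    """Toy model of gauge, counter, and timer semantics for one interval."""

    def __init__(self):
        self.gauges = {}
        self.counters = {}
        self.timers = {}

    def set_gauge(self, name, value):
        # A gauge keeps only the most recent absolute value.
        self.gauges[name] = value

    def incr_counter(self, name, amount=1.0):
        # A counter accumulates increments until the next flush.
        self.counters[name] = self.counters.get(name, 0.0) + amount

    def add_sample(self, name, millis):
        # A timer records every observed duration in the interval.
        self.timers.setdefault(name, []).append(millis)

    def flush(self):
        """Report the interval, then reset counters and timers."""
        report = {
            "gauges": dict(self.gauges),
            "counters": dict(self.counters),
            "timers": {
                name: {
                    "count": len(samples),
                    "min": min(samples),
                    "max": max(samples),
                    "mean": statistics.mean(samples),
                    "stddev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
                }
                for name, samples in self.timers.items()
            },
        }
        self.counters.clear()
        self.timers.clear()
        return report
```

Setting a gauge twice within an interval reports only the last value, while incrementing a counter twice reports the sum. After `flush()` the counter starts again from zero.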