---
layout: docs
page_title: Metrics
sidebar_title: Metrics
description: Learn about the different metrics available in Nomad.
---

# Metrics

The Nomad agent collects various runtime metrics about the performance of
different libraries and subsystems. These metrics are aggregated on a
ten-second interval and are retained for one minute.

This data can be accessed via an HTTP endpoint or by sending a signal to the
Nomad process.

As of Nomad version 0.7, this data is available via HTTP at `/metrics`. See
[Metrics](/api-docs/metrics) for more information.

To view this data by sending a signal to the Nomad process: on Unix the
signal is `USR1`, while on Windows it is `BREAK`. Once Nomad receives the
signal, it will dump the current telemetry information to the agent's
`stderr`.

This telemetry information can be used for debugging or otherwise getting a
better view of what Nomad is doing.

Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
and statsd by providing the appropriate configuration options.

To configure the telemetry output, please see the [agent
configuration](/docs/configuration/telemetry).
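For example, a minimal `telemetry` stanza that streams metrics to a statsd
daemon might look like the following sketch (the address is a placeholder for
your own collector):

```hcl
telemetry {
  # Stream aggregated metrics to a statsd daemon (placeholder address).
  statsd_address = "127.0.0.1:8125"

  # Or stream to a statsite instance instead:
  # statsite_address = "127.0.0.1:8125"
}
```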
Below is sample output of a telemetry dump:

```text
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
```

## Key Metrics

When telemetry is being streamed to statsite or statsd, `interval` is defined
to be their flush interval. Otherwise, the interval can be assumed to be 10
seconds when retrieving metrics via the signal described above.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.runtime.num_goroutines</code></td>
      <td>Number of goroutines and general load pressure indicator</td>
      <td># of goroutines</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.runtime.alloc_bytes</code></td>
      <td>Memory utilization</td>
      <td># of bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.runtime.heap_objects</code></td>
      <td>Number of objects on the heap. General memory pressure indicator</td>
      <td># of heap objects</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.raft.apply</code></td>
      <td>Number of Raft transactions</td>
      <td>Raft transactions / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.raft.replication.appendEntries</code></td>
      <td>Raft transaction commit time</td>
      <td>ms / Raft Log Append</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.raft.leader.lastContact</code></td>
      <td>
        Time since last contact with the leader. General indicator of Raft
        latency
      </td>
      <td>ms / Leader Contact</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_ready</code></td>
      <td>Number of evaluations ready to be processed</td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_unacked</code></td>
      <td>Evaluations dispatched for processing but incomplete</td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.broker.total_blocked</code></td>
      <td>
        Evaluations that are blocked until an existing evaluation for the same
        job completes
      </td>
      <td># of evaluations</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.plan.queue_depth</code></td>
      <td>Number of scheduler Plans waiting to be evaluated</td>
      <td># of plans</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.plan.submit</code></td>
      <td>
        Time to submit a scheduler Plan. Higher values cause lower scheduling
        throughput
      </td>
      <td>ms / Plan Submit</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.plan.evaluate</code></td>
      <td>
        Time to validate a scheduler Plan. Higher values cause lower scheduling
        throughput. Similar to <code>nomad.plan.submit</code> but does not
        include RPC time or time in the Plan Queue
      </td>
      <td>ms / Plan Evaluation</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.worker.invoke_scheduler.<type></code></td>
      <td>Time to run the scheduler of the given type</td>
      <td>ms / Scheduler Run</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.worker.wait_for_index</code></td>
      <td>
        Time waiting for Raft log replication from the leader. High delays
        result in lower scheduling throughput
      </td>
      <td>ms / Raft Index Wait</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.heartbeat.active</code></td>
      <td>
        Number of active heartbeat timers. Each timer represents a Nomad
        Client connection
      </td>
      <td># of heartbeat timers</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.heartbeat.invalidate</code></td>
      <td>
        The length of time it takes to invalidate a Nomad Client due to failed
        heartbeats
      </td>
      <td>ms / Heartbeat Invalidation</td>
      <td>Timer</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.query</code></td>
      <td>Number of RPC queries</td>
      <td>RPC Queries / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.request</code></td>
      <td>Number of RPC requests being handled</td>
      <td>RPC Requests / `interval`</td>
      <td>Counter</td>
    </tr>
    <tr>
      <td><code>nomad.rpc.request_error</code></td>
      <td>Number of RPC requests being handled that result in an error</td>
      <td>RPC Errors / `interval`</td>
      <td>Counter</td>
    </tr>
  </tbody>
</table>

## Client Metrics

The Nomad client emits metrics related to the resource usage of the
allocations and tasks running on it, as well as the node itself. Operators
must explicitly turn on publishing of host and allocation metrics by setting
`publish_allocation_metrics` and `publish_node_metrics` to `true`.

By default the collection interval is 1 second, but it can be changed by
setting the value of the `collection_interval` key in the `telemetry`
configuration block.
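For illustration, a `telemetry` block combining these settings might look
like the following sketch (the values are examples; `collection_interval` is
shown at its default of one second):

```hcl
telemetry {
  # Publish resource usage metrics for allocations and tasks.
  publish_allocation_metrics = true

  # Publish host resource usage metrics (CPU, memory, disk, network).
  publish_node_metrics = true

  # How often the client collects resource usage data.
  collection_interval = "1s"
}
```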
Please see the [agent configuration](/docs/configuration/telemetry)
page for more details.

As of Nomad 0.9, Nomad emits additional labels for
[parameterized](/docs/job-specification/parameterized) and
[periodic](/docs/job-specification/periodic) jobs. Nomad emits the parent job
id as a new label `parent_id`. Also, the labels `dispatch_id` and
`periodic_id` are emitted, containing the ID of the specific invocation of the
parameterized or periodic job respectively. For example, a dispatch job with
the id `myjob/dispatch-1312323423423` will have the following labels:

<table>
  <thead>
    <tr>
      <th>Label</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>job</td>
      <td><code>myjob/dispatch-1312323423423</code></td>
    </tr>
    <tr>
      <td>parent_id</td>
      <td>myjob</td>
    </tr>
    <tr>
      <td>dispatch_id</td>
      <td>1312323423423</td>
    </tr>
  </tbody>
</table>
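For context, IDs like the one above are produced by dispatching a
parameterized job with `nomad job dispatch`. A minimal sketch of such a job
follows; the job name matches the example above, and everything else is
illustrative:

```hcl
job "myjob" {
  datacenters = ["dc1"]
  type        = "batch"

  # Each `nomad job dispatch myjob` invocation creates a child job named
  # "myjob/dispatch-<id>", whose metrics carry the labels shown above.
  parameterized {
    payload = "optional"
  }

  group "group" {
    task "task" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "echo"
        args    = ["dispatched"]
      }
    }
  }
}
```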
## Host Metrics (post Nomad version 0.7)

Starting in version 0.7, Nomad emits [tagged metrics][tagged-metrics] in the
format below:

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
      <th>Labels</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocated.cpu</code></td>
      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.cpu</code></td>
      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.memory</code></td>
      <td>Total amount of memory the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.memory</code></td>
      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.disk</code></td>
      <td>Total amount of disk space the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.disk</code></td>
      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.network</code></td>
      <td>
        Total amount of bandwidth the scheduler has allocated to tasks on the
        given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
      <td>node_id, datacenter, device</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.network</code></td>
      <td>
        Total amount of bandwidth free for the scheduler to allocate to tasks
        on the given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
      <td>node_id, datacenter, device</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.total</code></td>
      <td>Total amount of physical memory on the node</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.available</code></td>
      <td>
        Total amount of memory available to processes, which includes free and
        cached memory
      </td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.used</code></td>
      <td>Amount of memory used by processes</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.free</code></td>
      <td>Amount of memory which is free</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.uptime</code></td>
      <td>Uptime of the host running the Nomad client</td>
      <td>Seconds</td>
      <td>Gauge</td>
      <td>node_id, datacenter</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.total</code></td>
      <td>Total CPU utilization</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.user</code></td>
      <td>CPU utilization in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.system</code></td>
      <td>CPU utilization in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.idle</code></td>
      <td>Idle time spent by the CPU</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, cpu</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.size</code></td>
      <td>Total size of the device</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.used</code></td>
      <td>Amount of space which has been used</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.available</code></td>
      <td>Amount of space which is available</td>
      <td>Bytes</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.used_percent</code></td>
      <td>Percentage of disk space used</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.inodes_percent</code></td>
      <td>Disk space consumed by the inodes</td>
      <td>Percentage</td>
      <td>Gauge</td>
      <td>node_id, datacenter, disk</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.start</code></td>
      <td>Number of allocations starting</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.running</code></td>
      <td>Number of allocations starting to run</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.failed</code></td>
      <td>Number of allocations failing</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.restart</code></td>
      <td>Number of allocations restarting</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.complete</code></td>
      <td>Number of allocations completing</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.destroy</code></td>
      <td>Number of allocations being destroyed</td>
      <td>Integer</td>
      <td>Counter</td>
      <td>node_id, job, task_group</td>
    </tr>
  </tbody>
</table>

Nomad 0.9 adds an additional `node_class` label from the client's
`NodeClass` attribute. This label is set to the string "none" if empty.

## Host Metrics (deprecated post Nomad 0.7)

The metrics below are emitted by Nomad in versions prior to 0.7. Post-0.7,
these metrics can still be emitted in this format (as well as in the new
format detailed above), but any new metrics are only available in the new
format.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocated.cpu.<HostID></code></td>
      <td>Total amount of CPU shares the scheduler has allocated to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.cpu.<HostID></code></td>
      <td>Total amount of CPU shares free for the scheduler to allocate to tasks</td>
      <td>MHz</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.memory.<HostID></code></td>
      <td>Total amount of memory the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.memory.<HostID></code></td>
      <td>Total amount of memory free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.disk.<HostID></code></td>
      <td>Total amount of disk space the scheduler has allocated to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.disk.<HostID></code></td>
      <td>Total amount of disk space free for the scheduler to allocate to tasks</td>
      <td>Megabytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocated.network.<Device-Name>.<HostID></code></td>
      <td>
        Total amount of bandwidth the scheduler has allocated to tasks on the
        given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.unallocated.network.<Device-Name>.<HostID></code></td>
      <td>
        Total amount of bandwidth free for the scheduler to allocate to tasks
        on the given device
      </td>
      <td>Megabits</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.total</code></td>
      <td>Total amount of physical memory on the node</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.available</code></td>
      <td>
        Total amount of memory available to processes, which includes free and
        cached memory
      </td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.used</code></td>
      <td>Amount of memory used by processes</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.memory.<HostID>.free</code></td>
      <td>Amount of memory which is free</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.uptime.<HostID></code></td>
      <td>Uptime of the host running the Nomad client</td>
      <td>Seconds</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.total</code></td>
      <td>Total CPU utilization</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.user</code></td>
      <td>CPU utilization in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.system</code></td>
      <td>CPU utilization in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.cpu.<HostID>.<CPU-Core>.idle</code></td>
      <td>Idle time spent by the CPU</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.size</code></td>
      <td>Total size of the device</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.used</code></td>
      <td>Amount of space which has been used</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.available</code></td>
      <td>Amount of space which is available</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.used_percent</code></td>
      <td>Percentage of disk space used</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.host.disk.<HostID>.<Device-Name>.inodes_percent</code></td>
      <td>Disk space consumed by the inodes</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Allocation Metrics

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.rss</code></td>
      <td>Amount of RSS memory consumed by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.cache</code></td>
      <td>Amount of memory cached by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.swap</code></td>
      <td>Amount of memory swapped by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.max_usage</code></td>
      <td>Maximum amount of memory ever used by the task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_usage</code></td>
      <td>Amount of memory used by the kernel for this task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.memory.kernel_max_usage</code></td>
      <td>Maximum amount of memory ever used by the kernel for this task</td>
      <td>Bytes</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_percent</code></td>
      <td>Total CPU resources consumed by the task across all cores</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.system</code></td>
      <td>Total CPU resources consumed by the task in the system space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.user</code></td>
      <td>Total CPU resources consumed by the task in the user space</td>
      <td>Percentage</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.throttled_time</code></td>
      <td>Total time that the task was throttled</td>
      <td>Nanoseconds</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_ticks</code></td>
      <td>CPU ticks consumed by the process in the last collection interval</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Job Summary Metrics

Job summary metrics are emitted by the Nomad leader server.

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
      <th>Labels</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.job_summary.queued</code></td>
      <td>Number of queued allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.complete</code></td>
      <td>Number of complete allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.failed</code></td>
      <td>Number of failed allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.running</code></td>
      <td>Number of running allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.starting</code></td>
      <td>Number of starting allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
    <tr>
      <td><code>nomad.job_summary.lost</code></td>
      <td>Number of lost allocations for a job</td>
      <td>Integer</td>
      <td>Gauge</td>
      <td>job, task_group</td>
    </tr>
  </tbody>
</table>

## Job Status Metrics

Job status metrics are emitted by the Nomad leader server.
<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Description</th>
      <th>Unit</th>
      <th>Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>nomad.job_status.pending</code></td>
      <td>Number of jobs pending</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.job_status.running</code></td>
      <td>Number of jobs running</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
    <tr>
      <td><code>nomad.job_status.dead</code></td>
      <td>Number of dead jobs</td>
      <td>Integer</td>
      <td>Gauge</td>
    </tr>
  </tbody>
</table>

## Metric Types

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>Description</th>
      <th>Quantiles</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Gauge</td>
      <td>
        Gauge types report an absolute number at the end of the aggregation
        interval
      </td>
      <td>false</td>
    </tr>
    <tr>
      <td>Counter</td>
      <td>
        Counts are incremented and flushed at the end of the aggregation
        interval and then are reset to zero
      </td>
      <td>true</td>
    </tr>
    <tr>
      <td>Timer</td>
      <td>
        Timers measure the time to complete a task and will include quantiles,
        means, standard deviation, etc., per interval
      </td>
      <td>true</td>
    </tr>
  </tbody>
</table>

## Tagged Metrics

As of version 0.7, Nomad emits metrics in a tagged format. Each metric can
support more than one tag, meaning that it is possible to match over metrics
for datapoints such as a particular datacenter, and return all metrics with
this tag. Nomad supports labels for namespaces as well.

[tagged-metrics]: /docs/telemetry/metrics#tagged-metrics
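Tagged metrics are easiest to consume through a sink that understands labels.
As a minimal sketch, enabling Nomad's built-in Prometheus support via the
`prometheus_metrics` telemetry option exposes each tag as a Prometheus label:

```hcl
telemetry {
  # Serve metrics in Prometheus exposition format from the agent's
  # metrics endpoint (format=prometheus); tags such as node_id,
  # datacenter, and job are emitted as Prometheus labels.
  prometheus_metrics = true
}
```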