github.com/anth0d/nomad@v0.0.0-20221214183521-ae3a0a2cad06/website/content/tools/autoscaling/telemetry.mdx (about)

     1  ---
     2  layout: docs
     3  page_title: Telemetry
     4  description: >
     5    Overview of runtime metrics available in the Nomad Autoscaler.
     6  ---
     7  
     8  # Nomad Autoscaler Telemetry
     9  
    10  The Nomad Autoscaler agent collects runtime metrics about the performance of
    11  its libraries and subsystems. These metrics are aggregated on a ten-second
    12  interval and are retained for one minute. To configure the telemetry output,
    13  see the [agent configuration][agent_telemetry_config].
    14  
    15  This data can be accessed through the `/v1/metrics` HTTP endpoint, by sending a
    16  signal to the Nomad Autoscaler process, or through a number of integrations.
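
As a rough illustration of the HTTP route, the sketch below builds the `/v1/metrics` URL, fetches the JSON body, and pulls out gauge values. The agent address `http://127.0.0.1:8080`, the top-level `Gauges` list, and its `Name` and `Value` keys are all assumptions rather than facts stated on this page; adjust them to match your agent configuration and the response you actually observe. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed agent address; this page does not state the listen address.
DEFAULT_ADDR = "http://127.0.0.1:8080"


def metrics_url(addr: str = DEFAULT_ADDR) -> str:
    """Build the URL of the /v1/metrics endpoint for an agent address."""
    return addr.rstrip("/") + "/v1/metrics"


def extract_gauges(payload: dict) -> dict:
    """Map gauge names to values.

    Assumes a top-level "Gauges" list of objects with "Name" and "Value"
    keys. This response shape is an assumption, not documented here.
    """
    return {g["Name"]: g["Value"] for g in payload.get("Gauges") or []}


def fetch_metrics(addr: str = DEFAULT_ADDR, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON body returned by the metrics endpoint."""
    with urllib.request.urlopen(metrics_url(addr), timeout=timeout) as resp:
        return json.load(resp)
```

With a running agent, `extract_gauges(fetch_metrics())` would return a dictionary of gauge names and values, provided the assumed response shape holds.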
    17  
    18  To view this data by sending a signal, send `USR1` on Unix or `BREAK` on
    19  Windows to the Nomad Autoscaler process. Once the Nomad Autoscaler receives
    20  the signal, it dumps the current telemetry information to the agent's `stderr`.
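
On Unix, the signal route can be scripted. The following is a minimal sketch, assuming you already know the PID of the Nomad Autoscaler process; the helper name `request_telemetry_dump` is made up for illustration, and the sketch does not cover the Windows `BREAK` case.

```python
import os
import signal


def request_telemetry_dump(pid: int) -> None:
    """Send SIGUSR1 to the process with the given PID (Unix only).

    The receiving process decides what to do with the signal; for the
    Nomad Autoscaler, this page says it dumps telemetry to stderr.
    """
    os.kill(pid, signal.SIGUSR1)
```

Calling `request_telemetry_dump(12345)`, where `12345` is a placeholder PID, asks that process for a dump. The output appears on the agent's `stderr`, not in the calling script.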
    21  
    22  This telemetry information can be used for debugging or for otherwise
    23  getting a better view of what the Nomad Autoscaler is doing.
    24  
    25  Below is sample output of a telemetry dump:
    26  
    27  ```text
    28  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
    29  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
    30  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
    31  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
    32  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
    33  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
    34  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
    35  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
    36  [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
    37  [2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
    38  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
    39  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
    40  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
    41  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
    42  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
    43  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
    44  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
    45  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
    46  [2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
    47  ```
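
If you capture a dump like the one above, the gauge lines are regular enough to parse. The sketch below is one possible approach, with the line format inferred only from this sample: it handles `[G]` gauge lines and skips everything else, including `[S]` sample lines. The function names are hypothetical.

```python
import re

# Matches gauge lines such as:
# [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
GAUGE_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[G\] '(?P<name>[^']+)': (?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_gauges(dump: str) -> list[tuple[str, str, float]]:
    """Return (timestamp, metric name, value) for every gauge line in a dump."""
    rows = []
    for line in dump.splitlines():
        match = GAUGE_LINE.match(line.strip())
        if match:
            rows.append((match["ts"], match["name"], float(match["value"])))
    return rows


def latest_gauges(dump: str) -> dict[str, float]:
    """Keep the last value seen for each gauge, assuming chronological order."""
    return {name: value for _, name, value in parse_gauges(dump)}
```

Because later lines overwrite earlier ones, `latest_gauges` reports the value from the most recent aggregation interval present in the captured text.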
    48  
    49  ## Runtime Metrics
    50  
    51  Runtime metrics help you understand the Nomad Autoscaler agent's memory usage
    52  and load pressure.
    53  
    54  <table>
    55    <thead>
    56      <tr>
    57        <th>Metric</th>
    58        <th>Description</th>
    59        <th>Type</th>
    60      </tr>
    61    </thead>
    62    <tbody>
    63      <tr>
    64        <td>
    65          <code>nomad-autoscaler.runtime.num_goroutines</code>
    66        </td>
    67        <td>Number of running goroutines</td>
    68        <td>Gauge</td>
    69      </tr>
    70      <tr>
    71        <td>
    72          <code>nomad-autoscaler.runtime.alloc_bytes</code>
    73        </td>
    74        <td>The number of allocated heap bytes</td>
    75        <td>Gauge</td>
    76      </tr>
    77      <tr>
    78        <td>
    79          <code>nomad-autoscaler.runtime.sys_bytes</code>
    80        </td>
    81        <td>The total bytes of memory obtained from the OS</td>
    82        <td>Gauge</td>
    83      </tr>
    84      <tr>
    85        <td>
    86          <code>nomad-autoscaler.runtime.malloc_count</code>
    87        </td>
    88        <td>Cumulative count of heap objects allocated</td>
    89        <td>Gauge</td>
    90      </tr>
    91      <tr>
    92        <td>
    93          <code>nomad-autoscaler.runtime.free_count</code>
    94        </td>
    95        <td>Cumulative count of heap objects freed</td>
    96        <td>Gauge</td>
    97      </tr>
    98      <tr>
    99        <td>
   100          <code>nomad-autoscaler.runtime.heap_objects</code>
   101        </td>
   102        <td>Number of allocated heap objects</td>
   103        <td>Gauge</td>
   104      </tr>
   105      <tr>
   106        <td>
   107          <code>nomad-autoscaler.runtime.total_gc_pause_ns</code>
   108        </td>
   109        <td>Cumulative nanoseconds in GC stop-the-world pauses</td>
   110        <td>Gauge</td>
   111      </tr>
   112      <tr>
   113        <td>
   114          <code>nomad-autoscaler.runtime.total_gc_runs</code>
   115        </td>
   116        <td>Number of completed GC cycles</td>
   117        <td>Gauge</td>
   118      </tr>
   119      <tr>
   120        <td>
   121          <code>nomad-autoscaler.runtime.gc_pause_ns</code>
   122        </td>
   123        <td>Number of nanoseconds to complete the last GC cycle</td>
   124        <td>Timer</td>
   125      </tr>
   126    </tbody>
   127  </table>
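
Several of these gauges are cumulative, so they are often more useful when combined. The sketch below shows two derived figures that are consistent with the sample dump earlier on this page: `malloc_count` minus `free_count` matches `heap_objects`, and `total_gc_pause_ns` divided by `total_gc_runs` matches the `Mean` reported for `gc_pause_ns`. The helper names are hypothetical.

```python
def live_heap_objects(malloc_count: float, free_count: float) -> float:
    """Objects still allocated: cumulative allocations minus cumulative frees."""
    return malloc_count - free_count


def mean_gc_pause_ns(total_gc_pause_ns: float, total_gc_runs: float) -> float:
    """Average stop-the-world pause per completed GC cycle, in nanoseconds."""
    if total_gc_runs == 0:
        return 0.0  # No GC cycle has completed yet.
    return total_gc_pause_ns / total_gc_runs
```

Using the values from the first interval of the sample dump, `live_heap_objects(219856, 183613)` gives `36243`, which is the reported `heap_objects` value.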
   128  
   129  ## Policy Metrics
   130  
   131  Policy metrics provide insights into the performance of the Nomad Autoscaler's
   132  policy handling.
   133  
   134  <table>
   135    <thead>
   136      <tr>
   137        <th>Metric</th>
   138        <th>Description</th>
   139        <th>Type</th>
   140        <th>Labels</th>
   141      </tr>
   142    </thead>
   143    <tbody>
   144      <tr>
   145        <td>
   146          <code>nomad-autoscaler.policy.total_num</code>
   147        </td>
   148        <td>The number of policies currently held within the autoscaler</td>
   149        <td>Gauge</td>
   150        <td></td>
   151      </tr>
   152      <tr>
   153        <td>
   154          <code>nomad-autoscaler.policy.source.error_count</code>
   155        </td>
   156        <td>Tracks the number of errors generated by the policy sources</td>
   157        <td>Counter</td>
   158        <td>policy_source</td>
   159      </tr>
   160    </tbody>
   161  </table>
   162  
   163  ## Scaling Metrics
   164  
   165  Scaling metrics provide insight into the performance of scaling actions as well
   166  as overall success and failure counters.
   167  
   168  <table>
   169    <thead>
   170      <tr>
   171        <th>Metric</th>
   172        <th>Description</th>
   173        <th>Type</th>
   174        <th>Labels</th>
   175      </tr>
   176    </thead>
   177    <tbody>
   178      <tr>
   179        <td>
   180          <code>nomad-autoscaler.scale.evaluate_ms</code>
   181        </td>
   182        <td>The time taken to evaluate the checks within a single policy</td>
   183        <td>Timer</td>
   184        <td>policy_id, target_name</td>
   185      </tr>
   186      <tr>
   187        <td>
   188          <code>nomad-autoscaler.scale.invoke_ms</code>
   189        </td>
   190        <td>The time taken to invoke scaling based on the scaling evaluations</td>
   191        <td>Timer</td>
   192        <td>policy_id, target_name</td>
   193      </tr>
   194      <tr>
   195        <td>
   196          <code>nomad-autoscaler.scale.invoke.success_count</code>
   197        </td>
   198        <td>Tracks the number of successful scaling actions triggered</td>
   199        <td>Counter</td>
   200        <td></td>
   201      </tr>
   202      <tr>
   203        <td>
   204          <code>nomad-autoscaler.scale.invoke.error_count</code>
   205        </td>
   206        <td>Tracks the number of unsuccessful scaling actions triggered</td>
   207        <td>Counter</td>
   208        <td></td>
   209      </tr>
   210    </tbody>
   211  </table>
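
The two counters can be combined into a failure ratio for alerting. This is a minimal sketch, assuming you have already read `scale.invoke.success_count` and `scale.invoke.error_count` over the same time window; the function name and the example numbers are hypothetical.

```python
def invoke_error_rate(success_count: int, error_count: int) -> float:
    """Fraction of triggered scaling actions that failed in a given window."""
    total = success_count + error_count
    if total == 0:
        return 0.0  # No scaling actions were triggered in this window.
    return error_count / total
```

For example, 9 successful and 1 unsuccessful scaling actions in a window give an error rate of `0.1`.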
   212  
   213  ## Plugin Metrics
   214  
   215  Plugin metrics provide insight into the performance of Nomad Autoscaler plugins
   216  and help identify potential bottlenecks or latency issues.
   217  
   218  <table>
   219    <thead>
   220      <tr>
   221        <th>Metric</th>
   222        <th>Description</th>
   223        <th>Type</th>
   224        <th>Labels</th>
   225      </tr>
   226    </thead>
   227    <tbody>
   228      <tr>
   229        <td>
   230          <code>nomad-autoscaler.plugin.manager.access_ms</code>
   231        </td>
   232        <td>The time taken to dispense a plugin</td>
   233        <td>Timer</td>
   234        <td></td>
   235      </tr>
   236      <tr>
   237        <td>
   238          <code>nomad-autoscaler.target.status.invoke_ms</code>
   239        </td>
   240        <td>The time taken to perform the target plugin status call</td>
   241        <td>Timer</td>
   242        <td>policy_id, plugin_name</td>
   243      </tr>
   244      <tr>
   245        <td>
   246          <code>nomad-autoscaler.target.scale.invoke_ms</code>
   247        </td>
   248        <td>The time taken to perform the target plugin scale call</td>
   249        <td>Timer</td>
   250        <td>policy_id, plugin_name</td>
   251      </tr>
   252      <tr>
   253        <td>
   254          <code>nomad-autoscaler.apm.query.invoke_ms</code>
   255        </td>
   256        <td>The time taken to perform the APM plugin query call</td>
   257        <td>Timer</td>
   258        <td>policy_id, plugin_name</td>
   259      </tr>
   260      <tr>
   261        <td>
   262          <code>nomad-autoscaler.strategy.run.invoke_ms</code>
   263        </td>
   264        <td>The time taken to perform the strategy plugin run call</td>
   265        <td>Timer</td>
   266        <td>policy_id, plugin_name</td>
   267      </tr>
   268    </tbody>
   269  </table>
   270  
   271  [agent_telemetry_config]: /tools/autoscaling/agent#telemetry-block