# Observability

[TOC]

This guide describes how to obtain Prometheus monitoring data from gVisor
sandboxes running with `runsc`.

**NOTE**: These metrics are mostly information about gVisor internals, and do
not provide introspection capabilities into the workload being sandboxed. If you
would like to monitor the sandboxed workload (e.g. for threat detection), refer
to **[Runtime Monitoring](runtime_monitoring.md)**.

`runsc` implements a
[Prometheus-compliant](https://prometheus.io/docs/instrumenting/exposition_formats/)
HTTP metric server using the `runsc metric-server` subcommand. This server is
meant to run **unsandboxed** as a sidecar process of your container runtime
(e.g. Docker).

## One-off metric export

You can export metric information from running sandboxes using the `runsc
export-metrics` subcommand. This does not require special configuration or
setting up a Prometheus server.

```
$ docker run -d --runtime=runsc --name=foobar debian sleep 1h
c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
$ sudo /path/to/runsc --root=/var/run/docker/runtime-runc/moby export-metrics c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
# Command-line export for sandbox c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
# Writing data from snapshot containing 175 data points taken at 2023-01-25 15:46:50.469403696 -0800 PST.


# HELP runsc_fs_opens Number of file opens.
# TYPE runsc_fs_opens counter
runsc_fs_opens{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 62 1674690410469

# HELP runsc_fs_read_wait Time waiting on file reads, in nanoseconds.
# TYPE runsc_fs_read_wait counter
runsc_fs_read_wait{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 0 1674690410469

# HELP runsc_fs_reads Number of file reads.
# TYPE runsc_fs_reads counter
runsc_fs_reads{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 54 1674690410469

# [...]
```
    47  
    48  ## Starting the metric server
    49  
    50  Use the `runsc metric-server` subcommand:
    51  
    52  ```shell
    53  $ sudo runsc \
    54      --root=/var/run/docker/runtime-runc/moby \
    55      --metric-server=localhost:1337 \
    56      metric-server
    57  ```
    58  
    59  `--root` needs to be set to the OCI runtime root directory that your runtime
    60  implementation uses. For Docker, this is typically
    61  `/var/run/docker/runtime-runc/moby`; otherwise, if you already have gVisor set
    62  up, you can use `ps aux | grep runsc` on the host to find the `--root` that a
    63  running sandbox is using. This directory is typically only accessible by the
    64  user Docker runs as (usually `root`), hence `sudo`. The metric server uses the
    65  `--root` directory to scan for sandboxes running on the system.
    66  
The `--metric-server` flag is the network address or UDS path to bind to. In
this example, the server is bound to the loopback interface only, on TCP port
`1337`. To listen on all interfaces instead, omit the host part of the address,
i.e. `--metric-server=:1337`.

If something goes wrong, you may also want to add `--debug
--debug-log=/dev/stderr` to understand the metric server's behavior.

You can query the metric server with `curl`:

```
$ curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]

# HELP process_start_time_seconds Unix timestamp at which the process started. Used by Prometheus for counter resets.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1674598082.698509 1674598109532

# End of metric data.
```

## Starting sandboxes with metrics enabled

Sandbox metrics are disabled by default. To enable, add the flag
`--metric-server={ADDRESS}:{PORT}` to the runtime configuration. With Docker,
this can be set in `/etc/docker/daemon.json` like so:

```json
{
    "runtimes": {
        "runsc": {
            "path": "/path/to/runsc",
            "runtimeArgs": [
                "--metric-server=localhost:1337"
            ]
        }
    }
}
```

**NOTE**: The `--metric-server` flag value must be an exact string match between
the runtime configuration and the `runsc metric-server` command.

Once you've done this, you can start a container and see that it shows up in the
list of Prometheus metrics.

```
$ docker run -d --runtime=runsc --name=foobar debian sleep 1h
32beefcafe

$ curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# Writing data from 3 snapshots: [...]


# HELP process_start_time_seconds Unix timestamp at which the process started. Used by Prometheus for counter resets.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1674599158.286067 1674599159819

# HELP runsc_fs_opens Number of file opens.
# TYPE runsc_fs_opens counter
runsc_fs_opens{iteration="42asdf",sandbox="32beefcafe"} 12 1674599159819

# HELP runsc_fs_read_wait Time waiting on file reads, in nanoseconds.
# TYPE runsc_fs_read_wait counter
runsc_fs_read_wait{iteration="42asdf",sandbox="32beefcafe"} 0 1674599159819

# [...]

# End of metric data.
```

Each per-container metric is labeled with at least:

-   `sandbox`: The container ID, in this case `32beefcafe`
-   `iteration`: A randomly-generated string (in this case `42asdf`) that stays
    constant for the lifetime of the sandbox. This helps distinguish between
    successive instances of the same sandbox with the same ID.

If you'd like to run some containers with metrics turned off and some on within
the same system, use two runtime entries in `/etc/docker/daemon.json` with only
one of them having the `--metric-server` flag set.
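
For instance, a sketch of such a split configuration (the runtime name
`runsc-metrics` and the `/path/to/runsc` paths are placeholders; only the
second runtime exports metrics):

```json
{
    "runtimes": {
        "runsc": {
            "path": "/path/to/runsc"
        },
        "runsc-metrics": {
            "path": "/path/to/runsc",
            "runtimeArgs": [
                "--metric-server=localhost:1337"
            ]
        }
    }
}
```

Containers started with `--runtime=runsc-metrics` would then show up in the
metric server's output, while those started with `--runtime=runsc` would not.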

## Exporting data to Prometheus

The metric server exposes a
[standard `/metrics` HTTP endpoint](https://prometheus.io/docs/instrumenting/exposition_formats/)
on the address given by the `--metric-server` flag passed to `runsc
metric-server`. Simply point Prometheus at this address.
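
For example, a minimal Prometheus scrape configuration might look like the
following (the job name is arbitrary, and the target assumes the
`localhost:1337` address used in the examples above):

```yaml
scrape_configs:
  - job_name: 'runsc'
    static_configs:
      - targets: ['localhost:1337']
```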
   157  
   158  If desired, you can change the
   159  [exporter name](https://prometheus.io/docs/instrumenting/writing_exporters/)
   160  (prefix applied to all metric names) using the `--exporter-prefix` flag. It
   161  defaults to `runsc_`.
   162  
   163  The sandbox metrics exported may be filtered by using the optional `GET`
   164  parameter `runsc-sandbox-metrics-filter`, e.g.
   165  `/metrics?runsc-sandbox-metrics-filter=fs_.*`. Metric names must fully match
   166  this regular expression. Note that this filtering is performed before prepending
   167  `--exporter-prefix` to metric names.
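
Because the match must cover the full metric name, a filter like `fs_.*`
selects only names that begin with `fs_`, not names that merely contain it.
The same whole-string semantics can be illustrated offline with `grep -Ex` on
a hypothetical list of unprefixed metric names:

```shell
# Hypothetical unprefixed metric names, one per line.
printf 'fs_opens\nfs_reads\nsysfs_opens\n' > metric_names.txt

# -x requires the regex to match the entire line, mirroring the
# full-match behavior of runsc-sandbox-metrics-filter:
# 'fs_opens' and 'fs_reads' pass; 'sysfs_opens' does not.
grep -Ex 'fs_.*' metric_names.txt
```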

The metric server also supports listening on a
[Unix Domain Socket](https://en.wikipedia.org/wiki/Unix_domain_socket). This can
be convenient to avoid reserving port numbers on the machine's network
interface, or for tighter control over who can read the data. Clients should
talk HTTP over this UDS. While Prometheus doesn't natively support reading
metrics from a UDS, this feature can be used in conjunction with a tool such as
[`socat` to re-expose this as a regular TCP port](https://serverfault.com/questions/517906/how-to-expose-a-unix-domain-socket-directly-over-tcp)
within another context (e.g. a tightly-managed network namespace that Prometheus
runs in).

```
$ sudo runsc --root=/var/run/docker/runtime-runc/moby --metric-server=/run/docker/runsc-metrics.sock metric-server &

$ sudo curl --unix-socket /run/docker/runsc-metrics.sock http://runsc-metrics/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]
# End of metric data.

# Set up socat to forward requests from *:1337 to /run/docker/runsc-metrics.sock in its own network namespace:
$ sudo unshare --net socat TCP-LISTEN:1337,reuseaddr,fork UNIX-CONNECT:/run/docker/runsc-metrics.sock &

# Set up basic networking for socat's network namespace:
$ sudo nsenter --net="/proc/$(pidof socat)/ns/net" sh -c 'ip link set lo up && ip route add default dev lo'

# Grab metric data from this namespace:
$ sudo nsenter --net="/proc/$(pidof socat)/ns/net" curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]
# End of metric data.
```

## Running the metric server in a sandbox

If you would like to run the metric server in a gVisor sandbox, you may do so,
provided that you give it access to the OCI runtime root directory, forward the
network port it binds to for external access, and enable host UDS support.

**WARNING**: Doing this does not provide you the full security of gVisor, as it
still grants the metric server full control over all running gVisor sandboxes on
the system. This step is only a defense-in-depth measure.

To do this, add a runtime with the `--host-uds=all` flag to
`/etc/docker/daemon.json`. The metric server needs the ability to open existing
UDSs (in order to communicate with running sandboxes), and to create new UDSs
(in order to create and listen on `/run/docker/runsc-metrics.sock`).

```json
{
    "runtimes": {
        "runsc": {
            "path": "/path/to/runsc",
            "runtimeArgs": [
                "--metric-server=/run/docker/runsc-metrics.sock"
            ]
        },
        "runsc-metric-server": {
            "path": "/path/to/runsc",
            "runtimeArgs": [
                "--metric-server=/run/docker/runsc-metrics.sock",
                "--host-uds=all"
            ]
        }
    }
}
```

Then start the metric server with this runtime, passing through the directories
containing the control files `runsc` uses to detect and communicate with running
sandboxes:

```shell
$ docker run -d --runtime=runsc-metric-server --name=runsc-metric-server \
    --volume="$(which runsc):/runsc:ro" \
    --volume=/var/run/docker/runtime-runc/moby:/var/run/docker/runtime-runc/moby \
    --volume=/run/docker:/run/docker \
    --volume=/var/run:/var/run \
    alpine \
        /runsc \
            --root=/var/run/docker/runtime-runc/moby \
            --metric-server=/run/docker/runsc-metrics.sock \
            --debug --debug-log=/dev/stderr \
            metric-server
```

Yes, this means the metric server will report data about its own sandbox:

```
$ metric_server_id="$(docker inspect --format='{{.ID}}' runsc-metric-server)"
$ sudo curl --unix-socket /run/docker/runsc-metrics.sock http://runsc-metrics/metrics | grep "$metric_server_id"
#   - Snapshot with 175 data points taken at 2023-01-25 15:45:33.70256855 -0800 -0800: map[iteration:2407456650315156914 sandbox:737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481]
runsc_fs_opens{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 54 1674690333702
runsc_fs_read_wait{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 0 1674690333702
runsc_fs_reads{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 52 1674690333702
# [...]
```

## Labeling pods on Kubernetes

When using Kubernetes, users typically deal with pod names and container names.
On Kubelet machines, the underlying container names passed to the runtime are
non-human-friendly hexadecimal strings.

In order to provide more user-friendly labels, the metric server will pick up
the `io.kubernetes.cri.sandbox-name` and `io.kubernetes.cri.sandbox-namespace`
annotations provided by `containerd`, and automatically add these as labels
(`pod_name` and `namespace_name` respectively) for each per-sandbox metric.
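
As an illustrative sketch (not part of the original documentation), these
labels allow per-pod aggregation in PromQL; assuming the default `runsc_`
exporter prefix:

```
# Total file opens per pod, summed across sandboxes and iterations:
sum by (namespace_name, pod_name) (runsc_fs_opens)
```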

## Metrics exported

The metric server exports a lot of gVisor-internal metrics, and generates its
own metrics as well. All metrics have documentation and type annotations in the
`/metrics` output, and this section aims to document some useful ones.

### Process-wide metrics

*   `process_start_time_seconds`: Unix timestamp representing the time at which
    the metric server started. This specific metric name is used by Prometheus,
    and as such its name is not affected by the `--exporter-prefix` flag. This
    metric is process-wide and has no labels.
*   `num_sandboxes_total`: A process-wide metric representing the total number
    of sandboxes that the metric server knows about.
*   `num_sandboxes_running`: A process-wide metric representing the number of
    running sandboxes that the metric server knows about.
*   `num_sandboxes_broken_metrics`: A process-wide metric representing the
    number of sandboxes from which the metric server could not get metric data.

### Per-sandbox metrics

*   `sandbox_presence`: A per-sandbox metric that is set to `1` for each sandbox
    that the metric server knows about. This can be used to join with other
    per-sandbox or per-pod metrics for which metric existence is not guaranteed.
*   `sandbox_running`: A per-sandbox metric that is set to `1` for each sandbox
    that the metric server knows about and that is actively running. This can be
    used in conjunction with `sandbox_presence` to determine the set of
    sandboxes that aren't running; useful if you want to alert about sandboxes
    that are down.
*   `sandbox_metadata`: A per-sandbox metric that carries a superset of the
    typical per-sandbox labels found on other per-sandbox metrics. These extra
    labels contain useful metadata about the sandbox, such as the version
    number, [platform](platforms.md), and [network type](networking.md) being
    used.
*   `sandbox_capabilities`: A per-sandbox, per-capability metric that carries
    the union of all capabilities present on at least one container of the
    sandbox. Can optionally be filtered to only a subset of capabilities using
    the `runsc-capability-filter` GET parameter on `/metrics` requests (regular
    expression). Useful for auditing and aggregating the capabilities you rely
    on across multiple sandboxes.
*   `sandbox_creation_time_seconds`: A per-sandbox Unix timestamp representing
    the time at which this sandbox was created.
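
As a sketch of the alerting use case mentioned under `sandbox_running`
(assuming the default `runsc_` exporter prefix), the following PromQL
expression selects sandboxes that are known to the metric server but not
currently running:

```
runsc_sandbox_presence == 1 unless runsc_sandbox_running == 1
```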