
# GPU Support

[TOC]

gVisor adds a layer of security to your AI/ML applications or other CUDA
workloads while adding negligible overhead. By running these applications in a
sandboxed environment, you can isolate your host system from potential
vulnerabilities in AI code. This is crucial for handling sensitive data or
deploying untrusted AI workloads.

gVisor supports running most CUDA applications on preselected versions of
[NVIDIA's open source driver](https://github.com/NVIDIA/open-gpu-kernel-modules).
To achieve this, gVisor implements a proxy driver inside the sandbox, henceforth
referred to as `nvproxy`. `nvproxy` proxies the application's interactions with
NVIDIA's driver on the host. It provides access to NVIDIA GPU-specific devices
to the sandboxed application. The CUDA application can run unmodified inside the
sandbox and interact transparently with these devices.

## Environments

The `runsc` flag `--nvproxy` must be specified to enable GPU support. gVisor
supports GPUs in the following environments.

### NVIDIA Container Runtime

The
[`nvidia-container-runtime`](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime)
is packaged as part of the
[NVIDIA GPU Container Stack](https://github.com/NVIDIA/nvidia-container-toolkit).
This runtime is just a shim that delegates all commands to the configured
low-level runtime (which defaults to `runc`). To use gVisor, specify `runsc` as
the low-level runtime in `/etc/nvidia-container-runtime/config.toml`
[via the `runtimes` option](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#low-level-runtime-path)
and then run CUDA containers with `nvidia-container-runtime`.
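For example, on a configured host the relevant section of
`/etc/nvidia-container-runtime/config.toml` might look like the following. The
surrounding options are elided and the default runtime list varies by toolkit
version, so treat this as an illustrative sketch rather than a complete file:

```
$ cat /etc/nvidia-container-runtime/config.toml
...
[nvidia-container-runtime]
runtimes = ["runsc", "docker-runc", "runc"]
...
```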

NOTE: gVisor currently only supports
[legacy mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#legacy-mode).
The alternative,
[csv mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#csv-mode),
is not yet supported.

### Docker

The "legacy" mode of `nvidia-container-runtime` is directly compatible with the
`--gpus` flag implemented by the docker CLI. So with Docker, `runsc` can be used
directly (without having to go through `nvidia-container-runtime`).

```
$ docker run --runtime=runsc --gpus=all --rm -it nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
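For this to work, `runsc` must be registered as a Docker runtime with
`--nvproxy` enabled. A minimal sketch of `/etc/docker/daemon.json` is shown
below; the binary path is an assumption, and your file may contain other
settings that should be preserved:

```
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--nvproxy=true"]
        }
    }
}
$ sudo systemctl restart docker
```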

### GKE Device Plugin

[GKE](https://cloud.google.com/kubernetes-engine) uses a different GPU container
stack than NVIDIA's. GKE has
[its own device plugin](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
(which is different from
[`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin)). GKE's
plugin modifies the container spec in a different way than the above-mentioned
methods.

NOTE: `nvproxy` does not have integration support for `k8s-device-plugin` yet.
As a result, Kubernetes environments other than GKE might not be supported.
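On GKE, sandboxed GPU workloads are typically scheduled by combining the
`gvisor` RuntimeClass with an `nvidia.com/gpu` resource limit. The sketch below
assumes a cluster with GKE Sandbox and GPU node pools already set up; the node
selector value and image are illustrative:

```
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  runtimeClassName: gvisor
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```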

## Compatibility

gVisor supports a wide range of CUDA workloads, including PyTorch and various
generative models like LLMs. Check out
[this blog post about running Stable Diffusion with gVisor](/blog/2023/06/20/gpu-pytorch-stable-diffusion/).
gVisor is continuously tested to ensure this functionality remains robust.
[Real-world usage](https://github.com/google/gvisor/issues?q=is%3Aissue+label%3A%22area%3A+gpu%22+)
of gVisor across different CUDA workloads helps discover and address potential
compatibility or performance issues in `nvproxy`.

`nvproxy` is a passthrough driver that forwards `ioctl(2)` calls made to NVIDIA
devices by the containerized application directly to the host NVIDIA driver.
This forwarding is straightforward: `ioctl` parameters are copied from the
application's address space to the Sentry's address space, and then a host
`ioctl` syscall is made. `ioctl`s are passed through with minimal intervention;
`nvproxy` does not emulate NVIDIA kernel-mode driver (KMD) logic. This design
translates to minimal overhead for GPU operations, ensuring that GPU-bound
workloads experience negligible performance impact.

However, the presence of pointers and file descriptors within some `ioctl`
structs forces `nvproxy` to perform appropriate translations. This requires
`nvproxy` to be aware of the KMD's ABI, specifically the layout of `ioctl`
structs. The challenge is compounded by the lack of ABI stability guarantees in
NVIDIA's KMD, meaning `ioctl` definitions can change arbitrarily between
releases. While the NVIDIA installer ensures matching KMD and user-mode driver
(UMD) component versions, a single gVisor version might be used with multiple
NVIDIA drivers. As a result, `nvproxy` must understand the ABI for each
supported driver version, necessitating internal versioning logic for `ioctl`s.

Consequently, `nvproxy` has the following limitations:

1.  Supports selected GPU models.
2.  Supports selected NVIDIA driver versions.
3.  Supports selected NVIDIA device files.
4.  Supports selected `ioctl`s on each device file.

### Supported GPUs {#gpu-models}

gVisor currently supports the following NVIDIA GPUs: T4, L4, A100, A10G, and
H100. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+enhancement,area%3A+gpu&template=bug_report.yml)
if you want support for another GPU model.
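To check which GPU models are present on your host, you can query the NVIDIA
driver directly (the output below is illustrative; it depends on your
hardware):

```
$ nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA L4
```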

### Rolling Version Support Window {#driver-versions}

The range of driver versions supported by `nvproxy` directly aligns with those
available within GKE. As GKE incorporates newer drivers, `nvproxy` will extend
support accordingly. Conversely, to manage versioning complexity, `nvproxy` will
drop support for drivers removed from GKE. This strategy ensures a streamlined
process and avoids unbounded growth in `nvproxy`'s versioning.

To see what drivers a given `runsc` version supports, run:

```
$ runsc nvproxy list-supported-drivers
```
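You can compare that list against the driver version installed on your host,
for example (output is illustrative):

```
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
535.104.05
```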

### Supported Device Files {#device-files}

gVisor only exposes `/dev/nvidiactl`, `/dev/nvidia-uvm` and `/dev/nvidia#`.
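You can confirm which device files are visible inside a sandboxed container,
for example (the exact set of `/dev/nvidia#` entries depends on the GPUs
attached to the container):

```
$ docker run --runtime=runsc --gpus=all --rm ubuntu sh -c 'ls /dev/nvidia*'
/dev/nvidia-uvm
/dev/nvidia0
/dev/nvidiactl
```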

Some unsupported NVIDIA device files are:

-   `/dev/nvidia-caps/*`: Controls `nvidia-capabilities`, which is mainly used
    by Multi-Instance GPU (MIG).
-   `/dev/nvidia-drm`: Plugs into Linux's Direct Rendering Manager (DRM)
    subsystem.
-   `/dev/nvidia-modeset`: Enables `DRIVER_MODESET` capability in `nvidia-drm`
    devices.

### Supported `ioctl` Set {#ioctls}

To minimize maintenance overhead across supported driver versions, the set of
supported NVIDIA device `ioctl`s is intentionally limited. This set was
generated by running a large number of CUDA workloads in gVisor. As `nvproxy` is
adapted to more use cases, this set will continue to evolve.

Currently, `nvproxy` focuses on supporting compute workloads (like CUDA).
Graphics and video capabilities are not yet supported due to missing `ioctl`s.
If your GPU compute workload fails with gVisor, some `ioctl` commands might
still be unimplemented. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+bug,area%3A+gpu&template=bug_report.yml)
to describe your use case. If a missing `ioctl` implementation is the problem,
then the [debug logs](/docs/user_guide/debugging/) will contain warnings with
the prefix `nvproxy: unknown *`.
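For example, assuming `runsc` was run with `--debug --debug-log=/tmp/runsc/`
(the log directory is up to you), you could search the debug logs for such
warnings with:

```
$ grep -r 'nvproxy: unknown' /tmp/runsc/
```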

## Security

While CUDA support enables important use cases for gVisor, it is important for
users to understand the security model around the use of GPUs in sandboxes. In
short, while gVisor will protect the host from the sandboxed application,
**NVIDIA driver updates must be part of any security plan with or without
gVisor**.

First, a short discussion on
[gVisor's security model](../architecture_guide/security.md). gVisor protects
the host from sandboxed applications by providing several layers of defense. The
layers most relevant to this discussion are the redirection of application
syscalls to the gVisor sandbox and the use of
[seccomp-bpf](https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html)
on gVisor sandboxes.

gVisor uses a "platform" to tell the host kernel to reroute system calls to the
sandbox process, known as the Sentry. The Sentry implements a syscall table,
which services all application syscalls. The Sentry *may* make syscalls to the
host kernel if it needs them to fulfill the application syscall, but it doesn't
merely pass an application syscall to the host kernel.

On sandbox boot, seccomp filters are applied to the sandbox. These filters
constrain the set of syscalls that the sandbox can make to the host kernel,
blocking access to most host kernel vulnerabilities even if the sandbox becomes
compromised.

For example, [CVE-2022-0185](https://nvd.nist.gov/vuln/detail/CVE-2022-0185) is
mitigated because gVisor itself handles the syscalls required to use namespaces
and capabilities, so the application is using gVisor's implementation, not the
host kernel's. For a compromised sandbox, the syscalls required to exploit the
vulnerability are blocked by seccomp filters.

In addition, seccomp-bpf filters can filter by argument values, allowing us to
allowlist granularly by `ioctl(2)` arguments. `ioctl(2)` is a source of many
bugs in any kernel due to the complexity of its implementation. As of writing,
gVisor does
[allowlist some `ioctl`s](https://github.com/google/gvisor/blob/ccc3c2cbd26d3514885bd665b0a110150a6e8c53/runsc/boot/filter/config/config_main.go#L111)
by argument for things like terminal support.

For example, [CVE-2024-21626](https://nvd.nist.gov/vuln/detail/CVE-2024-21626)
is mitigated by gVisor because the application would use gVisor's implementation
of `ioctl(2)`. For a compromised Sentry, `ioctl(2)` calls with the needed
arguments are not in the seccomp filter allowlist, blocking the attacker from
making the call. gVisor also mitigates similar vulnerabilities that come with
device drivers
([CVE-2023-33107](https://nvd.nist.gov/vuln/detail/CVE-2023-33107)).

### nvproxy Security

Recall that `nvproxy` allows applications to directly interact with supported
ioctls defined in the NVIDIA driver.

gVisor's seccomp filter rules are modified such that `ioctl(2)` calls can be
made
[*only for supported ioctls*](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/seccomp_filters.go#L1).
The allowlisted rules are aligned with each supported
[driver version](https://github.com/google/gvisor/blob/c087777e37a186e38206209c41178e92ef1bbe81/pkg/sentry/devices/nvproxy/version.go#L152).
This approach is similar to the allowlisted ioctls for terminal support
described above. This allows gVisor to retain the vast majority of its
protection for the host while allowing access to GPUs. All of the above CVEs
remain mitigated even when `nvproxy` is used.

However, gVisor is much less effective at mitigating vulnerabilities within the
NVIDIA GPU drivers themselves, *because* gVisor passes through calls to be
handled by the kernel module. If there is a vulnerability in a given driver for
a given GPU `ioctl` (read: feature) that gVisor passes through, then gVisor will
also be vulnerable. If the vulnerability is in an unimplemented feature, gVisor
will block the required calls with seccomp filters.

In addition, gVisor doesn't introduce any additional hardware-level isolation
beyond that which is configured by the NVIDIA kernel-mode driver. There is no
validation of things like DMA buffers. The only checks are done in seccomp-bpf
rules to ensure `ioctl(2)` calls are made on supported and allowlisted `ioctl`s.

Therefore, **it is imperative that users update NVIDIA drivers in a timely
manner with or without gVisor**. To see the latest drivers gVisor supports, you
can run the following with your runsc release:

```
$ runsc nvproxy list-supported-drivers
```

Alternatively, you can view the
[source code](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/version.go#L1)
or download it and run:

```
$ make run TARGETS=runsc:runsc ARGS="nvproxy list-supported-drivers"
```

### So, if you don't protect against all the things, why even?

While gVisor doesn't protect against *all* NVIDIA driver vulnerabilities, it
*does* protect against a large set of general vulnerabilities in Linux.
Applications don't just use GPUs; they use them as part of a larger application
that may include third-party libraries. For example, TensorFlow
[suffers from the same kind of vulnerabilities](https://nvd.nist.gov/vuln/detail/CVE-2022-29216)
that every application does. Designing and implementing an application with
security in mind is hard, and in the emerging AI space, security is often
overlooked in favor of getting to market fast. There are also many services
that allow users to run external users' code on the vendor's infrastructure.
gVisor is well suited as part of a larger security plan for these and other use
cases.