# GPU Support

[TOC]

gVisor adds a layer of security to your AI/ML applications or other CUDA
workloads while adding negligible overhead. By running these applications in a
sandboxed environment, you can isolate your host system from potential
vulnerabilities in AI code. This is crucial for handling sensitive data or
deploying untrusted AI workloads.

gVisor supports running most CUDA applications on preselected versions of
[NVIDIA's open source driver](https://github.com/NVIDIA/open-gpu-kernel-modules).
To achieve this, gVisor implements a proxy driver inside the sandbox, henceforth
referred to as `nvproxy`. `nvproxy` proxies the application's interactions with
NVIDIA's driver on the host and provides access to NVIDIA GPU-specific devices
to the sandboxed application. The CUDA application can run unmodified inside the
sandbox and interact transparently with these devices.

## Environments

The `runsc` flag `--nvproxy` must be specified to enable GPU support. gVisor
supports GPUs in the following environments.

### NVIDIA Container Runtime

The
[`nvidia-container-runtime`](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime)
is packaged as part of the
[NVIDIA GPU Container Stack](https://github.com/NVIDIA/nvidia-container-toolkit).
This runtime is just a shim that delegates all commands to the configured
low-level runtime (which defaults to `runc`). To use gVisor, specify `runsc` as
the low-level runtime in `/etc/nvidia-container-runtime/config.toml`
[via the `runtimes` option](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#low-level-runtime-path)
and then run CUDA containers with `nvidia-container-runtime`.

NOTE: gVisor currently only supports
[legacy mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#legacy-mode).
The alternative,
[csv mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#csv-mode),
is not yet supported.

### Docker

The "legacy" mode of `nvidia-container-runtime` is directly compatible with the
`--gpus` flag implemented by the Docker CLI. So with Docker, `runsc` can be used
directly, without having to go through `nvidia-container-runtime`.

```
$ docker run --runtime=runsc --gpus=all --rm -it nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

### GKE Device Plugin

[GKE](https://cloud.google.com/kubernetes-engine) uses a different GPU container
stack than NVIDIA's. GKE has
[its own device plugin](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
(which is different from
[`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin)). GKE's
plugin modifies the container spec in a different way than the above-mentioned
methods.

NOTE: `nvproxy` does not yet have integration support for `k8s-device-plugin`,
so Kubernetes environments other than GKE might not be supported.
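
Note that, in all of these environments, GPU support is gated on the `--nvproxy`
flag mentioned above. As a minimal sketch of one way to wire that flag up for
Docker (the `runsc` binary path below is an assumption and may differ on your
system), the runtime entry in `/etc/docker/daemon.json` could look like this:

```
$ cat /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--nvproxy"]
    }
  }
}
$ sudo systemctl restart docker
```

After restarting Docker, the `docker run --runtime=runsc --gpus=all ...`
invocation shown in the Docker section above should be able to use the host's
GPUs.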

## Compatibility

gVisor supports a wide range of CUDA workloads, including PyTorch and various
generative models like LLMs. Check out
[this blog post about running Stable Diffusion with gVisor](/blog/2023/06/20/gpu-pytorch-stable-diffusion/).
gVisor undergoes continuous testing to ensure this functionality remains robust.
[Real-world usage](https://github.com/google/gvisor/issues?q=is%3Aissue+label%3A%22area%3A+gpu%22+)
of gVisor across different CUDA workloads helps discover and address potential
compatibility or performance issues in `nvproxy`.

`nvproxy` is a passthrough driver that forwards `ioctl(2)` calls made to NVIDIA
devices by the containerized application directly to the host NVIDIA driver.
This forwarding is straightforward: `ioctl` parameters are copied from the
application's address space to the sentry's address space, and then a host
`ioctl` syscall is made. `ioctl`s are passed through with minimal intervention;
`nvproxy` does not emulate NVIDIA kernel-mode driver (KMD) logic. This design
translates to minimal overhead for GPU operations, ensuring that GPU-bound
workloads experience negligible performance impact.

However, the presence of pointers and file descriptors within some `ioctl`
structs forces `nvproxy` to perform appropriate translations. This requires
`nvproxy` to be aware of the KMD's ABI, specifically the layout of `ioctl`
structs. The challenge is compounded by the lack of ABI stability guarantees in
NVIDIA's KMD, meaning `ioctl` definitions can change arbitrarily between
releases. While the NVIDIA installer ensures matching KMD and user-mode driver
(UMD) component versions, a single gVisor version might be used with multiple
NVIDIA drivers. As a result, `nvproxy` must understand the ABI for each
supported driver version, necessitating internal versioning logic for `ioctl`s.

Given these constraints, `nvproxy` has the following limitations:

1.  Supports selected GPU models.
2.  Supports selected NVIDIA driver versions.
3.  Supports selected NVIDIA device files.
4.  Supports selected `ioctl`s on each device file.

### Supported GPUs {#gpu-models}

gVisor currently supports the following NVIDIA GPU models: T4, L4, A100, A10G
and H100. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+enhancement,area%3A+gpu&template=bug_report.yml)
if you want support for another GPU model.

### Rolling Version Support Window {#driver-versions}

The range of driver versions supported by `nvproxy` directly aligns with those
available within GKE. As GKE incorporates newer drivers, `nvproxy` will extend
support accordingly. Conversely, to manage versioning complexity, `nvproxy` will
drop support for drivers removed from GKE. This strategy ensures a streamlined
process and avoids unbounded growth in `nvproxy`'s versioning.

To see what drivers a given `runsc` version supports, run:

```
$ runsc nvproxy list-supported-drivers
```

### Supported Device Files {#device-files}

gVisor only exposes `/dev/nvidiactl`, `/dev/nvidia-uvm` and `/dev/nvidia#`.

Some unsupported NVIDIA device files are:

-   `/dev/nvidia-caps/*`: Controls `nvidia-capabilities`, which is mainly used
    by Multi-Instance GPU (MIG).
-   `/dev/nvidia-drm`: Plugs into Linux's Direct Rendering Manager (DRM)
    subsystem.
-   `/dev/nvidia-modeset`: Enables the `DRIVER_MODESET` capability in
    `nvidia-drm` devices.

### Supported `ioctl` Set {#ioctls}

To minimize maintenance overhead across supported driver versions, the set of
supported NVIDIA device `ioctl`s is intentionally limited. This set was
generated by running a large number of CUDA workloads in gVisor. As `nvproxy` is
adapted to more use cases, this set will continue to evolve.

Currently, `nvproxy` focuses on supporting compute workloads (like CUDA).
Graphics and video capabilities are not yet supported due to missing `ioctl`s.
If your GPU compute workload fails with gVisor, it may be because some `ioctl`
commands are still unimplemented. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+bug,area%3A+gpu&template=bug_report.yml)
describing your use case. If a missing `ioctl` implementation is the problem,
then the [debug logs](/docs/user_guide/debugging/) will contain warnings with
the prefix `nvproxy: unknown *`.

## Security

While CUDA support enables important use cases for gVisor, it is important for
users to understand the security model around the use of GPUs in sandboxes. In
short, while gVisor will protect the host from the sandboxed application,
**NVIDIA driver updates must be part of any security plan with or without
gVisor**.

First, a short discussion on
[gVisor's security model](../architecture_guide/security.md). gVisor protects
the host from sandboxed applications by providing several layers of defense. The
layers most relevant to this discussion are the redirection of application
syscalls to the gVisor sandbox and the use of
[seccomp-bpf](https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html)
on gVisor sandboxes.

gVisor uses a "platform" to tell the host kernel to reroute system calls to the
sandbox process, known as the sentry. The sentry implements a syscall table,
which services all application syscalls. The sentry *may* make syscalls to the
host kernel if it needs them to fulfill the application syscall, but it doesn't
merely pass an application syscall through to the host kernel.

On sandbox boot, seccomp filters are applied to the sandbox. These filters
constrain the set of syscalls that the sandbox can make to the host kernel,
blocking access to most host kernel vulnerabilities even if the sandbox becomes
compromised.

For example, [CVE-2022-0185](https://nvd.nist.gov/vuln/detail/CVE-2022-0185) is
mitigated because gVisor itself handles the syscalls required to use namespaces
and capabilities, so the application is using gVisor's implementation, not the
host kernel's. For a compromised sandbox, the syscalls required to exploit the
vulnerability are blocked by seccomp filters.

In addition, seccomp-bpf filters can filter on syscall arguments, allowing
gVisor to allowlist granularly by `ioctl(2)` arguments. `ioctl(2)` is a source
of many bugs in any kernel due to the complexity of its implementation. As of
writing, gVisor does
[allowlist some `ioctl`s](https://github.com/google/gvisor/blob/ccc3c2cbd26d3514885bd665b0a110150a6e8c53/runsc/boot/filter/config/config_main.go#L111)
by argument for things like terminal support.
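
As a practical aside, you can confirm from the host that these seccomp filters
are active on a running sandbox. The sketch below assumes the sandbox process
appears as `runsc-sandbox` in the process list and that you substitute the PID
reported on your machine; a `Seccomp` value of `2` in `/proc/<pid>/status`
indicates that seccomp filter mode is in effect.

```
$ pgrep -f runsc-sandbox
4242
$ grep Seccomp: /proc/4242/status
Seccomp:        2
```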

For example, [CVE-2024-21626](https://nvd.nist.gov/vuln/detail/CVE-2024-21626)
is mitigated by gVisor because the application would use gVisor's implementation
of `ioctl(2)`. For a compromised sentry, `ioctl(2)` calls with the needed
arguments are not in the seccomp filter allowlist, blocking the attacker from
making the call. gVisor also mitigates similar vulnerabilities that come with
device drivers
([CVE-2023-33107](https://nvd.nist.gov/vuln/detail/CVE-2023-33107)).

### nvproxy Security

Recall that `nvproxy` allows applications to directly interact with supported
`ioctl`s defined in the NVIDIA driver.

gVisor's seccomp filter rules are modified such that `ioctl(2)` calls can be
made
[*only for supported `ioctl`s*](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/seccomp_filters.go#L1).
The allowlisted rules are aligned with each
[driver version](https://github.com/google/gvisor/blob/c087777e37a186e38206209c41178e92ef1bbe81/pkg/sentry/devices/nvproxy/version.go#L152).
This approach is similar to the allowlisted `ioctl`s for terminal support
described above. It allows gVisor to retain the vast majority of its protection
for the host while allowing access to GPUs. All of the above CVEs remain
mitigated even when `nvproxy` is used.

However, gVisor is much less effective at mitigating vulnerabilities within the
NVIDIA GPU drivers themselves, *because* gVisor passes through calls to be
handled by the kernel module. If there is a vulnerability in a given driver for
a given GPU `ioctl` (read: feature) that gVisor passes through, then gVisor will
also be vulnerable. If the vulnerability is in an unimplemented feature, gVisor
will block the required calls with seccomp filters.

In addition, gVisor doesn't introduce any additional hardware-level isolation
beyond that which is configured by the NVIDIA kernel-mode driver. There is no
validation of things like DMA buffers. The only checks are done in seccomp-bpf
rules to ensure `ioctl(2)` calls are made on supported and allowlisted `ioctl`s.

Therefore, **it is imperative that users update NVIDIA drivers in a timely
manner with or without gVisor**. To see the latest drivers gVisor supports, you
can run the following with your `runsc` release:

```
$ runsc nvproxy list-supported-drivers
```

Alternatively, you can view the
[source code](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/version.go#L1)
or download it and run:

```
$ make run TARGETS=runsc:runsc ARGS="nvproxy list-supported-drivers"
```

### So, if you don't protect against all the things, why even?

While gVisor doesn't protect against *all* NVIDIA driver vulnerabilities, it
*does* protect against a large set of general vulnerabilities in Linux.
Applications don't just use GPUs; they use them as part of a larger application
that may include third-party libraries. For example, TensorFlow
[suffers from the same kind of vulnerabilities](https://nvd.nist.gov/vuln/detail/CVE-2022-29216)
that every application does. Designing and implementing an application with
security in mind is hard, and in the emerging AI space, security is often
overlooked in favor of getting to market fast.
There are also many services that allow users to run external users' code on
the vendor's infrastructure. gVisor is well suited as part of a larger security
plan for these and other use cases.