
# GPU Support

[TOC]

gVisor adds a layer of security to your AI/ML applications or other CUDA
workloads while adding negligible overhead. By running these applications in a
sandboxed environment, you can isolate your host system from potential
vulnerabilities in AI code. This is crucial for handling sensitive data or
deploying untrusted AI workloads.

gVisor supports running most CUDA applications on preselected versions of
[NVIDIA's open source driver](https://github.com/NVIDIA/open-gpu-kernel-modules).
To achieve this, gVisor implements a proxy driver inside the sandbox, henceforth
referred to as `nvproxy`. `nvproxy` proxies the application's interactions with
NVIDIA's driver on the host. It provides access to NVIDIA GPU-specific devices
to the sandboxed application. The CUDA application can run unmodified inside the
sandbox and interact transparently with these devices.

## Environments

The `runsc` flag `--nvproxy` must be specified to enable GPU support. gVisor
supports GPUs in the following environments.

### NVIDIA Container Runtime

The
[`nvidia-container-runtime`](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime)
is packaged as part of the
[NVIDIA GPU Container Stack](https://github.com/NVIDIA/nvidia-container-toolkit).
This runtime is just a shim that delegates all commands to the configured
low-level runtime (which defaults to `runc`). To use gVisor, specify `runsc` as
the low-level runtime in `/etc/nvidia-container-runtime/config.toml`
[via the `runtimes` option](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#low-level-runtime-path)
and then run CUDA containers with `nvidia-container-runtime`.
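For example, on a configured host the relevant section of
`/etc/nvidia-container-runtime/config.toml` might look like the following. The
surrounding options are elided and the default runtime list varies by toolkit
version, so treat this as an illustrative sketch rather than a complete file:

```
$ cat /etc/nvidia-container-runtime/config.toml
...
[nvidia-container-runtime]
runtimes = ["runsc", "docker-runc", "runc"]
...
```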

NOTE: gVisor currently only supports
[legacy mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#legacy-mode).
The alternative,
[csv mode](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#csv-mode),
is not yet supported.

### Docker

The "legacy" mode of `nvidia-container-runtime` is directly compatible with the
`--gpus` flag implemented by the docker CLI. So with Docker, `runsc` can be used
directly (without having to go through `nvidia-container-runtime`).

```
$ docker run --runtime=runsc --gpus=all --rm -it nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
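For this to work, `runsc` must be registered as a Docker runtime with
`--nvproxy` enabled. A minimal sketch of `/etc/docker/daemon.json` is shown
below; the binary path is an assumption, and your file may contain other
settings that should be preserved:

```
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--nvproxy=true"]
        }
    }
}
$ sudo systemctl restart docker
```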

### GKE Device Plugin

[GKE](https://cloud.google.com/kubernetes-engine) uses a different GPU container
stack than NVIDIA's. GKE has
[its own device plugin](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
(which is different from
[`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin)). GKE's
plugin modifies the container spec in a different way than the above-mentioned
methods.

NOTE: `nvproxy` does not have integration support for `k8s-device-plugin` yet.
As a result, Kubernetes environments other than GKE might not be supported.
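On GKE, sandboxed GPU workloads are typically scheduled by combining the
`gvisor` RuntimeClass with an `nvidia.com/gpu` resource limit. The sketch below
assumes a cluster with GKE Sandbox and GPU node pools already set up; the node
selector value and image are illustrative:

```
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  runtimeClassName: gvisor
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```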

## Compatibility

gVisor supports a wide range of CUDA workloads, including PyTorch and various
generative models like LLMs. Check out
[this blog post about running Stable Diffusion with gVisor](/blog/2023/06/20/gpu-pytorch-stable-diffusion/).
gVisor is continuously tested to ensure this functionality remains robust.
[Real-world usage](https://github.com/google/gvisor/issues?q=is%3Aissue+label%3A%22area%3A+gpu%22+)
of gVisor across different CUDA workloads helps discover and address potential
compatibility or performance issues in `nvproxy`.

`nvproxy` is a passthrough driver that forwards `ioctl(2)` calls made to NVIDIA
devices by the containerized application directly to the host NVIDIA driver.
This forwarding is straightforward: `ioctl` parameters are copied from the
application's address space to the Sentry's address space, and then a host
`ioctl` syscall is made. `ioctl`s are passed through with minimal intervention;
`nvproxy` does not emulate NVIDIA kernel-mode driver (KMD) logic. This design
translates to minimal overhead for GPU operations, ensuring that GPU-bound
workloads experience negligible performance impact.

However, the presence of pointers and file descriptors within some `ioctl`
structs forces `nvproxy` to perform appropriate translations. This requires
`nvproxy` to be aware of the KMD's ABI, specifically the layout of `ioctl`
structs. The challenge is compounded by the lack of ABI stability guarantees in
NVIDIA's KMD, meaning `ioctl` definitions can change arbitrarily between
releases. While the NVIDIA installer ensures matching KMD and user-mode driver
(UMD) component versions, a single gVisor version might be used with multiple
NVIDIA drivers. As a result, `nvproxy` must understand the ABI for each
supported driver version, necessitating internal versioning logic for `ioctl`s.

Consequently, `nvproxy` has the following limitations:

1.  Supports selected GPU models.
2.  Supports selected NVIDIA driver versions.
3.  Supports selected NVIDIA device files.
4.  Supports selected `ioctl`s on each device file.

### Supported GPUs {#gpu-models}

gVisor currently supports the following NVIDIA GPUs: T4, L4, A100, A10G, and
H100. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+enhancement,area%3A+gpu&template=bug_report.yml)
if you want support for another GPU model.
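To check which GPU models are present on your host, you can query the NVIDIA
driver directly (the output below is illustrative; it depends on your
hardware):

```
$ nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA L4
```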

### Rolling Version Support Window {#driver-versions}

The range of driver versions supported by `nvproxy` directly aligns with those
available within GKE. As GKE incorporates newer drivers, `nvproxy` will extend
support accordingly. Conversely, to manage versioning complexity, `nvproxy` will
drop support for drivers removed from GKE. This strategy ensures a streamlined
process and avoids unbounded growth in `nvproxy`'s versioning.

To see what drivers a given `runsc` version supports, run:

```
$ runsc nvproxy list-supported-drivers
```
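You can compare that list against the driver version installed on your host,
for example (output is illustrative):

```
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
535.104.05
```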

### Supported Device Files {#device-files}

gVisor only exposes `/dev/nvidiactl`, `/dev/nvidia-uvm` and `/dev/nvidia#`.
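You can confirm which device files are visible inside a sandboxed container,
for example (the exact set of `/dev/nvidia#` entries depends on the GPUs
attached to the container):

```
$ docker run --runtime=runsc --gpus=all --rm ubuntu sh -c 'ls /dev/nvidia*'
/dev/nvidia-uvm
/dev/nvidia0
/dev/nvidiactl
```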

Some unsupported NVIDIA device files are:

-   `/dev/nvidia-caps/*`: Controls `nvidia-capabilities`, which is mainly used
    by Multi-Instance GPU (MIG).
-   `/dev/nvidia-drm`: Plugs into Linux's Direct Rendering Manager (DRM)
    subsystem.
-   `/dev/nvidia-modeset`: Enables `DRIVER_MODESET` capability in `nvidia-drm`
    devices.

### Supported `ioctl` Set {#ioctls}

To minimize maintenance overhead across supported driver versions, the set of
supported NVIDIA device `ioctl`s is intentionally limited. This set was
generated by running a large number of CUDA workloads in gVisor. As `nvproxy` is
adapted to more use cases, this set will continue to evolve.

Currently, `nvproxy` focuses on supporting compute workloads (like CUDA).
Graphics and video capabilities are not yet supported due to missing `ioctl`s.
If your GPU compute workload fails with gVisor, some `ioctl` commands might
still be unimplemented. Please
[open a GitHub issue](https://github.com/google/gvisor/issues/new?labels=type%3A+bug,area%3A+gpu&template=bug_report.yml)
to describe your use case. If a missing `ioctl` implementation is the problem,
then the [debug logs](/docs/user_guide/debugging/) will contain warnings with
the prefix `nvproxy: unknown *`.
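For example, assuming `runsc` was run with `--debug --debug-log=/tmp/runsc/`
(the log directory is up to you), you could search the debug logs for such
warnings with:

```
$ grep -r 'nvproxy: unknown' /tmp/runsc/
```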

## Security

While CUDA support enables important use cases for gVisor, it is important for
users to understand the security model around the use of GPUs in sandboxes. In
short, while gVisor will protect the host from the sandboxed application,
**NVIDIA driver updates must be part of any security plan with or without
gVisor**.

First, a short discussion on
[gVisor's security model](../architecture_guide/security.md). gVisor protects
the host from sandboxed applications by providing several layers of defense. The
layers most relevant to this discussion are the redirection of application
syscalls to the gVisor sandbox and the use of
[seccomp-bpf](https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html)
on gVisor sandboxes.

gVisor uses a "platform" to tell the host kernel to reroute system calls to the
sandbox process, known as the Sentry. The Sentry implements a syscall table,
which services all application syscalls. The Sentry *may* make syscalls to the
host kernel if it needs them to fulfill the application syscall, but it doesn't
merely pass an application syscall to the host kernel.

On sandbox boot, seccomp filters are applied to the sandbox. These filters
constrain the set of syscalls that the sandbox can make to the host kernel,
blocking access to most host kernel vulnerabilities even if the sandbox becomes
compromised.

For example, [CVE-2022-0185](https://nvd.nist.gov/vuln/detail/CVE-2022-0185) is
mitigated because gVisor itself handles the syscalls required to use namespaces
and capabilities, so the application is using gVisor's implementation, not the
host kernel's. For a compromised sandbox, the syscalls required to exploit the
vulnerability are blocked by seccomp filters.

In addition, seccomp-bpf filters can filter by argument values, allowing us to
allowlist granularly by `ioctl(2)` arguments. `ioctl(2)` is a source of many
bugs in any kernel due to the complexity of its implementation. As of writing,
gVisor does
[allowlist some `ioctl`s](https://github.com/google/gvisor/blob/ccc3c2cbd26d3514885bd665b0a110150a6e8c53/runsc/boot/filter/config/config_main.go#L111)
by argument for things like terminal support.

For example, [CVE-2024-21626](https://nvd.nist.gov/vuln/detail/CVE-2024-21626)
is mitigated by gVisor because the application would use gVisor's implementation
of `ioctl(2)`. For a compromised Sentry, `ioctl(2)` calls with the needed
arguments are not in the seccomp filter allowlist, blocking the attacker from
making the call. gVisor also mitigates similar vulnerabilities that come with
device drivers
([CVE-2023-33107](https://nvd.nist.gov/vuln/detail/CVE-2023-33107)).

### nvproxy Security

Recall that `nvproxy` allows applications to directly interact with supported
ioctls defined in the NVIDIA driver.

gVisor's seccomp filter rules are modified such that `ioctl(2)` calls can be
made
[*only for supported ioctls*](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/seccomp_filters.go#L1).
The allowlisted rules are aligned with each supported
[driver version](https://github.com/google/gvisor/blob/c087777e37a186e38206209c41178e92ef1bbe81/pkg/sentry/devices/nvproxy/version.go#L152).
This approach is similar to the allowlisted ioctls for terminal support
described above. This allows gVisor to retain the vast majority of its
protection for the host while allowing access to GPUs. All of the above CVEs
remain mitigated even when `nvproxy` is used.

However, gVisor is much less effective at mitigating vulnerabilities within the
NVIDIA GPU drivers themselves, *because* gVisor passes through calls to be
handled by the kernel module. If there is a vulnerability in a given driver for
a given GPU `ioctl` (read: feature) that gVisor passes through, then gVisor will
also be vulnerable. If the vulnerability is in an unimplemented feature, gVisor
will block the required calls with seccomp filters.

In addition, gVisor doesn't introduce any additional hardware-level isolation
beyond that which is configured by the NVIDIA kernel-mode driver. There is no
validation of things like DMA buffers. The only checks are done in seccomp-bpf
rules to ensure `ioctl(2)` calls are made on supported and allowlisted `ioctl`s.

Therefore, **it is imperative that users update NVIDIA drivers in a timely
manner with or without gVisor**. To see the latest drivers gVisor supports, you
can run the following with your runsc release:

```
$ runsc nvproxy list-supported-drivers
```

Alternatively, you can view the
[source code](https://github.com/google/gvisor/blob/be9169a6ce095a08b99940a97db3f58e5c5bd2ce/pkg/sentry/devices/nvproxy/version.go#L1)
or download it and run:

```
$ make run TARGETS=runsc:runsc ARGS="nvproxy list-supported-drivers"
```

### So, if you don't protect against all the things, why even?

While gVisor doesn't protect against *all* NVIDIA driver vulnerabilities, it
*does* protect against a large set of general vulnerabilities in Linux.
Applications don't just use GPUs; they use them as part of a larger application
that may include third-party libraries. For example, TensorFlow
[suffers from the same kind of vulnerabilities](https://nvd.nist.gov/vuln/detail/CVE-2022-29216)
that every application does. Designing and implementing an application with
security in mind is hard, and in the emerging AI space, security is often
overlooked in favor of getting to market fast. There are also many services
that allow users to run external users' code on the vendor's infrastructure.
gVisor is well suited as part of a larger security plan for these and other use
cases.