# Nvidia Driver Proxy

Status as of 2023-06-23: Under review

## Synopsis

Allow applications running within gVisor sandboxes to use CUDA on GPUs by
providing implementations of Nvidia GPU kernel driver files that proxy ioctls to
their host equivalents.

Non-goals:

-   Provide additional isolation of, or multiplexing between, GPU workloads
    beyond that provided by the driver and hardware.

-   Support use of GPUs for graphics rendering.

## Background

### gVisor, Platforms, and Memory Mapping

gVisor executes unmodified Linux applications in a sandboxed environment.
Application system calls are intercepted by gVisor and handled by (in essence) a
Go implementation of the Linux kernel called the *sentry*, which in turn
executes as a sandboxed userspace process running on a Linux host.

gVisor can execute application code via a variety of mechanisms, referred to as
"platforms". Most platforms can broadly be divided into process-based (ptrace,
systrap) and KVM-based (kvm). Process-based platforms execute application code
in sandboxed host processes, and establish application memory mappings by
invoking the `mmap` syscall from application process context; sentry and
application processes share a file descriptor (FD) table, allowing application
`mmap` to use sentry FDs. KVM-based platforms execute application code in the
guest userspace of a virtual machine, and establish application memory mappings
by first creating mappings in the sentry's address space, then forwarding those
mappings into the guest physical address space using KVM memslots, and finally
setting guest page table entries to point to the relevant guest physical
addresses.

### Nvidia Userspace API

[`libnvidia-container`](https://github.com/NVIDIA/libnvidia-container) provides
code for preparing a container for GPU use, and serves as a useful reference for
the environment that applications using GPUs expect. In particular,
[`nvc_internal.h`](https://github.com/NVIDIA/libnvidia-container/blob/main/src/nvc_internal.h)
contains a helpful list of relevant filesystem paths, while
[`configure_command()`](https://github.com/NVIDIA/libnvidia-container/blob/main/src/cli/configure.c)
is the primary entry point into container configuration. Of these paths,
`/dev/nvidiactl`, `/dev/nvidia#` (per-device, numbering from 0),
`/dev/nvidia-uvm`, and `/proc/driver/nvidia/params` are kernel-driver-backed and
known to be required.

Most "control" interactions between applications and the driver consist of
invocations of the `ioctl` syscall on `/dev/nvidiactl`, `/dev/nvidia#`, or
`/dev/nvidia-uvm`. Application data generally does not flow through ioctls;
instead, applications access driver-provided memory mappings.
`/proc/driver/nvidia/params` is informational and read-only.

`/dev/nvidiactl` and `/dev/nvidia#` are backed by the same `struct
file_operations nv_frontend_fops` in kernel module `nvidia.ko`, rooted in
`kernel-open/nvidia` in the
[Nvidia Linux OSS driver source](https://github.com/NVIDIA/open-gpu-kernel-modules.git).
The top-level `ioctl` implementation for both,
`kernel-open/nvidia/nv.c:nvidia_ioctl()`, handles a small number of ioctl
commands but delegates the majority to the "resource manager" (RM) subsystem,
`src/nvidia/arch/nvalloc/unix/src/escape.c:RmIoctl()`. Both functions constrain
most commands to either `/dev/nvidiactl` or `/dev/nvidia#`, as indicated by the
presence of the `NV_CTL_DEVICE_ONLY` or `NV_ACTUAL_DEVICE_ONLY` macros,
respectively.
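
As a purely illustrative sketch of how a proxy might mirror this split (all
names below are hypothetical, not the eventual `nvproxy` design), commands could
be dispatched from separate per-device-type tables, so that a command issued on
the wrong device file is rejected before it reaches the host driver:

```go
// Hypothetical sketch only: mirror NV_CTL_DEVICE_ONLY / NV_ACTUAL_DEVICE_ONLY
// by keeping separate ioctl tables for /dev/nvidiactl and /dev/nvidia# FDs.
package nvproxy

import "syscall"

// frontendFD stands in for the proxy's FileDescriptionImpl for /dev/nvidiactl
// and /dev/nvidia# FDs.
type frontendFD struct {
	hostFD         int32
	isActualDevice bool // true for /dev/nvidia#, false for /dev/nvidiactl
}

// ioctlHandler copies parameters in, reissues the ioctl on fd.hostFD, and
// copies results back out (see "Key Issues" below).
type ioctlHandler func(fd *frontendFD, paramsAddr uintptr) (uintptr, error)

var (
	ctlDeviceIoctls    = map[uint32]ioctlHandler{} // /dev/nvidiactl-only commands
	actualDeviceIoctls = map[uint32]ioctlHandler{} // /dev/nvidia#-only commands
)

// dispatch rejects commands issued on the wrong device file before they reach
// the host driver, mirroring the checks in nvidia_ioctl()/RmIoctl().
func (fd *frontendFD) dispatch(cmd uint32, paramsAddr uintptr) (uintptr, error) {
	table := ctlDeviceIoctls
	if fd.isActualDevice {
		table = actualDeviceIoctls
	}
	h, ok := table[cmd]
	if !ok {
		return 0, syscall.EINVAL
	}
	return h(fd, paramsAddr)
}
```

Commands that are permitted on both device files would need entries in both
tables (or a third, shared table).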

`/dev/nvidia-uvm` is implemented in kernel module `nvidia-uvm.ko`, rooted in
`kernel-open/nvidia-uvm` in the OSS driver source; its `ioctl` implementation is
`kernel-open/nvidia-uvm/uvm.c:uvm_ioctl()`.

The driver API models a collection of objects, using numeric handles as
references (akin to the relationship between file descriptions and file
descriptors). Objects are instances of classes, which exist in a C++-like
inheritance hierarchy that is implemented in C via code generation; for example,
the `RsResource` class inherits from the `Object` class, which is the
hierarchy's root. Objects exist in a tree of parent-child relationships, defined
by methods on the `Object` class. API-accessed objects are most frequently
created by invocations of `ioctl(NV_ESC_RM_ALLOC)`, which is parameterized by
`hClass`. `src/nvidia/src/kernel/rmapi/resource_list.h` specifies the mapping
from `hClass` to instantiated ("internal") class, as well as the type of the
pointee of `NVOS21_PARAMETERS::pAllocParms` or `NVOS64_PARAMETERS::pAllocParms`
which the object's constructor takes as input ("alloc param info").

## Key Issues

Most application ioctls to GPU drivers can be *proxied* straightforwardly by the
sentry: The sentry copies the ioctl's parameter struct, and the transitive
closure of structs it points to, from application to sentry memory; reissues the
ioctl to the host, passing pointers in the sentry's address space rather than
the application's; and copies updated fields (or whole structs for simplicity)
back to application memory. Below we consider complications to this basic idea.
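
For concreteness, the following minimal sketch shows the shape of this proxying
for an ioctl whose parameter struct contains no embedded pointers. The
`appMemory` interface is a hypothetical stand-in for the sentry's
usermem/marshalling machinery, and error handling is simplified; parameter
structs that embed pointers (e.g. `pAllocParms`) additionally require copying
the pointed-to data and rewriting the embedded pointers to sentry addresses.

```go
package nvproxy

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// appMemory is a hypothetical stand-in for copying bytes to and from
// application memory.
type appMemory interface {
	CopyIn(addr uintptr, dst []byte) error
	CopyOut(addr uintptr, src []byte) error
}

// proxyIoctl copies a fixed-size parameter struct from application memory,
// reissues the ioctl on the host FD with a pointer into sentry memory, and
// copies the (possibly updated) struct back to the application.
func proxyIoctl(mem appMemory, hostFD int, cmd uintptr, appParamsAddr uintptr, paramsSize int) (uintptr, error) {
	buf := make([]byte, paramsSize)
	if err := mem.CopyIn(appParamsAddr, buf); err != nil {
		return 0, err
	}
	// Reissue the ioctl against the host driver, passing a sentry pointer
	// rather than the application's.
	r, _, errno := unix.Syscall(unix.SYS_IOCTL, uintptr(hostFD), cmd,
		uintptr(unsafe.Pointer(&buf[0])))
	if errno != 0 {
		return r, errno
	}
	// Copy the whole struct back for simplicity; copying back only updated
	// fields is an optimization.
	if err := mem.CopyOut(appParamsAddr, buf); err != nil {
		return r, err
	}
	return r, nil
}
```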

### Unified Virtual Memory (UVM)

GPUs are equipped with "device" memory that is much faster for the GPU to access
than "system" memory (as used by CPUs). CUDA supports two basic memory models:

-   `cudaMalloc()` allocates device memory, which is not generally usable by the
    CPU; instead, `cudaMemcpy()` is used to copy between system and device
    memory.

-   `cudaMallocManaged()` allocates "unified memory", which can be used by both
    CPU and GPU. `nvidia-uvm.ko` backs mappings returned by
    `cudaMallocManaged()`, migrating pages from system to device memory on GPU
    page faults and from device to system memory on CPU page faults.

We cannot implement UVM by substituting a sentry-controlled buffer and copying
to/from UVM-controlled memory mappings "on demand", since GPU-side demand is
driven by GPU page faults, which the sentry cannot intercept directly; instead,
we must map `/dev/nvidia-uvm` into application address spaces as in native
execution.

UVM requires that the virtual addresses of all mappings of `nvidia-uvm` match
their respective mapped file offsets, which, in conjunction with the FD,
uniquely identify a shared memory segment[^cite-uvm-mmap]. Since this constraint
also applies to *sentry* mappings of `nvidia-uvm`, if an application happens to
request a mapping of `nvidia-uvm` at a virtual address that overlaps with an
existing sentry memory mapping, then `memmap.File.MapInternal()` is
unimplementable. On KVM-based platforms, this means that we cannot implement the
application mapping, since `MapInternal` is a required step in propagating the
mapping into application address spaces. On process-based platforms, this only
means that we cannot support e.g. `read(2)` syscalls targeting UVM memory; if
this is required, we can perform buffered copies from/to UVM memory using
`ioctl(UVM_TOOLS_READ/WRITE_PROCESS_MEMORY)`, at the cost of requiring
`MapInternal` users to explicitly indicate fill/flush points before/after I/O.

The extent to which applications use `cudaMallocManaged()` is unclear; use of
`cudaMalloc()` and explicit copying appears to predominate in
performance-sensitive code. PyTorch contains one non-test use of
`cudaMallocManaged()`[^cite-pytorch-uvm], but it is not immediately clear what
circumstances cause the containing function to be invoked. Tensorflow does not
appear to use `cudaMallocManaged()` outside of test code.

### Device Memory Caching

For both `cudaMalloc()` and "control plane" purposes, applications using CUDA
map some device memory into application address spaces, as follows:

1.  The application opens a new `/dev/nvidiactl` or `/dev/nvidia#` FD, depending
    on the memory being mapped.

2.  The application invokes `ioctl(NV_ESC_RM_MAP_MEMORY)` on an *existing*
    `/dev/nvidiactl` FD, passing the *new* FD as an ioctl parameter
    (`nv_ioctl_nvos33_parameters_with_fd::fd`). This ioctl stores information
    for the mapping in the new FD (`nv_linux_file_private_t::mmap_context`), but
    does not modify the application's address space.

3.  The application invokes `mmap` on the *new* FD to actually establish the
    mapping into its address space.

Conveniently, it is apparently permissible for the `ioctl` in step 2 to be
invoked from a different process than the `mmap` in step 3, so no gVisor changes
are required to support this pattern in general; we can invoke the `ioctl` in
the sentry and implement `mmap` as usual.

However, mappings of device memory often need to disable or constrain processor
caching for correct behavior. In modern x86 processors, caching behavior is
specified by page table entry flags[^cite-sdm-pat]. On process-based platforms,
application page tables are defined by the host kernel, whose `mmap` will choose
the correct caching behavior by delegating to the driver's implementation. On
KVM-based platforms, the sentry maintains guest page tables and consequently
must set caching behavior correctly.

Caching behavior for mappings obtained as described above is decided during
`NV_ESC_RM_MAP_MEMORY`, by the `RsResource::resMap` "method" of the driver
object specified by ioctl parameter `NVOS33_PARAMETERS::hMemory`. In most cases,
this eventually results in a call to
[`src/nvidia/src/kernel/rmapi/mapping_cpu.c:memMap_IMPL()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/src/kernel/rmapi/mapping_cpu.c#L167)
on an associated `Memory` object. Caching behavior thus depends on the logic of
that function and the `MEMORY_DESCRIPTOR` associated with the `Memory` object,
which is typically determined during object creation. Therefore, to support
KVM-based platforms, the sentry could track allocated driver objects and emulate
the driver's logic to determine appropriate caching behavior.

Alternatively, could we replicate the caching behavior of the host kernel's
mapping in the sentry's address space (in `vm_area_struct::vm_page_prot`)? There
is no apparent way for userspace to obtain this information, so this would
necessitate a Linux kernel patch or upstream change.
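
A minimal sketch of what the object-tracking approach could look like follows;
all names are hypothetical, and the actual caching rules would have to be
derived from `memMap_IMPL()` and the associated `MEMORY_DESCRIPTOR` rather than
from this simplified structure:

```go
package nvproxy

// memoryType is the caching behavior the sentry would need to apply to guest
// page table entries for a device memory mapping on KVM-based platforms.
type memoryType int

const (
	memoryTypeWriteBack     memoryType = iota // normal cacheable memory
	memoryTypeUncached                        // e.g. device registers
	memoryTypeWriteCombined                   // e.g. framebuffer-like memory
)

// object records the subset of allocation state needed to emulate the
// driver's mapping logic.
type object struct {
	hClass  uint32     // class passed to NV_ESC_RM_ALLOC
	memType memoryType // derived from hClass and alloc params at creation
}

// client tracks objects allocated through a single /dev/nvidiactl FD, keyed
// by object handle as used in e.g. NVOS33_PARAMETERS::hMemory.
type client struct {
	objects map[uint32]*object
}

// memoryTypeForMap returns the caching behavior for a mapping of hMemory,
// falling back to uncached as a conservative default for unknown objects.
func (c *client) memoryTypeForMap(hMemory uint32) memoryType {
	if o, ok := c.objects[hMemory]; ok {
		return o.memType
	}
	return memoryTypeUncached
}
```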

### OS-Described Memory

`ioctl(NV_ESC_RM_ALLOC_MEMORY, hClass=NV01_MEMORY_SYSTEM_OS_DESCRIPTOR)` and
`ioctl(NV_ESC_RM_VID_HEAP_CONTROL,
function=NVOS32_FUNCTION_ALLOC_OS_DESCRIPTOR)` create `OsDescMem` objects, which
are `Memory` objects backed by application anonymous memory. The ioctls treat
`NVOS02_PARAMETERS::pMemory` or `NVOS32_PARAMETERS::data.AllocOsDesc.descriptor`
respectively as an application virtual address and call Linux's
`pin_user_pages()` or `get_user_pages()` to get `struct page` pointers
representing pages starting at that address[^cite-osdesc-rmapi]. Pins are held
on those pages for the lifetime of the `OsDescMem` object.

The proxy driver will need to replicate this behavior in the sentry, though
doing so should not require major changes outside of the driver. When one of
these ioctls is invoked by an application (see the sketch following this list):

-   Invoke `mmap` to create a temporary `PROT_NONE` mapping in the sentry's
    address space of the size passed by the application.

-   Call `mm.MemoryManager.Pin()` to acquire file-page references on the given
    application memory.

-   Call `memmap.File.MapInternal()` to get sentry mappings of the pinned
    file-pages.

-   Use `mremap(old_size=0, flags=MREMAP_FIXED)` to replicate the mappings
    returned by `MapInternal()` into the temporary mapping, resulting in a
    virtually-contiguous sentry mapping of the application-specified address
    range.

-   Invoke the host ioctl using the sentry mapping.

-   `munmap` the temporary mapping, which is no longer required after the host
    ioctl.

-   Hold the file-page references returned by `mm.MemoryManager.Pin()` until an
    application ioctl is observed freeing the corresponding `OsDescMem`, then
    call `mm.Unpin()`.
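
The following sketch shows how the mapping-related steps above might fit
together. The pinning and `MapInternal` steps are abstracted behind the
hypothetical `internalMapping` type, unwinding on error is elided, and names are
illustrative rather than an actual implementation:

```go
package nvproxy

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// Linux mremap flags (from <sys/mman.h>).
const (
	mremapMaymove = 0x1
	mremapFixed   = 0x2
)

// internalMapping describes one sentry mapping of pinned application pages,
// as would be obtained from mm.MemoryManager.Pin() + memmap.File.MapInternal().
type internalMapping struct {
	sentryAddr uintptr // address of the (shared) sentry mapping
	length     uintptr // length in bytes
	offset     uintptr // offset within the application-specified range
}

// makeContiguous builds a temporary, virtually-contiguous replica of the
// application range in the sentry's address space and returns its address.
// The caller passes the result to the host ioctl and munmaps it afterwards;
// the underlying pins are held until the OsDescMem object is freed.
func makeContiguous(ms []internalMapping, totalLen uintptr) (uintptr, error) {
	// Reserve address space with a temporary PROT_NONE mapping.
	tmp, err := unix.Mmap(-1, 0, int(totalLen), unix.PROT_NONE,
		unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
	if err != nil {
		return 0, err
	}
	base := uintptr(unsafe.Pointer(&tmp[0]))
	for _, m := range ms {
		// mremap with old_size=0 duplicates the shared mapping at
		// m.sentryAddr into the reserved region at base+m.offset.
		_, _, errno := unix.Syscall6(unix.SYS_MREMAP,
			m.sentryAddr, 0, m.length,
			mremapMaymove|mremapFixed, base+m.offset, 0)
		if errno != 0 {
			unix.Munmap(tmp)
			return 0, errno
		}
	}
	return base, nil
}
```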

### Security Considerations

Since ioctl parameter structs must be copied into the sentry in order to proxy
them, gVisor implicitly restricts the set of application requests to those that
are explicitly implemented. We can impose additional restrictions based on
parameter values in order to further reduce attack surface, although possibly at
the cost of reduced development velocity; introducing new restrictions after
launch is difficult due to the risk of regressing existing users. Intuitively,
limiting the scope of our support to GPU compute should allow us to narrow API
usage to that of the CUDA runtime.
[Nvidia GPU driver CVEs are published in moderately large batches every ~3-4
months](https://www.nvidia.com/en-us/security/), but insufficient information
regarding these CVEs is available for us to determine how many of these
vulnerabilities we could mitigate via parameter filtering.

By default, the driver prevents a `/dev/nvidiactl` FD from using objects created
by other `/dev/nvidiactl` FDs[^cite-rm-validate], providing driver-level
resource isolation between applications. Since we need to track at least a
subset of object allocations for OS-described memory, and possibly for
determining memory caching type, we can optionally track *all* objects and
further constrain ioctls to using valid object handles if driver-level isolation
is believed inadequate.

While `seccomp-bpf` filters allow us to limit the set of ioctl requests that the
sentry can make, they cannot filter based on ioctl parameters passed via memory,
such as allocation `hClass`, `NV_ESC_RM_CONTROL` command, or
`NV_ESC_RM_VID_HEAP_CONTROL` function, limiting the extent to which they can
protect the host from a compromised sentry.

### `runsc` Container Configuration

The
[Nvidia Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
contains code to configure an unstarted container based on
[the GPU support requested by its OCI runtime spec](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#environment-variables-oci-spec),
[invoking `nvidia-container-cli` from `libnvidia-container` (described above) to
do most of the actual
work](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html).
It is used ubiquitously for this purpose, including by the
[Nvidia device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin).

The simplest way for `runsc` to obtain the Nvidia Container Toolkit's behavior
is obviously to use it, either by invoking `nvidia-container-runtime-hook` or by
using the Toolkit's code (which is written in Go) directly. However, filesystem
modifications made to the container's `/dev` and `/proc` directories on the host
will not be application-visible, since `runsc` necessarily injects sentry
`devtmpfs` and `procfs` mounts at these locations, requiring that `runsc`
internally replicate the effects of `libnvidia-container` in these directories.
Note that host filesystem modifications are still necessary, since the sentry
itself needs access to relevant host device files and MIG capabilities.

Conversely, we can attempt to emulate the behavior of `nvidia-container-toolkit`
and `libnvidia-container` within `runsc`; however, note that
`libnvidia-container` executes `ldconfig` to regenerate the container's runtime
linker cache after mounting the driver's shared libraries into the
container[^cite-nvc-ldcache_update], which is more difficult if said mounts
exist within the sentry's VFS rather than on the host.

### Proprietary Driver Differences

When running on the proprietary kernel driver, applications invoke
`ioctl(NV_ESC_RM_CONTROL)` commands that do not appear to exist in the OSS
driver. The OSS driver lacks support for GPU virtualization[^cite-oss-vgpu];
however, Google Compute Engine (GCE) GPUs are exposed to VMs in passthrough
mode[^cite-oss-gce], and Container-Optimized OS (COS) switched to the OSS driver
in Milestone 105[^cite-oss-cos], suggesting that OSS-driver-only support may be
sufficient. If support for the proprietary driver is required, we can request
documentation from Nvidia.

### API/ABI Stability

Nvidia requires that the kernel and userspace components of the driver match
versions[^cite-abi-readme], and does not guarantee kernel ABI
stability[^cite-abi-discuss], so we may need to support multiple ABI versions in
the proxy. It is not immediately clear if this will be a problem in practice.
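
If multiple ABIs do need to be supported, one possible shape (hypothetical, and
reusing the illustrative `ioctlHandler` type from the earlier dispatch sketch)
is to key the proxy's definitions on the host driver version, which could
plausibly be read from `/proc/driver/nvidia/version` or queried via
`NV_ESC_CHECK_VERSION_STR`:

```go
package nvproxy

import "fmt"

// abi collects everything the proxy might need to vary per driver version:
// supported commands and the sizes/layouts of their parameter structs.
type abi struct {
	frontendIoctls map[uint32]ioctlHandler // for /dev/nvidiactl and /dev/nvidia#
	uvmIoctls      map[uint32]ioctlHandler // for /dev/nvidia-uvm
}

// abis maps a driver version string (e.g. "530.41.03") to its ABI
// definitions; each supported version is populated explicitly.
var abis = map[string]*abi{}

// abiForVersion returns the ABI definitions matching the host driver, or an
// error if the host driver version is unsupported.
func abiForVersion(version string) (*abi, error) {
	if a, ok := abis[version]; ok {
		return a, nil
	}
	return nil, fmt.Errorf("unsupported Nvidia driver version %q", version)
}
```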

## Proposed Work

To simplify the initial implementation, we will focus immediate efforts on
process-based platforms and defer support for KVM-based platforms to future
work.

In the sentry:

-   Add structure and constant definitions from the Nvidia open-source kernel
    driver to a new package, `//pkg/abi/nvidia`.

-   Implement the proxy driver under `//pkg/sentry/devices/nvproxy`, initially
    comprising `FileDescriptionImpl` implementations proxying `/dev/nvidiactl`,
    `/dev/nvidia#`, and `/dev/nvidia-uvm`.

-   `/proc/driver/nvidia/params` can probably be (optionally) read once during
    startup and implemented as a static file in the sentry.

Each ioctl command and object class is associated with its own parameters type
and logic; thus, each needs to be implemented individually. We can generate
lists of required commands/classes by running representative applications under
[`cuda_ioctl_sniffer`](https://github.com/geohot/cuda_ioctl_sniffer) on a
variety of GPUs; a list derived from a minimal CUDA workload run on a single VM
follows below. The proxy driver itself should also log unimplemented
commands/classes for iterative development. For the most part, known-required
commands/classes should be implementable incrementally and in parallel.

Concurrently, at the API level, i.e. within `//runsc`:

-   Add an option to enable Nvidia GPU support. When this option is enabled, and
    `runsc` detects that GPU support is requested by the container, it enables
    the proxy driver (by calling `nvproxy.Register(vfsObj)`) and configures the
    container consistently with `nvidia-container-toolkit` and
    `libnvidia-container`, as sketched below.

    Since setting the wrong caching behavior for device memory mappings will
    fail in unpredictable ways, `runsc` must ensure that GPU support cannot be
    enabled when an unsupported platform is selected.
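
A rough sketch of that enablement flow follows; the option name, detection
logic, and function signatures are hypothetical scaffolding, with
`nvproxy.Register(vfsObj)` represented by the `register` callback:

```go
// Hypothetical sketch of the runsc-side checks; not actual runsc code.
package boot

import "fmt"

// configureNvidiaGPUs wires together the conditions described above: the
// runsc option must be set, the container must request GPU support, and the
// selected platform must be one on which nvproxy is supported.
func configureNvidiaGPUs(nvproxyEnabled, gpuRequested bool, platformName string, register func() error) error {
	if !nvproxyEnabled || !gpuRequested {
		return nil
	}
	// Setting the wrong caching behavior for device memory mappings fails in
	// unpredictable ways, so refuse unsupported platforms (initially, the
	// KVM-based ones).
	supported := map[string]bool{"ptrace": true, "systrap": true}
	if !supported[platformName] {
		return fmt.Errorf("Nvidia GPU support is not available on platform %q", platformName)
	}
	// register stands in for nvproxy.Register(vfsObj). Afterwards, runsc must
	// also replicate libnvidia-container's effects inside the sentry's /dev
	// and /proc mounts (see "runsc Container Configuration" above).
	return register()
}
```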

To support Nvidia Multi-Process Service (MPS), we need:

-   Support for `SCM_CREDENTIALS` on host Unix domain sockets; already
    implemented as part of a previous MPS investigation, but not merged.

-   Optional pass-through of `statfs::f_type` through `fsimpl/gofer`; needed for
    a `runsc` bind mount of the host's `/dev/shm`, through which MPS shares
    memory; previously hacked in (optionality not implemented).

Features required to support Nvidia Persistence Daemon and Nvidia Fabric Manager
are currently unknown, but these are not believed to be critical, and we may
choose to deliberately deny access to them (and/or MPS) to reduce attack
surface.
[MPS provides "memory protection" but not "error isolation"](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#cuda-concurrency),
so it is not clear that granting MPS access to sandboxed containers is safe.

Implementation notes:

-   Each application `open` of `/dev/nvidiactl`, `/dev/nvidia#`, or
    `/dev/nvidia-uvm` must be backed by a distinct host FD. Furthermore, the
    proxy driver cannot go through sentry VFS to obtain this FD, since doing so
    would recursively attempt to open the proxy driver. Instead, we must allow
    the proxy driver to invoke host `openat`, and ensure that the mount
    namespace in which the sentry executes contains the required device special
    files.

-   `/dev/nvidia-uvm` FDs may need to be `UVM_INITIALIZE`d with
    `UVM_INIT_FLAGS_MULTI_PROCESS_SHARING_MODE` to be used from both sentry and
    application processes[^cite-uvm-va_space_mm_enabled].

-   Known-used `nvidia.ko` ioctls: `NV_ESC_CHECK_VERSION_STR`,
    `NV_ESC_SYS_PARAMS`, `NV_ESC_CARD_INFO`, `NV_ESC_NUMA_INFO`,
    `NV_ESC_REGISTER_FD`, `NV_ESC_RM_ALLOC`, `NV_ESC_RM_ALLOC_MEMORY`,
    `NV_ESC_RM_ALLOC_OS_EVENT`, `NV_ESC_RM_CONTROL`, `NV_ESC_RM_FREE`,
    `NV_ESC_RM_MAP_MEMORY`, `NV_ESC_RM_VID_HEAP_CONTROL`,
    `NV_ESC_RM_DUP_OBJECT`, `NV_ESC_RM_UPDATE_DEVICE_MAPPING_INFO`

-   `NV_ESC_RM_CONTROL` is essentially another level of ioctls. Known-used
    `NVOS54_PARAMETERS::cmd`: `NV0000_CTRL_CMD_SYSTEM_GET_BUILD_VERSION`,
    `NV0000_CTRL_CMD_CLIENT_SET_INHERITED_SHARE_POLICY`,
    `NV0000_CTRL_CMD_SYSTEM_GET_FABRIC_STATUS`,
    `NV0000_CTRL_CMD_GPU_GET_PROBED_IDS`,
    `NV0000_CTRL_CMD_SYNC_GPU_BOOST_GROUP_INFO`,
    `NV0000_CTRL_CMD_GPU_ATTACH_IDS`, `NV0000_CTRL_CMD_GPU_GET_ID_INFO`,
    `NV0000_CTRL_CMD_GPU_GET_ATTACHED_IDS`,
    `NV2080_CTRL_CMD_GPU_GET_ACTIVE_PARTITION_IDS`,
    `NV2080_CTRL_CMD_GPU_GET_GID_INFO`,
    `NV0080_CTRL_CMD_GPU_GET_VIRTUALIZATION_MODE`,
    `NV2080_CTRL_CMD_FB_GET_INFO`, `NV2080_CTRL_CMD_GPU_GET_INFO`,
    `NV0080_CTRL_CMD_MC_GET_ARCH_INFO`, `NV2080_CTRL_CMD_BUS_GET_INFO`,
    `NV2080_CTRL_CMD_BUS_GET_PCI_INFO`, `NV2080_CTRL_CMD_BUS_GET_PCI_BAR_INFO`,
    `NV2080_CTRL_CMD_GPU_QUERY_ECC_STATUS`, `NV0080_CTRL_FIFO_GET_CAPS`,
    `NV0080_CTRL_CMD_GPU_GET_CLASSLIST`, `NV2080_CTRL_CMD_GPU_GET_ENGINES`,
    `NV2080_CTRL_CMD_GPU_GET_SIMULATION_INFO`,
    `NV0000_CTRL_CMD_GPU_GET_MEMOP_ENABLE`, `NV2080_CTRL_CMD_GR_GET_INFO`,
    `NV2080_CTRL_CMD_GR_GET_GPC_MASK`, `NV2080_CTRL_CMD_GR_GET_TPC_MASK`,
    `NV2080_CTRL_CMD_GR_GET_CAPS_V2`, `NV2080_CTRL_CMD_CE_GET_CAPS`,
    `NV2080_CTRL_CMD_GPU_GET_COMPUTE_POLICY_CONFIG`,
    `NV2080_CTRL_CMD_GR_GET_GLOBAL_SM_ORDER`, `NV0080_CTRL_CMD_FB_GET_CAPS`,
    `NV0000_CTRL_CMD_CLIENT_GET_ADDR_SPACE_TYPE`,
    `NV2080_CTRL_CMD_GSP_GET_FEATURES`,
    `NV2080_CTRL_CMD_GPU_GET_SHORT_NAME_STRING`,
    `NV2080_CTRL_CMD_GPU_GET_NAME_STRING`,
    `NV2080_CTRL_CMD_GPU_QUERY_COMPUTE_MODE_RULES`,
    `NV2080_CTRL_CMD_RC_RELEASE_WATCHDOG_REQUESTS`,
    `NV2080_CTRL_CMD_RC_SOFT_DISABLE_WATCHDOG`,
    `NV2080_CTRL_CMD_NVLINK_GET_NVLINK_STATUS`,
    `NV2080_CTRL_CMD_RC_GET_WATCHDOG_INFO`, `NV2080_CTRL_CMD_PERF_BOOST`,
    `NV0080_CTRL_CMD_FIFO_GET_CHANNELLIST`, `NVC36F_CTRL_GET_CLASS_ENGINEID`,
    `NVC36F_CTRL_CMD_GPFIFO_GET_WORK_SUBMIT_TOKEN`,
    `NV2080_CTRL_CMD_GR_GET_CTX_BUFFER_SIZE`, `NVA06F_CTRL_CMD_GPFIFO_SCHEDULE`

-   Known-used `NVOS54_PARAMETERS::cmd` that are apparently unimplemented and
    may be proprietary-driver-only (or just well-hidden?): 0x20800159,
    0x20800161, 0x20801001, 0x20801009, 0x2080100a, 0x20802016, 0x20802084,
    0x503c0102, 0x90e60102

-   Known-used `nvidia-uvm.ko` ioctls: `UVM_INITIALIZE`,
    `UVM_PAGEABLE_MEM_ACCESS`, `UVM_REGISTER_GPU`, `UVM_CREATE_RANGE_GROUP`,
    `UVM_REGISTER_GPU_VASPACE`, `UVM_CREATE_EXTERNAL_RANGE`,
    `UVM_MAP_EXTERNAL_ALLOCATION`, `UVM_REGISTER_CHANNEL`,
    `UVM_ALLOC_SEMAPHORE_POOL`, `UVM_VALIDATE_VA_RANGE`

-   Known-used `NV_ESC_RM_ALLOC` `hClass`, i.e. allocated object classes:
    `NV01_ROOT_CLIENT`, `MPS_COMPUTE`, `NV01_DEVICE_0`, `NV20_SUBDEVICE_0`,
    `TURING_USERMODE_A`, `FERMI_VASPACE_A`, `NV50_THIRD_PARTY_P2P`,
    `FERMI_CONTEXT_SHARE_A`, `TURING_CHANNEL_GPFIFO_A`, `TURING_COMPUTE_A`,
    `TURING_DMA_COPY_A`, `NV01_EVENT_OS_EVENT`, `KEPLER_CHANNEL_GROUP_A`
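
As a sketch of the incremental approach (hypothetical names, building on the
earlier dispatch sketch), per-command handlers can be registered one at a time
while unimplemented commands are logged, so that gaps surface during testing:

```go
package nvproxy

import (
	"log"
	"syscall"
)

// rmControlHandlers maps NVOS54_PARAMETERS::cmd values to handlers; entries
// are added command by command as support is implemented. The command
// constants themselves would live in //pkg/abi/nvidia.
var rmControlHandlers = map[uint32]func(fd *frontendFD, paramsAddr uintptr) error{
	// abi.NV0000_CTRL_CMD_SYSTEM_GET_BUILD_VERSION: rmControlGetBuildVersion,
	// abi.NV2080_CTRL_CMD_FB_GET_INFO:              rmControlSimple,
	// ...
}

// rmControl dispatches an NV_ESC_RM_CONTROL command, logging commands that
// are not implemented yet so that they can be prioritized.
func rmControl(fd *frontendFD, cmd uint32, paramsAddr uintptr) error {
	h, ok := rmControlHandlers[cmd]
	if !ok {
		log.Printf("nvproxy: unimplemented NV_ESC_RM_CONTROL command %#x", cmd)
		return syscall.EINVAL
	}
	return h(fd, paramsAddr)
}
```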

## References

[^cite-abi-discuss]: "[The] RMAPI currently does not have any ABI stability
    guarantees whatsoever, and even API compatibility breaks occasionally." -
    https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-2757388
[^cite-abi-readme]: "This is the source release of the NVIDIA Linux open GPU
    kernel modules, version 530.41.03. ... Note that the kernel modules built
    here must be used with GSP firmware and user-space NVIDIA GPU driver
    components from a corresponding 530.41.03 driver release." -
    https://github.com/NVIDIA/open-gpu-kernel-modules/blob/6dd092ddb7c165fb1ec48b937fa6b33daa37f9c1/README.md
[^cite-nvc-ldcache_update]: [`src/nvc_ldcache.c:nvc_ldcache_update()`](https://github.com/NVIDIA/libnvidia-container/blob/eb0415c458c5e5d97cb8ac08b42803d075ed73cd/src/nvc_ldcache.c#L355)
[^cite-osdesc-rmapi]: [`src/nvidia/arch/nvalloc/unix/src/escape.c:RmCreateOsDescriptor()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/arch/nvalloc/unix/src/escape.c#L120)
    =>
    [`kernel-open/nvidia/os-mlock.c:os_lock_user_pages()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia/os-mlock.c#L214)
[^cite-oss-cos]: "Upgraded Nvidia latest drivers from v510.108.03 to v525.60.13
    (OSS)." -
    https://cloud.google.com/container-optimized-os/docs/release-notes/m105#cos-beta-105-17412-1-2_vs_milestone_101_.
    Also see b/235364591, go/cos-oss-gpu.
[^cite-oss-gce]: "Compute Engine provides NVIDIA GPUs for your VMs in
    passthrough mode so that your VMs have direct control over the GPUs and
    their associated memory." - https://cloud.google.com/compute/docs/gpus
[^cite-oss-vgpu]: "The currently published driver does not support
    virtualization, neither as a host nor a guest." -
    https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-2752052
[^cite-pytorch-uvm]: [`c10/cuda/CUDADeviceAssertionHost.cpp:c10::cuda::CUDAKernelLaunchRegistry::get_uvm_assertions_ptr_for_current_device()`](https://github.com/pytorch/pytorch/blob/3f5d768b561e3edd17e93fd4daa7248f9d600bb2/c10/cuda/CUDADeviceAssertionHost.cpp#L268)
[^cite-rm-validate]: See calls to `clientValidate()` in
    [`src/nvidia/src/libraries/resserv/src/rs_server.c`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/6dd092ddb7c165fb1ec48b937fa6b33daa37f9c1/src/nvidia/src/libraries/resserv/src/rs_server.c)
    =>
    [`src/nvidia/src/kernel/rmapi/client.c:rmclientValidate_IMPL()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/src/kernel/rmapi/client.c#L728).
    `API_SECURITY_INFO::clientOSInfo` is set by
    [`src/nvidia/arch/nvalloc/unix/src/escape.c:RmIoctl()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/arch/nvalloc/unix/src/escape.c#L300).
    Both `PDB_PROP_SYS_VALIDATE_CLIENT_HANDLE` and
    `PDB_PROP_SYS_VALIDATE_CLIENT_HANDLE_STRICT` are enabled by default by
    [`src/nvidia/generated/g_system_nvoc.c:__nvoc_init_dataField_OBJSYS()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/generated/g_system_nvoc.c#L84).
[^cite-sdm-pat]: Intel SDM Vol. 3, Sec. 12.12 "Page Attribute Table (PAT)"
[^cite-uvm-mmap]: [`kernel-open/nvidia-uvm/uvm.c:uvm_mmap()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia-uvm/uvm.c#L557)
[^cite-uvm-va_space_mm_enabled]: [`kernel-open/nvidia-uvm/uvm_va_space_mm.c:uvm_va_space_mm_enabled()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia-uvm/uvm_va_space_mm.c#L188)