     1  # Nvidia Driver Proxy
     2  
     3  Status as of 2023-06-23: Under review
     4  
     5  ## Synopsis
     6  
     7  Allow applications running within gVisor sandboxes to use CUDA on GPUs by
     8  providing implementations of Nvidia GPU kernel driver files that proxy ioctls to
     9  their host equivalents.
    10  
    11  Non-goals:
    12  
    13  -   Provide additional isolation of, or multiplexing between, GPU workloads
    14      beyond that provided by the driver and hardware.
    15  
    16  -   Support use of GPUs for graphics rendering.
    17  
    18  ## Background
    19  
    20  ### gVisor, Platforms, and Memory Mapping
    21  
    22  gVisor executes unmodified Linux applications in a sandboxed environment.
    23  Application system calls are intercepted by gVisor and handled by (in essence) a
    24  Go implementation of the Linux kernel called the *sentry*, which in turn
    25  executes as a sandboxed userspace process running on a Linux host.
    26  
    27  gVisor can execute application code via a variety of mechanisms, referred to as
    28  "platforms". Most platforms can broadly be divided into process-based (ptrace,
    29  systrap) and KVM-based (kvm). Process-based platforms execute application code
    30  in sandboxed host processes, and establish application memory mappings by
    31  invoking the `mmap` syscall from application process context; sentry and
    32  application processes share a file descriptor (FD) table, allowing application
    33  `mmap` to use sentry FDs. KVM-based platforms execute application code in the
    34  guest userspace of a virtual machine, and establish application memory mappings
    35  by establishing mappings in the sentry's address space, then forwarding those
    36  mappings into the guest physical address space using KVM memslots and finally
    37  setting guest page table entries to point to the relevant guest physical
    38  addresses.
    39  
    40  ### Nvidia Userspace API
    41  
    42  [`libnvidia-container`](https://github.com/NVIDIA/libnvidia-container) provides
    43  code for preparing a container for GPU use, and serves as a useful reference for
    44  the environment that applications using GPUs expect. In particular,
    45  [`nvc_internal.h`](https://github.com/NVIDIA/libnvidia-container/blob/main/src/nvc_internal.h)
    46  contains a helpful list of relevant filesystem paths, while
    47  [`configure_command()`](https://github.com/NVIDIA/libnvidia-container/blob/main/src/cli/configure.c)
    48  is the primary entry point into container configuration. Of these paths,
    49  `/dev/nvidiactl`, `/dev/nvidia#` (per-device, numbering from 0),
    50  `/dev/nvidia-uvm`, and `/proc/driver/nvidia/params` are kernel-driver-backed and
    51  known to be required.
    52  
    53  Most "control" interactions between applications and the driver consist of
    54  invocations of the `ioctl` syscall on `/dev/nvidiactl`, `/dev/nvidia#`, or
    55  `/dev/nvidia-uvm`. Application data generally does not flow through ioctls;
    56  instead, applications access driver-provided memory mappings.
    57  `/proc/driver/nvidia/params` is informational and read-only.
    58  
    59  `/dev/nvidiactl` and `/dev/nvidia#` are backed by the same `struct
    60  file_operations nv_frontend_fops` in kernel module `nvidia.ko`, rooted in
    61  `kernel-open/nvidia` in the
    62  [Nvidia Linux OSS driver source](https://github.com/NVIDIA/open-gpu-kernel-modules.git).
    63  The top-level `ioctl` implementation for both,
    64  `kernel-open/nvidia/nv.c:nvidia_ioctl()`, handles a small number of ioctl
    65  commands but delegates the majority to the "resource manager" (RM) subsystem,
    66  `src/nvidia/arch/nvalloc/unix/src/escape.c:RmIoctl()`. Both functions constrain
    67  most commands to either `/dev/nvidiactl` or `/dev/nvidia#`, as indicated by the
    68  presence of the `NV_CTL_DEVICE_ONLY` or `NV_ACTUAL_DEVICE_ONLY` macros
    69  respectively.
    70  
    71  `/dev/nvidia-uvm` is implemented in kernel module `nvidia-uvm.ko`, rooted in
    72  `kernel-open/nvidia-uvm` in the OSS driver source; its `ioctl` implementation is
    73  `kernel-open/nvidia-uvm/uvm.c:uvm_ioctl()`.
    74  
    75  The driver API models a collection of objects, using numeric handles as
    76  references (akin to the relationship between file descriptions and file
    77  descriptors). Objects are instances of classes, which exist in a C++-like
    78  inheritance hierarchy that is implemented in C via code generation; for example,
    79  the `RsResource` class inherits from the `Object` class, which is the
    80  hierarchy's root. Objects exist in a tree of parent-child relationships, defined
    81  by methods on the `Object` class. API-accessed objects are most frequently
    82  created by invocations of `ioctl(NV_ESC_RM_ALLOC)`, which is parameterized by
    83  `hClass`. `src/nvidia/src/kernel/rmapi/resource_list.h` specifies the mapping
    84  from `hClass` to instantiated ("internal") class, as well as the type of the
    85  pointee of `NVOS21_PARAMETERS::pAllocParms` or `NVOS64_PARAMETERS::pAllocParms`
    86  which the object's constructor takes as input ("alloc param info").
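
For orientation, the `NV_ESC_RM_ALLOC` parameter struct that the proxy has to
understand looks roughly like the following Go rendering; the field names and
layout are paraphrased from the driver's `nvos.h` for illustration and are an
assumption of this sketch, not an exact ABI definition.

```go
// Approximate Go rendering of the driver's NVOS21_PARAMETERS, the parameter
// struct for ioctl(NV_ESC_RM_ALLOC). Names and layout are paraphrased and may
// not match the real ABI exactly.
type nvos21Parameters struct {
	HRoot         uint32 // client (NV01_ROOT_CLIENT) that owns the new object
	HObjectParent uint32 // parent object in the object tree
	HObjectNew    uint32 // handle chosen for the new object
	HClass        uint32 // external class; selects the internal class per resource_list.h
	PAllocParms   uint64 // application pointer to the class-specific "alloc param info"
	Status        uint32 // status returned by the driver
	_             uint32 // padding (assumed)
}
```

Proxying `NV_ESC_RM_ALLOC` therefore requires the sentry to know, for each
supported `hClass`, the concrete type behind `PAllocParms`, so that the right
number of bytes can be copied in and out.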
    87  
    88  ## Key Issues
    89  
    90  Most application ioctls to GPU drivers can be *proxied* straightforwardly by the
    91  sentry: The sentry copies the ioctl's parameter struct, and the transitive
    92  closure of structs it points to, from application to sentry memory; reissues the
    93  ioctl to the host, passing pointers in the sentry's address space rather than
    94  the application's; and copies updated fields (or whole structs for simplicity)
    95  back to application memory. Below we consider complications to this basic idea.
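
As a concrete sketch of this pattern for the simplest case, a fixed-size
parameter struct containing no embedded pointers, the proxy might look as
follows; `appMemory`, `frontendFD`, and `proxyOpaqueIoctl` are illustrative
names and interfaces assumed for this sketch, not settled designs.

```go
package nvproxy

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// appMemory stands in for the sentry's usual primitives for copying to/from
// application memory (in practice, the calling task's MemoryManager).
type appMemory interface {
	CopyIn(addr uint64, dst []byte) error
	CopyOut(addr uint64, src []byte) error
}

// frontendFD sketches the FileDescriptionImpl proxying /dev/nvidiactl or
// /dev/nvidia#; only the backing host FD is shown.
type frontendFD struct {
	hostFD uintptr
}

// proxyOpaqueIoctl copies the parameter struct from application memory into
// the sentry, reissues the ioctl against the host driver with a sentry
// pointer, and copies the (possibly updated) struct back out.
func (fd *frontendFD) proxyOpaqueIoctl(mem appMemory, cmd uint32, argAddr uint64, paramsLen int) error {
	buf := make([]byte, paramsLen)
	if err := mem.CopyIn(argAddr, buf); err != nil {
		return err
	}
	_, _, errno := unix.Syscall(unix.SYS_IOCTL, fd.hostFD, uintptr(cmd),
		uintptr(unsafe.Pointer(&buf[0])))
	if errno != 0 {
		return errno
	}
	return mem.CopyOut(argAddr, buf)
}
```

Ioctls whose parameter structs embed pointers additionally require the proxy
to copy in the pointees and rewrite the embedded pointers to sentry addresses
before reissuing the ioctl, then reverse the substitution on the way out.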
    96  
    97  ### Unified Virtual Memory (UVM)
    98  
    99  GPUs are equipped with "device" memory that is much faster for the GPU to access
   100  than "system" memory (as used by CPUs). CUDA supports two basic memory models:
   101  
   102  -   `cudaMalloc()` allocates device memory, which is not generally usable by the
   103      CPU; instead `cudaMemcpy()` is used to copy between system and device
   104      memory.
   105  
   106  -   `cudaMallocManaged()` allocates "unified memory", which can be used by both
   107      CPU and GPU. `nvidia-uvm.ko` backs mappings returned by
   108      `cudaMallocManaged()`, migrating pages from system to device memory on GPU
   109      page faults and from device to system memory on CPU page faults.
   110  
   111  We cannot implement UVM by substituting a sentry-controlled buffer and copying
   112  to/from UVM-controlled memory mappings "on demand", since GPU-side demand is
   113  driven by GPU page faults which the sentry cannot intercept directly; instead,
   114  we must map `/dev/nvidia-uvm` into application address spaces as in native
   115  execution.
   116  
   117  UVM requires that the virtual addresses of all mappings of `nvidia-uvm` match
    118  their respective mapped file offsets, which in conjunction with the FD uniquely
   119  identify a shared memory segment[^cite-uvm-mmap]. Since this constraint also
   120  applies to *sentry* mappings of `nvidia-uvm`, if an application happens to
   121  request a mapping of `nvidia-uvm` at a virtual address that overlaps with an
   122  existing sentry memory mapping, then `memmap.File.MapInternal()` is
   123  unimplementable. On KVM-based platforms, this means that we cannot implement the
    124  application mapping, since `MapInternal` is a required step in propagating the
   125  mapping into application address spaces. On process-based platforms, this only
   126  means that we cannot support e.g. `read(2)` syscalls targeting UVM memory; if
   127  this is required, we can perform buffered copies from/to UVM memory using
   128  `ioctl(UVM_TOOLS_READ/WRITE_PROCESS_MEMORY)`, at the cost of requiring
   129  `MapInternal` users to explicitly indicate fill/flush points before/after I/O.
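
Concretely, a sentry mapping of `nvidia-uvm` can only ever be placed at the
virtual address equal to its file offset, so `MapInternal` reduces to
something like the sketch below and must fail outright if that address range
is already occupied in the sentry; `MAP_FIXED_NOREPLACE` turns such an overlap
into an error instead of silently replacing the existing mapping.

```go
package nvproxy

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// mapUVMInternal establishes a sentry mapping of a /dev/nvidia-uvm host FD.
// UVM requires the mapping's virtual address to equal its file offset, so the
// only candidate address is the offset itself.
func mapUVMInternal(hostFD int, offset, length uint64) (uintptr, error) {
	addr, _, errno := unix.Syscall6(unix.SYS_MMAP,
		uintptr(offset), // required address == file offset
		uintptr(length),
		uintptr(unix.PROT_READ|unix.PROT_WRITE),
		uintptr(unix.MAP_SHARED|unix.MAP_FIXED_NOREPLACE),
		uintptr(hostFD),
		uintptr(offset))
	if errno != 0 {
		return 0, fmt.Errorf("cannot map nvidia-uvm at required address %#x: %v", offset, errno)
	}
	return addr, nil
}
```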
   130  
   131  The extent to which applications use `cudaMallocManaged()` is unclear; use of
   132  `cudaMalloc()` and explicit copying appears to predominate in
   133  performance-sensitive code. PyTorch contains one non-test use of
   134  `cudaMallocManaged()`[^cite-pytorch-uvm], but it is not immediately clear what
   135  circumstances cause the containing function to be invoked. Tensorflow does not
   136  appear to use `cudaMallocManaged()` outside of test code.
   137  
   138  ### Device Memory Caching
   139  
   140  For both `cudaMalloc()` and "control plane" purposes, applications using CUDA
   141  map some device memory into application address spaces, as follows:
   142  
   143  1.  The application opens a new `/dev/nvidiactl` or `/dev/nvidia#` FD, depending
   144      on the memory being mapped.
   145  
   146  2.  The application invokes `ioctl(NV_ESC_RM_MAP_MEMORY)` on an *existing*
   147      `/dev/nvidiactl` FD, passing the *new* FD as an ioctl parameter
   148      (`nv_ioctl_nvos33_parameters_with_fd::fd`). This ioctl stores information
   149      for the mapping in the new FD (`nv_linux_file_private_t::mmap_context`), but
   150      does not modify the application's address space.
   151  
   152  3.  The application invokes `mmap` on the *new* FD to actually establish the
   153      mapping into its address space.
   154  
   155  Conveniently, it is apparently permissible for the `ioctl` in step 2 to be
   156  invoked from a different process than the `mmap` in step 3, so no gVisor changes
   157  are required to support this pattern in general; we can invoke the `ioctl` in
   158  the sentry and implement `mmap` as usual.
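
For the proxy, the main subtlety in this sequence is in step 2: the FD
embedded in the ioctl parameters is an application FD number, which must be
rewritten to the host FD backing it before the ioctl is reissued. A sketch
follows; the struct layout and the FD lookup are assumptions.

```go
package nvproxy

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// nvos33ParametersWithFD approximates the driver's
// nv_ioctl_nvos33_parameters_with_fd: the NVOS33_PARAMETERS passed to
// NV_ESC_RM_MAP_MEMORY, followed by the FD that will later be mmap'd. The
// size of the embedded parameters is an assumption.
type nvos33ParametersWithFD struct {
	Params [40]byte // NVOS33_PARAMETERS, treated as opaque here
	FD     int32
	_      int32 // padding (assumed)
}

// proxyRMMapMemory reissues NV_ESC_RM_MAP_MEMORY after rewriting the embedded
// application FD (the "new" FD from step 1) to the host FD backing it, so
// that the host driver records its mmap_context on the FD that the
// application's later mmap will actually reach. hostCtlFD is the host
// /dev/nvidiactl FD behind the FD this ioctl was issued on; lookupHostFD
// stands in for a lookup through the calling task's FD table.
func proxyRMMapMemory(hostCtlFD uintptr, cmd uint32, p *nvos33ParametersWithFD, lookupHostFD func(appFD int32) (int32, error)) error {
	appFD := p.FD
	hostFD, err := lookupHostFD(appFD)
	if err != nil {
		return err
	}
	p.FD = hostFD
	_, _, errno := unix.Syscall(unix.SYS_IOCTL, hostCtlFD, uintptr(cmd),
		uintptr(unsafe.Pointer(p)))
	p.FD = appFD // what is copied back to the application stays an application FD
	if errno != 0 {
		return errno
	}
	return nil
}
```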
   159  
   160  However, mappings of device memory often need to disable or constrain processor
   161  caching for correct behavior. In modern x86 processors, caching behavior is
   162  specified by page table entry flags[^cite-sdm-pat]. On process-based platforms,
   163  application page tables are defined by the host kernel, whose `mmap` will choose
   164  the correct caching behavior by delegating to the driver's implementation. On
   165  KVM-based platforms, the sentry maintains guest page tables and consequently
   166  must set caching behavior correctly.
   167  
   168  Caching behavior for mappings obtained as described above is decided during
    169  `NV_ESC_RM_MAP_MEMORY`, by the `RsResource::resMap` "method" of the driver
   170  object specified by ioctl parameter `NVOS33_PARAMETERS::hMemory`. In most cases,
   171  this eventually results in a call to
   172  [`src/nvidia/src/kernel/rmapi/mapping_cpu.c:memMap_IMPL()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/src/kernel/rmapi/mapping_cpu.c#L167)
   173  on an associated `Memory` object. Caching behavior thus depends on the logic of
   174  that function and the `MEMORY_DESCRIPTOR` associated with the `Memory` object,
   175  which is typically determined during object creation. Therefore, to support
   176  KVM-based platforms, the sentry could track allocated driver objects and emulate
   177  the driver's logic to determine appropriate caching behavior.
   178  
   179  Alternatively, could we replicate the caching behavior of the host kernel's
   180  mapping in the sentry's address space (in `vm_area_struct::vm_page_prot`)? There
   181  is no apparent way for userspace to obtain this information, so this would
   182  necessitate a Linux kernel patch or upstream change.
   183  
   184  ### OS-Described Memory
   185  
   186  `ioctl(NV_ESC_RM_ALLOC_MEMORY, hClass=NV01_MEMORY_SYSTEM_OS_DESCRIPTOR)` and
   187  `ioctl(NV_ESC_RM_VID_HEAP_CONTROL,
   188  function=NVOS32_FUNCTION_ALLOC_OS_DESCRIPTOR)` create `OsDescMem` objects, which
   189  are `Memory` objects backed by application anonymous memory. The ioctls treat
   190  `NVOS02_PARAMETERS::pMemory` or `NVOS32_PARAMETERS::data.AllocOsDesc.descriptor`
   191  respectively as an application virtual address and call Linux's
   192  `pin_user_pages()` or `get_user_pages()` to get `struct page` pointers
   193  representing pages starting at that address[^cite-osdesc-rmapi]. Pins are held
   194  on those pages for the lifetime of the `OsDescMem` object.
   195  
   196  The proxy driver will need to replicate this behavior in the sentry, though
   197  doing so should not require major changes outside of the driver. When one of
    198  these ioctls is invoked by an application (a sketch follows the list below):
   199  
   200  -   Invoke `mmap` to create a temporary `PROT_NONE` mapping in the sentry's
   201      address space of the size passed by the application.
   202  
   203  -   Call `mm.MemoryManager.Pin()` to acquire file-page references on the given
   204      application memory.
   205  
   206  -   Call `memmap.File.MapInternal()` to get sentry mappings of pinned
   207      file-pages.
   208  
   209  -   Use `mremap(old_size=0, flags=MREMAP_FIXED)` to replicate mappings returned
   210      by `MapInternal()` into the temporary mapping, resulting in a
   211      virtually-contiguous sentry mapping of the application-specified address
   212      range.
   213  
   214  -   Invoke the host ioctl using the sentry mapping.
   215  
   216  -   `munmap` the temporary mapping, which is no longer required after the host
   217      ioctl.
   218  
   219  -   Hold the file-page references returned by `mm.MemoryManager.Pin()` until an
   220      application ioctl is observed freeing the corresponding `OsDescMem`, then
   221      call `mm.Unpin()`.
   222  
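The `mremap`-based replication is the least obvious step in the sequence
above; a minimal sketch of just that step follows, representing the mappings
returned by `MapInternal()` as plain address/length pairs.

```go
package nvproxy

import "golang.org/x/sys/unix"

// mapping is one sentry mapping returned by memmap.File.MapInternal(),
// reduced to an (address, length) pair for this sketch.
type mapping struct {
	addr   uintptr
	length uintptr
}

// replicateContiguous builds a virtually-contiguous view of the pinned
// application pages inside a previously reserved PROT_NONE region at dstAddr
// by replicating each MapInternal() mapping with
// mremap(old_size=0, MREMAP_MAYMOVE|MREMAP_FIXED).
func replicateContiguous(dstAddr uintptr, internalMappings []mapping) error {
	off := uintptr(0)
	for _, m := range internalMappings {
		if _, _, errno := unix.Syscall6(unix.SYS_MREMAP,
			m.addr,   // old_address
			0,        // old_size == 0: duplicate the mapping rather than move it
			m.length, // new_size
			unix.MREMAP_MAYMOVE|unix.MREMAP_FIXED,
			dstAddr+off, // new_address within the reserved region
			0); errno != 0 {
			return errno
		}
		off += m.length
	}
	return nil
}
```

Passing `old_size=0` causes `mremap` to create a second mapping of the same
pages (this requires the source to be a shared mapping), so the pinned pages
end up mapped both at their original sentry addresses and contiguously in the
reserved region.
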
   223  ### Security Considerations
   224  
   225  Since ioctl parameter structs must be copied into the sentry in order to proxy
    226  them, gVisor implicitly restricts the set of application requests to those that
   227  are explicitly implemented. We can impose additional restrictions based on
   228  parameter values in order to further reduce attack surface, although possibly at
   229  the cost of reduced development velocity; introducing new restrictions after
   230  launch is difficult due to the risk of regressing existing users. Intuitively,
   231  limiting the scope of our support to GPU compute should allow us to narrow API
   232  usage to that of the CUDA runtime. [Nvidia GPU driver CVEs are published in
   233  moderately large batches every ~3-4
   234  months](https://www.nvidia.com/en-us/security/), but insufficient information
   235  regarding these CVEs is available for us to determine how many of these
   236  vulnerabilities we could mitigate via parameter filtering.
   237  
   238  By default, the driver prevents a `/dev/nvidiactl` FD from using objects created
   239  by other `/dev/nvidiactl` FDs[^cite-rm-validate], providing driver-level
   240  resource isolation between applications. Since we need to track at least a
   241  subset of object allocations for OS-described memory, and possibly for
   242  determining memory caching type, we can optionally track *all* objects and
   243  further constrain ioctls to using valid object handles if driver-level isolation
   244  is believed inadequate.
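
A sketch of the tracking this would entail, keyed on (client, object) handle
pairs and maintained on the allocation and free paths; the names here are
assumptions.

```go
package nvproxy

import (
	"fmt"
	"sync"
)

// handle is an RM object handle, e.g. NVOS21_PARAMETERS::hObjectNew.
type handle uint32

// objectTracker records objects allocated through proxied ioctls. Only the
// class is tracked here, but the value could grow to carry whatever is needed
// for caching-type decisions or OS-described-memory pins.
type objectTracker struct {
	mu      sync.Mutex
	objects map[handle]map[handle]uint32 // hClient -> hObject -> hClass
}

// add records an object observed in a successful NV_ESC_RM_ALLOC (or similar)
// allocation ioctl.
func (ot *objectTracker) add(hClient, hObject handle, hClass uint32) {
	ot.mu.Lock()
	defer ot.mu.Unlock()
	if ot.objects == nil {
		ot.objects = make(map[handle]map[handle]uint32)
	}
	if ot.objects[hClient] == nil {
		ot.objects[hClient] = make(map[handle]uint32)
	}
	ot.objects[hClient][hObject] = hClass
}

// validate would be called before proxying an ioctl that references hObject;
// the entry is removed again when NV_ESC_RM_FREE is observed.
func (ot *objectTracker) validate(hClient, hObject handle) error {
	ot.mu.Lock()
	defer ot.mu.Unlock()
	if _, ok := ot.objects[hClient][hObject]; !ok {
		return fmt.Errorf("client %#x does not own object %#x", hClient, hObject)
	}
	return nil
}
```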
   245  
   246  While `seccomp-bpf` filters allow us to limit the set of ioctl requests that the
   247  sentry can make, they cannot filter based on ioctl parameters passed via memory
   248  such as allocation `hClass`, `NV_ESC_RM_CONTROL` command, or
   249  `NV_ESC_RM_VID_HEAP_CONTROL` function, limiting the extent to which they can
   250  protect the host from a compromised sentry.
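
For example, the host-side filter can pin down the ioctl request numbers the
sentry may issue, roughly as sketched below with gVisor's `seccomp` package
(the exact rule syntax is an assumption here and has changed over time, and
the `nvidia.NV_ESC_*` constants would come from the proposed
`//pkg/abi/nvidia` package); nothing at this layer can look through the
pointer-typed third argument.

```go
// Allow ioctl(2) only with an allowlisted request number as the second
// argument. Rule syntax follows gVisor's pkg/seccomp only approximately.
seccomp.SyscallRules{
	unix.SYS_IOCTL: []seccomp.Rule{
		{seccomp.MatchAny{} /* fd */, seccomp.EqualTo(nvidia.NV_ESC_RM_ALLOC)},
		{seccomp.MatchAny{} /* fd */, seccomp.EqualTo(nvidia.NV_ESC_RM_CONTROL)},
		{seccomp.MatchAny{} /* fd */, seccomp.EqualTo(nvidia.NV_ESC_RM_MAP_MEMORY)},
		// ... one rule per proxied request number; the parameter structs
		// behind the pointer argument (hClass, RM control cmd, heap-control
		// function) are invisible to the filter.
	},
}
```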
   251  
   252  ### `runsc` Container Configuration
   253  
   254  The
   255  [Nvidia Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
   256  contains code to configure an unstarted container based on
   257  [the GPU support requested by its OCI runtime spec](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#environment-variables-oci-spec),
   258  [invoking `nvidia-container-cli` from `libnvidia-container` (described above) to
   259  do most of the actual
   260  work](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html).
   261  It is used ubiquitously for this purpose, including by the
   262  [Nvidia device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin).
   263  
   264  The simplest way for `runsc` to obtain Nvidia Container Toolkit's behavior is
   265  obviously to use it, either by invoking `nvidia-container-runtime-hook` or by
   266  using the Toolkit's code (which is written in Go) directly. However, filesystem
   267  modifications made to the container's `/dev` and `/proc` directories on the host
   268  will not be application-visible since `runsc` necessarily injects sentry
   269  `devtmpfs` and `procfs` mounts at these locations, requiring that `runsc`
   270  internally replicate the effects of `libnvidia-container` in these directories.
   271  Note that host filesystem modifications are still necessary, since the sentry
   272  itself needs access to relevant host device files and MIG capabilities.
   273  
    274  Alternatively, we can attempt to emulate the behavior of `nvidia-container-toolkit`
   275  and `libnvidia-container` within `runsc`; however, note that
   276  `libnvidia-container` executes `ldconfig` to regenerate the container's runtime
   277  linker cache after mounting the driver's shared libraries into the
   278  container[^cite-nvc-ldcache_update], which is more difficult if said mounts
   279  exist within the sentry's VFS rather than on the host.
   280  
   281  ### Proprietary Driver Differences
   282  
   283  When running on the proprietary kernel driver, applications invoke
   284  `ioctl(NV_ESC_RM_CONTROL)` commands that do not appear to exist in the OSS
   285  driver. The OSS driver lacks support for GPU virtualization[^cite-oss-vgpu];
   286  however, Google Compute Engine (GCE) GPUs are exposed to VMs in passthrough
   287  mode[^cite-oss-gce], and Container-Optimized OS (COS) switched to the OSS driver
   288  in Milestone 105[^cite-oss-cos], suggesting that OSS-driver-only support may be
   289  sufficient. If support for the proprietary driver is required, we can request
   290  documentation from Nvidia.
   291  
   292  ### API/ABI Stability
   293  
   294  Nvidia requires that the kernel and userspace components of the driver match
   295  versions[^cite-abi-readme], and does not guarantee kernel ABI
   296  stability[^cite-abi-discuss], so we may need to support multiple ABI versions in
   297  the proxy. It is not immediately clear if this will be a problem in practice.
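
If multiple ABI versions do need to be supported, the proxy would likely key
its supported ioctls and parameter struct layouts on the driver version
reported by `NV_ESC_CHECK_VERSION_STR`, along the lines of the following
sketch (all names are assumptions).

```go
// ioctlHandler proxies one ioctl command for one driver ABI version.
type ioctlHandler func(hostFD int32, cmd uint32, argAddr uint64) error

// abi describes the subset of the driver ABI understood by the proxy for one
// driver version: which commands are supported and how each is handled (which
// implies the parameter struct layout for that version).
type abi struct {
	frontendIoctls map[uint32]ioctlHandler // NV_ESC_* -> handler
	uvmIoctls      map[uint32]ioctlHandler // UVM_* -> handler
}

// abis maps the driver version string obtained via NV_ESC_CHECK_VERSION_STR
// (e.g. "530.41.03") to the corresponding ABI description.
var abis = map[string]*abi{}
```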
   298  
   299  ## Proposed Work
   300  
   301  To simplify the initial implementation, we will focus immediate efforts on
   302  process-based platforms and defer support for KVM-based platforms to future
   303  work.
   304  
   305  In the sentry:
   306  
   307  -   Add structure and constant definitions from the Nvidia open-source kernel
   308      driver to new package `//pkg/abi/nvidia`.
   309  
   310  -   Implement the proxy driver under `//pkg/sentry/devices/nvproxy`, initially
   311      comprising `FileDescriptionImpl` implementations proxying `/dev/nvidiactl`,
   312      `/dev/nvidia#`, and `/dev/nvidia-uvm`.
   313  
   314  -   `/proc/driver/nvidia/params` can probably be (optionally) read once during
   315      startup and implemented as a static file in the sentry.
   316  
    317  Each ioctl command and object class is associated with its own parameter type
   318  and logic; thus, each needs to be implemented individually. We can generate
   319  lists of required commands/classes by running representative applications under
   320  [`cuda_ioctl_sniffer`](https://github.com/geohot/cuda_ioctl_sniffer) on a
   321  variety of GPUs; a list derived from a minimal CUDA workload run on a single VM
   322  follows below. The proxy driver itself should also log unimplemented
   323  commands/classes for iterative development. For the most part, known-required
   324  commands/classes should be implementable incrementally and in parallel.
   325  
   326  Concurrently, at the API level, i.e. within `//runsc`:
   327  
   328  -   Add an option to enable Nvidia GPU support. When this option is enabled, and
   329      `runsc` detects that GPU support is requested by the container, it enables
   330      the proxy driver (by calling `nvproxy.Register(vfsObj)`) and configures the
   331      container consistently with `nvidia-container-toolkit` and
   332      `libnvidia-container`.
   333  
   334      Since setting the wrong caching behavior for device memory mappings will
   335      fail in unpredictable ways, `runsc` must ensure that GPU support cannot be
    336      enabled when an unsupported platform is selected (registration sketch below).
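
For the registration half, a sketch of what `nvproxy.Register(vfsObj)` might
amount to, assuming gVisor's existing `vfs.Device` and `RegisterDevice`
interfaces; the `FileDescription` implementations behind `Open` are the bulk
of the real work and are elided here.

```go
package nvproxy

import (
	"gvisor.dev/gvisor/pkg/context"
	"gvisor.dev/gvisor/pkg/errors/linuxerr"
	"gvisor.dev/gvisor/pkg/sentry/vfs"
)

// Device numbers for the frontend devices. /dev/nvidia-uvm uses a dynamically
// assigned major number on the host and is omitted from this sketch.
const (
	nvFrontendMajor = 195 // /dev/nvidiactl and /dev/nvidia#
	nvControlMinor  = 255 // /dev/nvidiactl
)

// frontendDevice implements vfs.Device for /dev/nvidiactl and /dev/nvidia#.
type frontendDevice struct{}

// Open implements vfs.Device.Open. The real implementation would open the
// corresponding host device file (via host openat) and return a
// FileDescription proxying ioctl and mmap to that host FD.
func (frontendDevice) Open(ctx context.Context, mnt *vfs.Mount, d *vfs.Dentry, opts vfs.OpenOptions) (*vfs.FileDescription, error) {
	return nil, linuxerr.ENOSYS // placeholder in this sketch
}

// Register registers the proxy driver's devices with the sandbox's VFS;
// per-GPU minor numbers for /dev/nvidia# are elided.
func Register(vfsObj *vfs.VirtualFilesystem) error {
	return vfsObj.RegisterDevice(vfs.CharDevice, nvFrontendMajor, nvControlMinor,
		&frontendDevice{}, &vfs.RegisterDeviceOptions{GroupName: "nvidia-frontend"})
}
```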
   337  
   338  To support Nvidia Multi-Process Service (MPS), we need:
   339  
   340  -   Support for `SCM_CREDENTIALS` on host Unix domain sockets; already
    341      implemented as part of a previous MPS investigation, but not merged.
   342  
   343  -   Optional pass-through of `statfs::f_type` through `fsimpl/gofer`; needed for
   344      a runsc bind mount of the host's `/dev/shm`, through which MPS shares
   345      memory; previously hacked in (optionality not implemented).
   346  
   347  Features required to support Nvidia Persistence Daemon and Nvidia Fabric Manager
   348  are currently unknown, but these are not believed to be critical, and we may
   349  choose to deliberately deny access to them (and/or MPS) to reduce attack
   350  surface.
   351  [MPS provides "memory protection" but not "error isolation"](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#cuda-concurrency),
   352  so it is not clear that granting MPS access to sandboxed containers is safe.
   353  
   354  Implementation notes:
   355  
    356  -   Each application `open` of `/dev/nvidiactl`, `/dev/nvidia#`, or
   357      `/dev/nvidia-uvm` must be backed by a distinct host FD. Furthermore, the
   358      proxy driver cannot go through sentry VFS to obtain this FD since doing so
   359      would recursively attempt to open the proxy driver. Instead, we must allow
   360      the proxy driver to invoke host `openat`, and ensure that the mount
   361      namespace in which the sentry executes contains the required device special
   362      files.
   363  
   364  -   `/dev/nvidia-uvm` FDs may need to be `UVM_INITIALIZE`d with
   365      `UVM_INIT_FLAGS_MULTI_PROCESS_SHARING_MODE` to be used from both sentry and
   366      application processes[^cite-uvm-va_space_mm_enabled].
   367  
   368  -   Known-used `nvidia.ko` ioctls: `NV_ESC_CHECK_VERSION_STR`,
   369      `NV_ESC_SYS_PARAMS`, `NV_ESC_CARD_INFO`, `NV_ESC_NUMA_INFO`,
   370      `NV_ESC_REGISTER_FD`, `NV_ESC_RM_ALLOC`, `NV_ESC_RM_ALLOC_MEMORY`,
   371      `NV_ESC_RM_ALLOC_OS_EVENT`, `NV_ESC_RM_CONTROL`, `NV_ESC_RM_FREE`,
   372      `NV_ESC_RM_MAP_MEMORY`, `NV_ESC_RM_VID_HEAP_CONTROL`,
   373      `NV_ESC_RM_DUP_OBJECT`, `NV_ESC_RM_UPDATE_DEVICE_MAPPING_INFO`
   374  
   375  -   `NV_ESC_RM_CONTROL` is essentially another level of ioctls. Known-used
   376      `NVOS54_PARAMETERS::cmd`: `NV0000_CTRL_CMD_SYSTEM_GET_BUILD_VERSION`,
   377      `NV0000_CTRL_CMD_CLIENT_SET_INHERITED_SHARE_POLICY`,
   378      `NV0000_CTRL_CMD_SYSTEM_GET_FABRIC_STATUS`,
   379      `NV0000_CTRL_CMD_GPU_GET_PROBED_IDS`,
   380      `NV0000_CTRL_CMD_SYNC_GPU_BOOST_GROUP_INFO`,
   381      `NV0000_CTRL_CMD_GPU_ATTACH_IDS`, `NV0000_CTRL_CMD_GPU_GET_ID_INFO`,
   382      `NV0000_CTRL_CMD_GPU_GET_ATTACHED_IDS`,
   383      `NV2080_CTRL_CMD_GPU_GET_ACTIVE_PARTITION_IDS`,
   384      `NV2080_CTRL_CMD_GPU_GET_GID_INFO`,
   385      `NV0080_CTRL_CMD_GPU_GET_VIRTUALIZATION_MODE`,
   386      `NV2080_CTRL_CMD_FB_GET_INFO`, `NV2080_CTRL_CMD_GPU_GET_INFO`,
   387      `NV0080_CTRL_CMD_MC_GET_ARCH_INFO`, `NV2080_CTRL_CMD_BUS_GET_INFO`,
   388      `NV2080_CTRL_CMD_BUS_GET_PCI_INFO`, `NV2080_CTRL_CMD_BUS_GET_PCI_BAR_INFO`,
   389      `NV2080_CTRL_CMD_GPU_QUERY_ECC_STATUS`, `NV0080_CTRL_FIFO_GET_CAPS`,
   390      `NV0080_CTRL_CMD_GPU_GET_CLASSLIST`, `NV2080_CTRL_CMD_GPU_GET_ENGINES`,
   391      `NV2080_CTRL_CMD_GPU_GET_SIMULATION_INFO`,
   392      `NV0000_CTRL_CMD_GPU_GET_MEMOP_ENABLE`, `NV2080_CTRL_CMD_GR_GET_INFO`,
   393      `NV2080_CTRL_CMD_GR_GET_GPC_MASK`, `NV2080_CTRL_CMD_GR_GET_TPC_MASK`,
   394      `NV2080_CTRL_CMD_GR_GET_CAPS_V2`, `NV2080_CTRL_CMD_CE_GET_CAPS`,
   395      `NV2080_CTRL_CMD_GPU_GET_COMPUTE_POLICY_CONFIG`,
   396      `NV2080_CTRL_CMD_GR_GET_GLOBAL_SM_ORDER`, `NV0080_CTRL_CMD_FB_GET_CAPS`,
   397      `NV0000_CTRL_CMD_CLIENT_GET_ADDR_SPACE_TYPE`,
   398      `NV2080_CTRL_CMD_GSP_GET_FEATURES`,
   399      `NV2080_CTRL_CMD_GPU_GET_SHORT_NAME_STRING`,
   400      `NV2080_CTRL_CMD_GPU_GET_NAME_STRING`,
   401      `NV2080_CTRL_CMD_GPU_QUERY_COMPUTE_MODE_RULES`,
   402      `NV2080_CTRL_CMD_RC_RELEASE_WATCHDOG_REQUESTS`,
   403      `NV2080_CTRL_CMD_RC_SOFT_DISABLE_WATCHDOG`,
   404      `NV2080_CTRL_CMD_NVLINK_GET_NVLINK_STATUS`,
   405      `NV2080_CTRL_CMD_RC_GET_WATCHDOG_INFO`, `NV2080_CTRL_CMD_PERF_BOOST`,
   406      `NV0080_CTRL_CMD_FIFO_GET_CHANNELLIST`, `NVC36F_CTRL_GET_CLASS_ENGINEID`,
   407      `NVC36F_CTRL_CMD_GPFIFO_GET_WORK_SUBMIT_TOKEN`,
   408      `NV2080_CTRL_CMD_GR_GET_CTX_BUFFER_SIZE`, `NVA06F_CTRL_CMD_GPFIFO_SCHEDULE`
   409  
    410  -   Known-used `NVOS54_PARAMETERS::cmd` that appear unimplemented in the OSS
    411      driver and may be proprietary-driver-only (or just well-hidden?): 0x20800159,
   412      0x20800161, 0x20801001, 0x20801009, 0x2080100a, 0x20802016, 0x20802084,
   413      0x503c0102, 0x90e60102
   414  
   415  -   Known-used `nvidia-uvm.ko` ioctls: `UVM_INITIALIZE`,
   416      `UVM_PAGEABLE_MEM_ACCESS`, `UVM_REGISTER_GPU`, `UVM_CREATE_RANGE_GROUP`,
   417      `UVM_REGISTER_GPU_VASPACE`, `UVM_CREATE_EXTERNAL_RANGE`,
   418      `UVM_MAP_EXTERNAL_ALLOCATION`, `UVM_REGISTER_CHANNEL`,
   419      `UVM_ALLOC_SEMAPHORE_POOL`, `UVM_VALIDATE_VA_RANGE`
   420  
   421  -   Known-used `NV_ESC_RM_ALLOC` `hClass`, i.e. allocated object classes:
   422      `NV01_ROOT_CLIENT`, `MPS_COMPUTE`, `NV01_DEVICE_0`, `NV20_SUBDEVICE_0`,
   423      `TURING_USERMODE_A`, `FERMI_VASPACE_A`, `NV50_THIRD_PARTY_P2P`,
   424      `FERMI_CONTEXT_SHARE_A`, `TURING_CHANNEL_GPFIFO_A`, `TURING_COMPUTE_A`,
   425      `TURING_DMA_COPY_A`, `NV01_EVENT_OS_EVENT`, `KEPLER_CHANNEL_GROUP_A`
   426  
   427  ## References
   428  
   429  [^cite-abi-discuss]: "[The] RMAPI currently does not have any ABI stability
   430      guarantees whatsoever, and even API compatibility breaks
   431      occasionally." -
   432      https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-2757388
   433  [^cite-abi-readme]: "This is the source release of the NVIDIA Linux open GPU
   434      kernel modules, version 530.41.03. ... Note that the kernel
   435      modules built here must be used with GSP firmware and
   436      user-space NVIDIA GPU driver components from a corresponding
   437      530.41.03 driver release." -
   438      https://github.com/NVIDIA/open-gpu-kernel-modules/blob/6dd092ddb7c165fb1ec48b937fa6b33daa37f9c1/README.md
   439  [^cite-nvc-ldcache_update]: [`src/nvc_ldcache.c:nvc_ldcache_update()`](https://github.com/NVIDIA/libnvidia-container/blob/eb0415c458c5e5d97cb8ac08b42803d075ed73cd/src/nvc_ldcache.c#L355)
   440  [^cite-osdesc-rmapi]: [`src/nvidia/arch/nvalloc/unix/src/escape.c:RmCreateOsDescriptor()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/arch/nvalloc/unix/src/escape.c#L120)
   441      =>
   442      [`kernel-open/nvidia/os-mlock.c:os_lock_user_pages()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia/os-mlock.c#L214)
   443  [^cite-oss-cos]: "Upgraded Nvidia latest drivers from v510.108.03 to v525.60.13
   444      (OSS)." -
   445      https://cloud.google.com/container-optimized-os/docs/release-notes/m105#cos-beta-105-17412-1-2_vs_milestone_101_.
   446      Also see b/235364591, go/cos-oss-gpu.
   447  [^cite-oss-gce]: "Compute Engine provides NVIDIA GPUs for your VMs in
   448      passthrough mode so that your VMs have direct control over the
   449      GPUs and their associated memory." -
   450      https://cloud.google.com/compute/docs/gpus
   451  [^cite-oss-vgpu]: "The currently published driver does not support
   452      virtualization, neither as a host nor a guest." -
   453      https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-2752052
   454  [^cite-pytorch-uvm]: [`c10/cuda/CUDADeviceAssertionHost.cpp:c10::cuda::CUDAKernelLaunchRegistry::get_uvm_assertions_ptr_for_current_device()`](https://github.com/pytorch/pytorch/blob/3f5d768b561e3edd17e93fd4daa7248f9d600bb2/c10/cuda/CUDADeviceAssertionHost.cpp#L268)
   455  [^cite-rm-validate]: See calls to `clientValidate()` in
   456      [`src/nvidia/src/libraries/resserv/src/rs_server.c`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/6dd092ddb7c165fb1ec48b937fa6b33daa37f9c1/src/nvidia/src/libraries/resserv/src/rs_server.c)
   457      =>
   458      [`src/nvidia/src/kernel/rmapi/client.c:rmclientValidate_IMPL()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/src/kernel/rmapi/client.c#L728).
   459      `API_SECURITY_INFO::clientOSInfo` is set by
   460      [`src/nvidia/arch/nvalloc/unix/src/escape.c:RmIoctl()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/arch/nvalloc/unix/src/escape.c#L300).
   461      Both `PDB_PROP_SYS_VALIDATE_CLIENT_HANDLE` and
   462      `PDB_PROP_SYS_VALIDATE_CLIENT_HANDLE_STRICT` are enabled by
   463      default by
   464      [`src/nvidia/generated/g_system_nvoc.c:__nvoc_init_dataField_OBJSYS()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/4397463e738d2d90aa1164cc5948e723701f7b53/src/nvidia/generated/g_system_nvoc.c#L84).
   465  [^cite-sdm-pat]: Intel SDM Vol. 3, Sec. 12.12 "Page Attribute Table (PAT)"
   466  [^cite-uvm-mmap]: [`kernel-open/nvidia-uvm/uvm.c:uvm_mmap()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia-uvm/uvm.c#L557)
   467  [^cite-uvm-va_space_mm_enabled]: [`kernel-open/nvidia-uvm/uvm_va_space_mm.c:uvm_va_space_mm_enabled()`](https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/kernel-open/nvidia-uvm/uvm_va_space_mm.c#L188)