github.com/containerd/nerdctl@v1.7.7/docs/gpu.md

# Using GPUs inside containers

| :zap: Requirement | nerdctl >= 0.9 |
|-------------------|----------------|

nerdctl provides Docker-compatible NVIDIA GPU support.
## Prerequisites

- NVIDIA Drivers
  - Same requirement as when you use GPUs on Docker. For details, please refer to [the doc by NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#pre-requisites).
- `nvidia-container-cli`
  - containerd relies on this CLI for setting up GPUs inside containers. You can install it via the [`libnvidia-container` package](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html#libnvidia-container).
## Options for `nerdctl run --gpus`

`nerdctl run --gpus` is compatible with [`docker run --gpus`](https://docs.docker.com/engine/reference/commandline/run/#access-an-nvidia-gpu).

You can specify the number of GPUs to use via the `--gpus` option.
The following example exposes all available GPUs.

```
nerdctl run -it --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
```

You can also pass a detailed configuration to the `--gpus` option as a list of key-value pairs. The following options are available.

- `count`: number of GPUs to use. `all` exposes all available GPUs.
- `device`: IDs of the GPUs to use. Either UUIDs or indexes of GPUs can be specified.
- `capabilities`: [Driver capabilities](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#driver-capabilities). If unset, the default capabilities `utility` and `compute` are used.

The following example exposes a specific GPU to the container.

```
nerdctl run -it --rm --gpus '"capabilities=utility,compute",device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a' nvidia/cuda:9.0-base nvidia-smi
```
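
As a further sketch, devices can also be selected by index, and `count` can limit how many GPUs are exposed, following the `docker run --gpus` syntax that nerdctl is compatible with. The GPU indexes below are placeholders; substitute your own.

```
# Expose GPUs 0 and 1 by index (indexes are placeholders)
nerdctl run -it --rm --gpus '"device=0,1"' nvidia/cuda:9.0-base nvidia-smi

# Expose any two available GPUs
nerdctl run -it --rm --gpus 2 nvidia/cuda:9.0-base nvidia-smi
```

Note that a `device` value containing commas must be quoted so the shell and the option parser do not split it.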

## Fields for `nerdctl compose`

`nerdctl compose` also supports GPUs, following the [compose-spec](https://github.com/compose-spec/compose-spec/blob/master/deploy.md#devices).

You can use GPUs in Compose by specifying one of the following values in the `capabilities` field of `services.demo.deploy.resources.reservations.devices` (where `demo` is the service name).

- `gpu`
- `nvidia`
- any of the capabilities allowed for `nerdctl run --gpus`

The available fields are the same as for `nerdctl run --gpus`.

The following example exposes all available GPUs to the container.

```
version: "3.8"
services:
  demo:
    image: nvidia/cuda:9.0-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: ["utility"]
            count: all
```
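
The compose-spec also defines a `device_ids` field for pinning specific devices instead of a `count`. As a sketch under that spec (the UUID below is a placeholder), a single GPU could be reserved like this:

```
version: "3.8"
services:
  demo:
    image: nvidia/cuda:9.0-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: ["utility"]
            # device_ids comes from the compose-spec; the UUID is a placeholder
            device_ids: ["GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a"]
```

Per the compose-spec, `count` and `device_ids` are mutually exclusive within a single device reservation.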

## Troubleshooting

### `nerdctl run --gpus` fails when using the NVIDIA gpu-operator

If the NVIDIA driver is installed by the [gpu-operator](https://github.com/NVIDIA/gpu-operator), `nerdctl run` fails with the error message `(FATA[0000] exec: "nvidia-container-cli": executable file not found in $PATH)`.

To fix this, add `nvidia-container-cli` to the `PATH` environment variable. You can do this by adding the following line to your `$HOME/.profile` or `/etc/profile` (for a system-wide installation):
```
export PATH=$PATH:/usr/local/nvidia/toolkit
```

The shared libraries also need to be registered with the dynamic linker:
```
echo "/run/nvidia/driver/usr/lib/x86_64-linux-gnu" > /etc/ld.so.conf.d/nvidia.conf
ldconfig
```
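
To confirm both steps took effect, you can check that the binary now resolves and that the driver libraries appear in the linker cache. This sketch assumes the gpu-operator default paths used above:

```
# Assumes the gpu-operator paths used above
export PATH=$PATH:/usr/local/nvidia/toolkit

# The binary should now be found on PATH
command -v nvidia-container-cli

# The driver libraries should be listed in the linker cache
ldconfig -p | grep libnvidia
```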

After that, `nerdctl run --gpus` should work successfully.