# Installing a cluster on OpenStack with vGPU support

If the underlying OpenStack deployment has the appropriate GPU hardware
installed and configured, vGPUs can be passed down to pods by using the
gpu-operator.


## Pre-requisites

The following items need to be checked before starting the deployment of
OpenShift (a quick verification sketch follows the list):

- Appropriate hardware is installed (like an [NVIDIA Tesla
  V100](https://www.nvidia.com/en-gb/data-center/tesla-v100)) on
  the OpenStack compute node
- NVIDIA host drivers are installed and the nouveau driver is removed
- The Compute service is installed on the node and properly configured
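
A minimal way to sanity-check these points on the compute node could look like
this (assuming an OSP 16.2 deployment where the Compute service runs in the
`nova_compute` container; adjust the last check to your deployment method):

```console
$ lspci -nn | grep -i nvidia          # the GPU is visible on the PCI bus
$ lsmod | grep nouveau                # should print nothing
$ sudo podman ps | grep nova_compute  # the Compute service container is running
```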


## Driver installation

All of the examples assume RHEL 8.4 and OSP 16.2.

Given that there is an NVIDIA vGPU-capable card installed on the machine
intended to have the compute role, its presence can be confirmed with a command
that should display output similar to:

```console
$ lspci -nn | grep -i nvidia
3b:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] [10de:1db4] (rev a1)
```

Make sure the `nouveau` driver is prevented from loading. It might be necessary
to add it to `/etc/modprobe.d/blacklist.conf` and/or to change the grub config:

```console
$ sudo sed -i 's/console=/rd.driver.blacklist=nouveau console=/' /etc/default/grub
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```

After that, install the NVIDIA vGPU host drivers (which are available for
download by license purchasers on the
[NVIDIA application hub](https://nvid.nvidia.com/dashboard/)):

```console
$ sudo rpm -iv NVIDIA-vGPU-rhel-8.4-510.73.06.x86_64.rpm
```

Note that the driver version may differ. Be careful to get drivers whose RHEL
version and architecture match the installed RHEL.

Reboot the machine. After the reboot, confirm that the correct drivers are in
use:

```console
$ lsmod | grep nvidia
nvidia_vgpu_vfio       57344  0
nvidia              39055360  11
mdev                   20480  2 vfio_mdev,nvidia_vgpu_vfio
vfio                   36864  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
drm                   569344  4 drm_kms_helper,nvidia,mgag200
```

You can also use the `nvidia-smi` tool to display the device state.
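
For example, plain `nvidia-smi` shows the physical GPU status, and the `vgpu`
subcommand shipped with the vGPU host driver reports vGPU-specific information:

```console
$ nvidia-smi          # physical GPU status and driver version
$ nvidia-smi vgpu     # vGPU information (available with the vGPU host driver)
```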


## OpenStack compute node

There should be mediated device types populated by the driver (the bus address
may vary):

```console
$ ls /sys/class/mdev_bus/0000\:3b\:00.0/mdev_supported_types/
nvidia-105  nvidia-106  nvidia-107  nvidia-108  nvidia-109  nvidia-110
nvidia-111  nvidia-112  nvidia-113  nvidia-114  nvidia-115  nvidia-163
nvidia-217  nvidia-247  nvidia-299  nvidia-300  nvidia-301
```

Depending on the type of workload and the purchased license edition, the
appropriate types need to be configured in `nova.conf` on the compute node,
i.e.:

```ini
...
[devices]
enabled_vgpu_types = nvidia-105
...
```

After the compute service restart, the placement API should report additional
resources: the commands `openstack resource provider list` and
`openstack resource provider inventory list <id_of_the_main_provider>` should
display the VGPU resource class as available. For more information,
[navigate to the OpenStack Nova docs](https://docs.openstack.org/nova/train/admin/virtual-gpu.html).
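
For example, with admin credentials sourced, the check could look like this
(fill in the provider id from the output of the first command):

```console
$ openstack resource provider list
$ openstack resource provider inventory list <id_of_the_main_provider>
```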


## OpenStack vGPU flavor

Now, create a flavor to be used to spin up new vGPU-enabled nodes:

```console
$ openstack flavor create --disk 25 --ram 8192 --vcpus 4 \
    --property "resources:VGPU=1" --public <nova_gpu_flavor>
```
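
To double-check that the property was set as intended, the flavor can be
inspected (illustrative only; `<nova_gpu_flavor>` is the name chosen above):

```console
$ openstack flavor show <nova_gpu_flavor> -c properties
```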


## Create vGPU enabled Worker Nodes

Worker nodes can be created by using the Machine API. To do that,
[create a new MachineSet in OpenShift](https://docs.openshift.com/container-platform/4.11/machine_management/creating_machinesets/creating-machineset-osp.html).

```console
$ oc get machineset -n openshift-machine-api <machineset_name> -o yaml > vgpu_machineset.yaml
```

Edit the YAML file: be sure to give it a different name, set `replicas` to at
most the amount of your vGPU capacity, and set the right flavor, which hints
OpenStack about the resources to include in the virtual machine. (Note that
this is just an example; yours might be different.)
```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "8192"
    machine.openshift.io/vCPU: "4"
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>-gpu-0
  namespace: openshift-machine-api
spec:
  replicas: <amount_of_nodes_with_gpu>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>-gpu-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>-gpu-0
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_gpu_flavor>
          image: <glance_image_name_or_location>
          kind: OpenstackProviderSpec
          metadata:
            creationTimestamp: null
          networks:
          - filter: {}
            subnets:
            - filter:
                name: <infrastructure_ID>-nodes
                tags: openshiftClusterID=<infrastructure_ID>
          securityGroups:
          - filter: {}
            name: <infrastructure_ID>-<node_role>
          serverGroupName: <infrastructure_ID>-<node_role>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: true
          userDataSecret:
            name: <node_role>-user-data
```

Save the file and create the MachineSet:

```console
$ oc create -f vgpu_machineset.yaml
```

Then wait for the new node to show up. You can examine its presence and state
using `openstack server list` and, after the VM is ready, `oc get nodes`. The
new node should be available with status "Ready".
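
A possible way to follow the provisioning from both sides (OpenStack and
OpenShift):

```console
$ openstack server list                      # the new VM appears and becomes ACTIVE
$ oc get machines -n openshift-machine-api   # the Machine reaches the Running phase
$ oc get nodes                               # the node eventually reports Ready
```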


## Discover features and enable GPU

Now it's time to install two operators:

- [Node Feature Discovery](https://docs.openshift.com/container-platform/4.11/hardware_enablement/psap-node-feature-discovery-operator.html)
- [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/contents.html)

### Node Feature Discovery Operator

This operator labels nodes with the hardware features it detects and is
required by the GPU Operator. To install it, follow
[the documentation for the NFD Operator](https://docs.openshift.com/container-platform/4.11/hardware_enablement/psap-node-feature-discovery-operator.html).

To include NVIDIA card(s) in the NodeFeatureDiscovery instance, make the
following changes:

```yaml
apiVersion: nfd.kubernetes.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: node-feature-discovery-operator
spec:
  instance: ""
  topologyupdater: false
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery:v<ocp_version>
    imagePullPolicy: Always
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "10de"
          deviceLabelFields:
            - vendor
```

Be sure to replace `<ocp_version>` with the correct OCP version.
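
Once the NFD workers have run, the GPU node should carry a PCI vendor label.
With the vendor-only `deviceLabelFields` configured above, a plausible check is
(the exact label name may vary between NFD versions):

```console
$ oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
```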


### GPU Operator

Follow the documentation for it on the [NVIDIA
site](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/install-gpu-ocp.html#installing-the-nvidia-gpu-operator-using-the-cli),
which essentially boils down to the following steps:

1. Create the namespace and operator group (save to a file and run
   `oc create -f <filename>`):
   ```yaml
   ---
   apiVersion: v1
   kind: Namespace
   metadata:
     name: nvidia-gpu-operator
   ---
   apiVersion: operators.coreos.com/v1
   kind: OperatorGroup
   metadata:
     name: nvidia-gpu-operator-group
     namespace: nvidia-gpu-operator
   spec:
     targetNamespaces:
     - nvidia-gpu-operator
   ```
1. Get the proper channel for the gpu-operator:
   ```console
   $ CH=$(oc get packagemanifest gpu-operator-certified \
       -n openshift-marketplace -o jsonpath='{.status.defaultChannel}')
   $ echo $CH
   v22.9
   ```
1. Get the right name for the gpu-operator:
   ```console
   $ GPU_OP_NAME=$(oc get packagemanifests/gpu-operator-certified \
       -n openshift-marketplace -o json | jq \
       -r '.status.channels[]|select(.name == "'${CH}'")|.currentCSV')
   $ echo $GPU_OP_NAME
   gpu-operator-certified.v22.9.0
   ```
1. Now, create `nvidia-sub.yaml` with a Subscription that uses the values
   fetched earlier (save to a file and run `oc create -f <filename>`):
   ```yaml
   apiVersion: operators.coreos.com/v1alpha1
   kind: Subscription
   metadata:
     name: gpu-operator-certified
     namespace: nvidia-gpu-operator
   spec:
     channel: "<channel>"
     installPlanApproval: Manual
     name: gpu-operator-certified
     source: certified-operators
     sourceNamespace: openshift-marketplace
     startingCSV: "<gpu_operator_name>"
   ```
1. Verify that the install plan has been created:
   ```console
   $ oc get installplan -n nvidia-gpu-operator
   ```
   In the APPROVED column you will see `false`.
1. Approve the plan:
   ```console
   $ oc patch installplan.operators.coreos.com/<install_plan_name> \
       -n nvidia-gpu-operator --type merge \
       --patch '{"spec":{"approved":true }}'
   ```
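
After approval the operator should install; a quick sanity check is to wait for
the CSV (the `$GPU_OP_NAME` captured earlier) to reach the `Succeeded` phase:

```console
$ oc get csv -n nvidia-gpu-operator $GPU_OP_NAME
```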

Now an image needs to be built, which the gpu-operator will use for building
the drivers on the cluster.

Download the needed drivers from the [NVIDIA application
hub](https://nvid.nvidia.com/dashboard/), along with the vgpuDriverCatalog.yaml
file. The only files needed for vGPU are (at the time of writing):

- NVIDIA-Linux-x86_64-510.85.02-grid.run
- vgpuDriverCatalog.yaml
- gridd.conf

Note that the drivers to use here are the **guest** ones, not the host drivers
that were installed on the OpenStack compute node.

Clone the driver repository and copy all of the needed files to the
driver/rhel8/drivers directory:

```console
$ git clone https://gitlab.com/nvidia/container-images/driver
$ cd driver/rhel8
$ cp /path/to/obtained/drivers/* drivers/
```

Create the gridd.conf file and copy it to `drivers/` (installation of the
licensing server is out of scope for this document):
```
# Description: Set License Server Address
# Data type: string
# Format:  "<address>"
ServerAddress=<licensing_server_address>
```

Go to the `driver/rhel8/` path and prepare the image:
```console
$ export PRIVATE_REGISTRY=<registry_name/path>
$ export OS_TAG=<ocp_tag>
$ export VERSION=<version>
$ export VGPU_DRIVER_VERSION=<vgpu_version>
$ export CUDA_VERSION=<cuda_version>
$ export TARGETARCH=<architecture>
$ podman build \
    --build-arg CUDA_VERSION=${CUDA_VERSION} \
    --build-arg DRIVER_TYPE=vgpu \
    --build-arg TARGETARCH=$TARGETARCH \
    --build-arg DRIVER_VERSION=$VGPU_DRIVER_VERSION \
    -t ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG} .
```

where:

- `PRIVATE_REGISTRY` is the name of/path in the private registry the image will
  be pushed to and pulled from, e.g. "quay.io/someuser"
- `OS_TAG` is the string matching the RHCOS version used for the cluster
  installation, e.g. "rhcos4.12"
- `VERSION` may be any string or number, e.g. "1.0.0"
- `VGPU_DRIVER_VERSION` is a substring of the driver file name, e.g. for
  "NVIDIA-Linux-x86_64-510.85.02-grid.run" the version is "510.85.02-grid"
- `CUDA_VERSION` is the latest CUDA version supported on that particular GPU
  (or whichever one is needed), e.g. "11.7.1"
- `TARGETARCH` is the target architecture the cluster runs on (usually
  "x86_64")
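
Putting the example values from the list above together, the exports might look
like this (all values are the illustrative ones mentioned above; substitute
your own):

```console
$ export PRIVATE_REGISTRY=quay.io/someuser
$ export OS_TAG=rhcos4.12
$ export VERSION=1.0.0
$ export VGPU_DRIVER_VERSION=510.85.02-grid
$ export CUDA_VERSION=11.7.1
$ export TARGETARCH=x86_64
```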


Push the image to the registry:
```console
$ podman push ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG}
```

Create the license server ConfigMap:
```console
$ oc create configmap licensing-config \
    -n nvidia-gpu-operator --from-file=drivers/gridd.conf
```

Create a secret for connecting to the registry:
```console
$ oc -n nvidia-gpu-operator \
    create secret docker-registry my-registry \
    --docker-server=${PRIVATE_REGISTRY} \
    --docker-username=<username> \
    --docker-password=<pass> \
    --docker-email=<e-mail>
```

Substitute `<username>`, `<pass>` and `<e-mail>` with real data. Here,
`my-registry` is used as the name of the secret and can be changed (it
corresponds to the `imagePullSecrets` array in the `clusterpolicy` later on).

Get the `clusterpolicy`:
```console
$ oc get csv -n nvidia-gpu-operator $GPU_OP_NAME \
    -o jsonpath={.metadata.annotations.alm-examples} | \
    jq .[0] > clusterpolicy.json
```

Edit it and add/adjust the fields marked below:

```json
{
  ...
  "spec": {
    ...
    "driver": {
      ...
      "repository": "<registry_name/path>",
      "image": "driver",
      "imagePullSecrets": ["my-registry"],
      "licensingConfig": {
        "configMapName": "licensing-config",
        "nlsEnabled": true
      },
      "version": "<version>",
      ...
    }
    ...
  }
}
```

Apply the changes:
```console
$ oc apply -f clusterpolicy.json
```

Wait for the drivers to be built. It may take a while. The state of the pods
should be either Running or Completed:
```console
$ oc get pods -n nvidia-gpu-operator
```
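
Once the driver and device-plugin pods are up, the GPU node should advertise
the `nvidia.com/gpu` resource. An illustrative check (the node name is a
placeholder):

```console
$ oc describe node <gpu_node_name> | grep -i nvidia.com/gpu
```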

## Run sample app

To verify the installation, create a simple app (app.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.2.1"
    resources:
      limits:
        nvidia.com/gpu: 1
```

Run it:
```console
$ oc apply -f app.yaml
```

Check the logs after the pod finishes its job:
```console
$ oc logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```