# Installing a cluster on OpenStack with vGPU support

If the underlying OpenStack deployment has the proper GPU hardware installed and
configured, vGPUs can be passed down to pods by using the gpu-operator.


## Prerequisites

Check the following before starting the deployment of OpenShift:

- Appropriate hardware is installed (such as an [NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100)) on
  the OpenStack compute node
- NVIDIA host drivers are installed and the nouveau driver is removed
- The Compute service is installed on that node and properly configured


## Driver installation

All of the examples assume RHEL 8.4 and OSP 16.2 are used.

Given that there is an NVIDIA vGPU-capable card installed on the machine intended
for the compute role, confirm it with a command that should display output similar
to the following:

```console
$ lspci -nn | grep -i nvidia
3b:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] [10de:1db4] (rev a1)
```

Make sure the `nouveau` driver is prevented from loading. It might be necessary to add
it to `/etc/modprobe.d/blacklist.conf` and/or change the GRUB config:

```console
$ sudo sed -i 's/console=/rd.driver.blacklist=nouveau console=/' /etc/default/grub
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```

After that, install the NVIDIA host vGPU drivers (which are available for download to
license purchasers on the
[NVIDIA application hub](https://nvid.nvidia.com/dashboard/)):

```console
$ sudo rpm -iv NVIDIA-vGPU-rhel-8.4-510.73.06.x86_64.rpm
```

Note that the driver version may differ. Be careful to get drivers whose RHEL version
and architecture match the installed RHEL.

Reboot the machine. After the reboot, confirm that the correct drivers are in use:

```console
$ lsmod | grep nvidia
nvidia_vgpu_vfio       57344  0
nvidia              39055360  11
mdev                   20480  2 vfio_mdev,nvidia_vgpu_vfio
vfio                   36864  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
drm                   569344  4 drm_kms_helper,nvidia,mgag200
```

You can also use the `nvidia-smi` tool to display the device state.


## OpenStack compute node

There should be mediated devices populated by the driver (the bus address may vary):

```console
$ ls /sys/class/mdev_bus/0000\:3b\:00.0/mdev_supported_types/
nvidia-105  nvidia-106  nvidia-107  nvidia-108  nvidia-109  nvidia-110
nvidia-111  nvidia-112  nvidia-113  nvidia-114  nvidia-115  nvidia-163
nvidia-217  nvidia-247  nvidia-299  nvidia-300  nvidia-301
```

Depending on the type of workload and the purchased license edition, the appropriate
types need to be configured in `nova.conf` on the compute node, for example:

```ini
...
[devices]
enabled_vgpu_types = nvidia-105
...
```

After the Compute service restarts, the placement API should report additional
resources: the commands `openstack resource provider list` and
`openstack resource provider inventory list <id of the main provider>` should show
the VGPU resource class as available. For more information,
[see the OpenStack Nova docs](https://docs.openstack.org/nova/train/admin/virtual-gpu.html).
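
For example, the check might look like the following. This is only a sketch:
`<uuid_of_compute_node_provider>` is a placeholder for the UUID of the compute node's
resource provider as reported by the first command.

```console
$ openstack resource provider list
$ openstack resource provider inventory list <uuid_of_compute_node_provider>
# The inventory should now include the VGPU resource class with a non-zero total.
```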


## OpenStack vGPU flavor

Now create a flavor that will be used to spin up new vGPU-enabled nodes:

```console
$ openstack flavor create --disk 25 --ram 8192 --vcpus 4 \
    --property "resources:VGPU=1" --public <nova_gpu_flavor>
```


## Create vGPU-enabled Worker Nodes

Worker nodes can be created by using the Machine API. To do that,
[create a new MachineSet in OpenShift](https://docs.openshift.com/container-platform/4.11/machine_management/creating_machinesets/creating-machineset-osp.html).

```console
$ oc get machineset -n openshift-machine-api <machineset_name> -o yaml > vgpu_machineset.yaml
```

Edit the YAML file: be sure to use a different name, set replicas to at most the amount
of your vGPU capacity, and set the right flavor, which hints OpenStack about the right
resources to include in the virtual machine (note that this is just an example, yours
might be different):

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "8192"
    machine.openshift.io/vCPU: "4"
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>-gpu-0
  namespace: openshift-machine-api
spec:
  replicas: <amount_of_nodes_with_gpu>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>-gpu-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>-gpu-0
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_gpu_flavor>
          image: <glance_image_name_or_location>
          kind: OpenstackProviderSpec
          metadata:
            creationTimestamp: null
          networks:
          - filter: {}
            subnets:
            - filter:
                name: <infrastructure_ID>-nodes
                tags: openshiftClusterID=<infrastructure_ID>
          securityGroups:
          - filter: {}
            name: <infrastructure_ID>-<node_role>
          serverGroupName: <infrastructure_ID>-<node_role>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: true
          userDataSecret:
            name: <node_role>-user-data
```

Save the file and create the MachineSet:

```console
$ oc create -f vgpu_machineset.yaml
```

Then wait for the new node to show up. You can examine its presence and state using
`openstack server list` and, after the VM is ready, `oc get nodes`. The new node should
be available with status "Ready".
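
As a quick sketch of how the rollout can be watched and later scaled (the names are the
same placeholders used in the MachineSet example above; `<gpu_node_name>` stands for the
name of the newly created node):

```console
$ oc get machines -n openshift-machine-api
$ oc wait --for=condition=Ready node/<gpu_node_name> --timeout=30m
# If more vGPU capacity becomes available later, the MachineSet can be scaled up:
$ oc scale machineset <infrastructure_ID>-<node_role>-gpu-0 \
    -n openshift-machine-api --replicas=<amount_of_nodes_with_gpu>
```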


## Discover features and enable GPU

Now it's time to install two operators:

- [Node Feature Discovery](https://docs.openshift.com/container-platform/4.11/hardware_enablement/psap-node-feature-discovery-operator.html)
- [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/contents.html)

### Node Feature Discovery Operator

This operator is needed for labeling nodes with detected hardware features. It
is required by the GPU Operator. To install it, follow
[the documentation for the NFD Operator](https://docs.openshift.com/container-platform/4.11/hardware_enablement/psap-node-feature-discovery-operator.html).

To include NVIDIA card(s), the following changes need to be made to the
NodeFeatureDiscovery instance:

```yaml
apiVersion: nfd.kubernetes.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: node-feature-discovery-operator
spec:
  instance: ""
  topologyupdater: false
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery:v<ocp_version>
    imagePullPolicy: Always
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "10de"
          deviceLabelFields:
            - vendor
```

Be sure to replace `<ocp_version>` with the correct OCP version.


### GPU Operator

Follow the documentation on the [NVIDIA
site](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/install-gpu-ocp.html#installing-the-nvidia-gpu-operator-using-the-cli),
which basically boils down to the following steps:

1. Create the namespace and operator group (save to a file and do `oc create -f filename`):
   ```yaml
   ---
   apiVersion: v1
   kind: Namespace
   metadata:
     name: nvidia-gpu-operator
   ---
   apiVersion: operators.coreos.com/v1
   kind: OperatorGroup
   metadata:
     name: nvidia-gpu-operator-group
     namespace: nvidia-gpu-operator
   spec:
     targetNamespaces:
     - nvidia-gpu-operator
   ```
1. Get the proper channel for the gpu-operator:
   ```console
   $ CH=$(oc get packagemanifest gpu-operator-certified \
       -n openshift-marketplace -o jsonpath='{.status.defaultChannel}')
   $ echo $CH
   v22.9
   ```
1. Get the right name for the gpu-operator:
   ```console
   $ GPU_OP_NAME=$(oc get packagemanifests/gpu-operator-certified \
       -n openshift-marketplace -o json | jq \
       -r '.status.channels[]|select(.name == "'${CH}'")|.currentCSV')
   $ echo $GPU_OP_NAME
   gpu-operator-certified.v22.9.0
   ```
1. Now create nvidia-sub.yaml containing a Subscription with the values fetched
   earlier (save to a file and do `oc create -f filename`):
   ```yaml
   apiVersion: operators.coreos.com/v1alpha1
   kind: Subscription
   metadata:
     name: gpu-operator-certified
     namespace: nvidia-gpu-operator
   spec:
     channel: "<channel>"
     installPlanApproval: Manual
     name: gpu-operator-certified
     source: certified-operators
     sourceNamespace: openshift-marketplace
     startingCSV: "<gpu_operator_name>"
   ```
1. Verify that the InstallPlan has been created:
   ```console
   $ oc get installplan -n nvidia-gpu-operator
   ```
   In the APPROVED column you will see `false`.
1. Approve the plan:
   ```console
   $ oc patch installplan.operators.coreos.com/<install_plan_name> \
       -n nvidia-gpu-operator --type merge \
       --patch '{"spec":{"approved":true }}'
   ```

Now an image needs to be built, which will be used by the gpu-operator for
building drivers on the cluster.

Download the needed drivers from the [NVIDIA application
hub](https://nvid.nvidia.com/dashboard/), along with the vgpuDriverCatalog.yaml
file. The only files needed for vGPU are (at the time of writing):

- NVIDIA-Linux-x86_64-510.85.02-grid.run
- vgpuDriverCatalog.yaml
- gridd.conf

Note that the drivers which should be used here are the **guest** ones, not the host
drivers that were installed on the OpenStack compute node.

Clone the driver repository and copy all of the needed drivers to the
driver/rhel8/drivers directory:

```console
$ git clone https://gitlab.com/nvidia/container-images/driver
$ cd driver/rhel8
$ cp /path/to/obtained/drivers/* drivers/
```

Create the gridd.conf file and copy it to `drivers` (installation of the licensing
server is out of scope for this document):
```
# Description: Set License Server Address
# Data type: string
# Format: "<address>"
ServerAddress=<licensing_server_address>
```

Go to the driver/rhel8/ path and prepare the image:
```console
$ export PRIVATE_REGISTRY=<registry_name/path>
$ export OS_TAG=<ocp_tag>
$ export VERSION=<version>
$ export VGPU_DRIVER_VERSION=<vgpu_version>
$ export CUDA_VERSION=<cuda_version>
$ export TARGETARCH=<architecture>
$ podman build \
    --build-arg CUDA_VERSION=${CUDA_VERSION} \
    --build-arg DRIVER_TYPE=vgpu \
    --build-arg TARGETARCH=$TARGETARCH \
    --build-arg DRIVER_VERSION=$VGPU_DRIVER_VERSION \
    -t ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG} .
```

where:

- `PRIVATE_REGISTRY` is the name of the private registry the image will be pushed
  to/pulled from, e.g. "quay.io/someuser"
- `OS_TAG` is a string matching the RHCOS version used for the cluster
  installation, e.g. "rhcos4.12"
- `VERSION` may be any string or number, e.g. "1.0.0"
- `VGPU_DRIVER_VERSION` is a substring of the driver file name. For example, if the
  driver file is "NVIDIA-Linux-x86_64-510.85.02-grid.run", then the version will be
  "510.85.02-grid".
- `CUDA_VERSION` is the latest version of CUDA supported on that particular GPU
  (or any other needed version), e.g. "11.7.1".
- `TARGETARCH` is the target architecture which the cluster runs on (usually
  "x86_64")


Push the image to the registry:
```console
$ podman push ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG}
```

Create the license server ConfigMap:
```console
$ oc create configmap licensing-config \
    -n nvidia-gpu-operator --from-file=drivers/gridd.conf
```

Create a secret for connecting to the registry:
```console
$ oc -n nvidia-gpu-operator \
    create secret docker-registry my-registry \
    --docker-server=${PRIVATE_REGISTRY} \
    --docker-username=<username> \
    --docker-password=<pass> \
    --docker-email=<e-mail>
```

Substitute `<username>`, `<pass>` and `<e-mail>` with real data. Here, `my-registry`
is used as the name of the secret and could also be changed (it corresponds to the
`imagePullSecrets` array in the `clusterpolicy` later on).
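
Before moving on, it may be worth confirming that both objects exist (this assumes the
names used above):

```console
$ oc get configmap licensing-config -n nvidia-gpu-operator
$ oc get secret my-registry -n nvidia-gpu-operator
```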

Get the clusterpolicy:
```console
$ oc get csv -n nvidia-gpu-operator $GPU_OP_NAME \
    -o jsonpath={.metadata.annotations.alm-examples} | \
    jq .[0] > clusterpolicy.json
```

Edit it and fill in the fields marked below:

```json
{
  ...
  "spec": {
    ...
    "driver": {
      ...
      "repository": "<registry_name/path>",
      "image": "driver",
      "imagePullSecrets": ["my-registry"],
      "licensingConfig": {
        "configMapName": "licensing-config",
        "nlsEnabled": true
      },
      "version": "<version>",
      ...
    }
    ...
  }
}
```

Apply the changes:
```console
$ oc apply -f clusterpolicy.json
```

Wait for the drivers to be built. It may take a while. The state of the pods should be
either Running or Completed.
```console
$ oc get pods -n nvidia-gpu-operator
```

## Run a sample app

To verify the installation, create a simple app (app.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.2.1"
    resources:
      limits:
        nvidia.com/gpu: 1
```

Run it:
```console
$ oc apply -f app.yaml
```

Check the logs after the pod finishes its job:
```console
$ oc logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
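
As an optional final check (a sketch; `<gpu_node_name>` and `<nvidia_driver_daemonset_pod>`
are placeholders to take from your own cluster), the node should now advertise the
`nvidia.com/gpu` resource, and `nvidia-smi` can be run inside the driver pod:

```console
$ oc describe node <gpu_node_name> | grep nvidia.com/gpu
$ oc exec -n nvidia-gpu-operator <nvidia_driver_daemonset_pod> -- nvidia-smi
# Clean up the sample pod afterwards:
$ oc delete -f app.yaml
```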