# Installing With SR-IOV Worker Nodes

## Table of Contents

- [Prerequisites](#prerequisites)
- [Creating SR-IOV Networks for Worker Nodes](#creating-sr-iov-networks-for-worker-nodes)
- [Creating SR-IOV Worker Nodes in IPI](#creating-sr-iov-worker-nodes-in-ipi)
- [Install the SRIOV Network Operator and configure a network device](#install-the-sriov-network-operator-and-configure-a-network-device)
- [Attach the OVS HW offload network](#attach-the-ovs-hw-offload-network)
- [Deploy a testpmd pod](#deploy-a-testpmd-pod)
- [Deploy a testpmd pod with OVS Hardware Offload](#deploy-a-testpmd-pod-with-ovs-hardware-offload)
- [Creating SR-IOV Worker Nodes in UPI](#creating-sr-iov-worker-nodes-in-upi)

## Prerequisites

Single Root I/O Virtualization (SR-IOV) networking in OpenShift can benefit applications that require high bandwidth and low latency. To plan an OpenStack deployment that uses SR-IOV network interface cards (NICs), refer to [the OSP 16.2 installation documentation][osp-sriov-install]. Before you install an OpenShift cluster on OpenStack, make sure that the NICs that your OpenStack nodes use [are supported][supported-nics] for use with SR-IOV in OpenShift, and that your tenant has access to them. Your OpenStack cluster must meet the following quota requirements for each OpenShift node that has an attached SR-IOV NIC:

- One instance from the RHOSP quota
- One port attached to the machines subnet
- One port for each SR-IOV Virtual Function
- A flavor with at least 16 GB memory, 4 vCPUs, and 100 GB storage space

For all clusters that use single-root input/output virtualization (SR-IOV), RHOSP compute nodes require a flavor that supports [huge pages][huge-pages].

Deploying worker nodes with SR-IOV networks is supported as a post-install operation for both the IPI and UPI workflows. After you verify that your OpenStack cluster can support SR-IOV in OpenShift and you install an OpenShift cluster that meets the [minimum requirements](README.md#openstack-requirements), use the following steps and examples to create worker nodes with SR-IOV NICs.

If you need to configure your deployment for real-time or low-latency workloads, you'll need to create a [PerformanceProfile][performance-profile]; a minimal sketch is shown at the end of this section.

After your OpenShift control plane is running, you must install the SR-IOV Network Operator. To install the Operator, you will need access to an account on your OpenShift cluster that has `cluster-admin` privileges. After you log in to the account, [install the Operator][sriov-operator]. Then, [configure your SR-IOV network device][configure-sriov-network-device].
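For the real-time and low-latency case mentioned above, a minimal `PerformanceProfile` might look like the sketch below. This is only a starting point, assuming the low-latency tuning operator from the linked documentation is installed; the profile name, CPU ranges, and hugepage count are placeholders that you must adapt to your worker flavor and workload.

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: cnf-performanceprofile        # placeholder name
spec:
  cpu:
    isolated: "2-3"                   # CPUs dedicated to workload pods; adjust to your flavor
    reserved: "0-1"                   # CPUs reserved for housekeeping
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - count: 4                        # number of 1G pages per node; adjust to your workload
      size: "1G"
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  realTimeKernel:
    enabled: false                    # set to true only for real-time workloads
```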
## Creating SR-IOV Networks for Worker Nodes

You must create SR-IOV networks to attach to worker nodes before you create the nodes. Reference the following example of how to create radio and uplink provider networks in OpenStack:

```sh
# Create Networks
openstack network create radio --provider-physical-network radio --provider-network-type vlan --provider-segment 120
openstack network create uplink --provider-physical-network uplink --provider-network-type vlan --external

# Create Subnets
openstack subnet create --network radio --subnet-range <radio_network_subnet_range> radio
openstack subnet create --network uplink --subnet-range <uplink_network_subnet_range> uplink
```

## Creating SR-IOV Worker Nodes in IPI

You can create worker nodes as a post-IPI-install operation by using the machine API. To create a new set of worker nodes, [create a new MachineSet in OpenShift][openstack-machine-sets], for example by starting from a copy of an existing one:

```sh
oc get machineset -n openshift-machine-api <machineset_name> -o yaml > sriov_machineset.yaml
```

When editing an existing MachineSet (or a copy of one) to create SR-IOV worker nodes, add each subnet that is configured for SR-IOV to the MachineSet's `providerSpec`. The following example attaches ports from the `radio` and `uplink` subnets, which were created in the previous example, to all of the worker nodes in the MachineSet. For all SR-IOV ports, you must set the following parameters:

- `vnicType: direct`
- `portSecurity: false`

Note that security groups and allowed address pairs cannot be set on a port if `portSecurity` is disabled. If you are using a network with port security disabled, allowed address pairs and security groups cannot be used for any port in that network. Also be aware that setting security groups on the instance applies them to all ports attached to it, which matters when you use networks with port security disabled. Currently, trunking is not enabled on ports defined in the `ports` list, only on ports created from entries in the `networks` or `subnets` lists. The name of each port is `<machine-name>-<nameSuffix>`, and `nameSuffix` is a required field in the port definition. Optionally, you can add tags to ports by adding them to the `tags` list.
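In the MachineSet examples below, `<infrastructure_ID>` is the cluster's infrastructure name. If you do not have it at hand, you can, for example, read it from the cluster:

```sh
oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'
```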
The following example shows how a MachineSet can be created that creates SR-IOV-capable ports on the `radio` and `uplink` networks and subnets that were defined in the previous example:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>
  namespace: openshift-machine-api
spec:
  replicas: <number_of_replicas>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
    spec:
      metadata:
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_flavor>
          image: <glance_image_name_or_location>
          serverGroupID: <optional_UUID_of_server_group>
          kind: OpenstackProviderSpec
          networks:
          - subnets:
            - uuid: <machines_subnet_uuid>
          ports:
          - networkID: <radio_network_uuid>
            nameSuffix: radio
            fixedIPs:
            - subnetID: <radio_subnet_uuid>
            tags:
            - sriov
            - radio
            vnicType: direct
            portSecurity: false
          - networkID: <uplink_network_uuid>
            nameSuffix: uplink
            fixedIPs:
            - subnetID: <uplink_subnet_uuid>
            tags:
            - sriov
            - uplink
            vnicType: direct
            portSecurity: false
          primarySubnet: <machines_subnet_uuid>
          securityGroups:
          - filter: {}
            name: <infrastructure_ID>-<node_role>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: true
          userDataSecret:
            name: <node_role>-user-data
          availabilityZone: <optional_openstack_availability_zone>
```

If your port is leveraging OVS Hardware Offload, then its configuration must be the following, so that the port in Neutron is created with the right capabilities:

```yaml
(...)
          ports:
          - fixedIPs:
            - subnetID: <radio_subnet_uuid>
            nameSuffix: sriov
            networkID: <radio_network_uuid>
            portSecurity: false
            profile:
              capabilities: '[switchdev]'
            tags:
            - sriov
            - radio
            vnicType: direct
(...)
```

After you finish editing your MachineSet, upload it to your OpenShift cluster:

```sh
oc create -f sriov_machineset.yaml
```
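The machine API then provisions the new instances. You can watch them come up and, once they have booted and pulled their configuration, see the corresponding nodes join the cluster, for example with:

```sh
oc get machines -n openshift-machine-api
oc get nodes
```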
To create SR-IOV ports on a network that has port security disabled, you need to make additional changes to your MachineSet because, by default, security groups are set on the instance and allowed address pairs are automatically added to ports created through the `networks` and `subnets` interfaces. The solution is to define all of your ports with the `ports` interface in your MachineSet. Remember that the port for the machines subnet needs:

- allowed address pairs for your API and ingress VIP ports
- the worker security group
- to be attached to the machines network and subnet

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>
  namespace: openshift-machine-api
spec:
  replicas: <number_of_replicas>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_flavor>
          image: <glance_image_name_or_location>
          kind: OpenstackProviderSpec
          ports:
          - allowedAddressPairs:
            - ipAddress: <api_vip_port_IP>
            - ipAddress: <ingress_vip_port_IP>
            fixedIPs:
            - subnetID: <machines_subnet_UUID>
            nameSuffix: nodes
            networkID: <machines_network_UUID>
            securityGroups:
            - <worker_security_group_UUID>
          - networkID: <sriov_network_UUID>
            nameSuffix: sriov
            fixedIPs:
            - subnetID: <sriov_subnet_UUID>
            tags:
            - sriov
            vnicType: direct
            portSecurity: false
          primarySubnet: <machines_subnet_UUID>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: false
          userDataSecret:
            name: worker-user-data
```

Once the workers are deployed, you must label them as SR-IOV capable:

```bash
oc label node <node-name> feature.node.kubernetes.io/network-sriov.capable="true"
```

## Install the SRIOV Network Operator and configure a network device

You must install the SR-IOV Network Operator. To install the Operator, you will need access to an account on your OpenShift cluster that has `cluster-admin` privileges. After you log in to the account, [install the Operator][operator].

Then, [configure your SR-IOV network device][device]. Note that only `netFilter` needs to be set in the `nicSelector`, because we select devices by the Neutron network ID used for SR-IOV traffic.
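The network ID to use in `netFilter` can be read from OpenStack, for example for the `radio` network created earlier:

```sh
openstack network show radio -f value -c id
```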
Example of an `SriovNetworkNodePolicy` named `sriov1`:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    netFilter: openstack/NetworkID:9144121f-bf90-4891-b061-323e4cd990ed
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: sriov1
```

Note: if the network device attached to the network is from Mellanox rather than Intel, then `deviceType` must be set to `netdevice` and `isRdma` to `true`.

The SR-IOV Network Operator automatically discovers the devices connected to that network on each worker and makes them available for use by CNF pods later.

## Attach the OVS HW offload network

This step can be skipped when not doing OVS Hardware Offload. For OVS Hardware Offload, the network has to be attached via a host-device.

Create a file named `network.yaml`:

```yaml
spec:
  additionalNetworks:
  - name: hwoffload1
    namespace: cnf
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "hwoffload1", "type": "host-device","pciBusId": "0000:00:05.0", "ipam": {}}'
    type: Raw
```

And then run:

```sh
oc patch network.operator cluster --patch "$(cat network.yaml)" --type=merge
```

It usually takes about 15 seconds to apply the configuration.

Note: `0000:00:05.0` is the PCI bus ID that corresponds to the device connected to the OVS HW Offload network; it can be discovered by running `oc describe SriovNetworkNodeState -n openshift-sriov-network-operator`.

## Deploy a testpmd pod

This pod is an example of how to create a container that uses the huge pages, the reserved CPUs, and the SR-IOV port:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
        openshift.io/sriov1: 1
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
        openshift.io/sriov1: 1
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
      readOnly: false
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```

More examples are documented [here][pods].
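To verify that a VF was actually allocated to the pod, you can inspect the environment variables that the SR-IOV device plugin injects into its containers; the variable name below assumes the `sriov1` resource defined in the earlier policy:

```sh
oc exec -n mynamespace testpmd-sriov -- env | grep PCIDEVICE_OPENSHIFT_IO
```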
## Deploy a testpmd pod with OVS Hardware Offload

The same example as before, except this time we use the network for OVS Hardware Offload:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
  annotations:
    k8s.v1.cni.cncf.io/networks: hwoffload1
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
      readOnly: false
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```
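One way to check that the `hwoffload1` interface was attached is to look at the network-status annotation that Multus adds to the running pod, for example:

```sh
oc get pod testpmd-sriov -n mynamespace -o yaml | grep -A 20 network-status
```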
## Creating SR-IOV Worker Nodes in UPI

Because the UPI implementation depends largely on your deployment environment and requirements, there is no official script for deploying SR-IOV worker nodes. However, we can share a verified example that is based on the [compute-nodes.yaml](../../../upi/openstack/compute-nodes.yaml) script to help you understand the process. To use the script, open up a terminal to the location of the `inventory.yaml` and `common.yaml` UPI Ansible scripts. In the following example, we add provider networks named `radio` and `uplink` to the `inventory.yaml` file. Note that the `count` parameter specifies the number of virtual functions (VFs) to attach to each worker node. This code can also be found on [GitHub](https://github.com/shiftstack/SRIOV-Compute-Nodes-Ansible-Automation).

```yaml
....
# If this value is non-empty, the corresponding floating IP will be
# attached to the bootstrap machine. This is needed for collecting logs
# in case of install failure.
os_bootstrap_fip: '203.0.113.20'

additionalNetworks:
- id: radio
  count: 4
  type: direct
  port_security_enabled: no
- id: uplink
  count: 4
  type: direct
  port_security_enabled: no
```

Next, create a file called `compute-nodes.yaml` with this Ansible script:

```yaml
- import_playbook: common.yaml

- hosts: all
  gather_facts: no

  vars:
    worker_list: []
    port_name_list: []
    nic_list: []

  tasks:
  # Create the SDN/primary port for each worker node
  - name: 'Create the Compute ports'
    os_port:
      name: "{{ item.1 }}-{{ item.0 }}"
      network: "{{ os_network }}"
      security_groups:
      - "{{ os_sg_worker }}"
      allowed_address_pairs:
      - ip_address: "{{ os_ingressVIP }}"
    with_indexed_items: "{{ [os_port_worker] * os_compute_nodes_number }}"
    register: ports

  # Tag each SDN/primary port with the cluster name
  - name: 'Set Compute ports tag'
    command:
      cmd: "openstack port set --tag {{ cluster_id_tag }} {{ item.1 }}-{{ item.0 }}"
    with_indexed_items: "{{ [os_port_worker] * os_compute_nodes_number }}"

  - name: 'Call additional-port processing'
    include_tasks: additional-ports.yaml

  # Create additional ports in OpenStack
  - name: 'Create additionalNetworks ports'
    os_port:
      name: "{{ item.0 }}-{{ item.1.name }}"
      vnic_type: "{{ item.1.type }}"
      network: "{{ item.1.uuid }}"
      port_security_enabled: "{{ item.1.port_security_enabled|default(omit) }}"
      no_security_groups: "{{ 'true' if item.1.security_groups is not defined else omit }}"
      security_groups: "{{ item.1.security_groups | default(omit) }}"
    with_nested:
    - "{{ worker_list }}"
    - "{{ port_name_list }}"

  # Tag the ports with the cluster info
  - name: 'Set additionalNetworks ports tag'
    command:
      cmd: "openstack port set --tag {{ cluster_id_tag }} {{ item.0 }}-{{ item.1.name }}"
    with_nested:
    - "{{ worker_list }}"
    - "{{ port_name_list }}"

  # Build the nic list to use for server create
  - name: Build nic list
    set_fact:
      nic_list: "{{ nic_list | default([]) + [ item.name ] }}"
    with_items: "{{ port_name_list }}"

  # Create the servers
  - name: 'Create the Compute servers'
    vars:
      worker_nics: "{{ [ item.1 ] | product(nic_list) | map('join','-') | map('regex_replace', '(.*)', 'port-name=\\1') | list }}"
    os_server:
      name: "{{ item.1 }}"
      image: "{{ os_image_rhcos }}"
      flavor: "{{ os_flavor_worker }}"
      auto_ip: no
      userdata: "{{ lookup('file', 'worker.ign') | string }}"
      security_groups: []
      nics: "{{ [ 'port-name=' + os_port_worker + '-' + item.0|string ] + worker_nics }}"
    with_indexed_items: "{{ worker_list }}"
```
Create a new Ansible script named `additional-ports.yaml`:

```yaml
# Build a list of worker nodes with indexes
- name: 'Build worker list'
  set_fact:
    worker_list: "{{ worker_list | default([]) + [ item.1 + '-' + item.0 | string ] }}"
  with_indexed_items: "{{ [ os_compute_server_name ] * os_compute_nodes_number }}"

# Ensure that each network specified in additionalNetworks exists
- name: 'Verify additionalNetworks'
  os_networks_info:
    name: "{{ item.id }}"
  with_items: "{{ additionalNetworks }}"
  register: network_info

# Expand additionalNetworks by the count parameter in each network definition
- name: 'Build port and port index list for additionalNetworks'
  set_fact:
    port_list: "{{ port_list | default([]) + [ {
                    'net_name' : item.1.id,
                    'uuid' : network_info.results[item.0].openstack_networks[0].id,
                    'type' : item.1.type|default('normal'),
                    'security_groups' : item.1.security_groups|default(omit),
                    'port_security_enabled' : item.1.port_security_enabled|default(omit)
                  } ] * item.1.count|default(1) }}"
    index_list: "{{ index_list | default([]) + range(item.1.count|default(1)) | list }}"
  with_indexed_items: "{{ additionalNetworks }}"

# Calculate and save the name of the port
# The format of the name is cluster_name-worker-workerID-networkUUID(partial)-count
# i.e. fdp-nz995-worker-1-99bcd111-1
- name: 'Calculate port name'
  set_fact:
    port_name_list: "{{ port_name_list | default([]) + [ item.1 | combine( {'name' : item.1.uuid | regex_search('([^-]+)') + '-' + index_list[item.0]|string } ) ] }}"
  with_indexed_items: "{{ port_list }}"
  when: port_list is defined
```

Finally, run the `compute-nodes.yaml` script as you normally would:

```sh
ansible-playbook -i inventory.yaml compute-nodes.yaml
```

Make sure to follow the documentation to [approve the CSRs][approve-csr-upi] for your worker nodes, and to [wait for the installation to complete][wait-for-install-complete] to finalize your deployment.

[wait-for-install-complete]: install_upi.md#wait-for-the-openshift-installation-to-complete
[approve-csr-upi]: install_upi.md#approve-the-worker-csrs
[machine-pool-customizations]: customization.md#machine-pools
[sriov-operator]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/installing-sriov-operator.html
[configure-sriov-network-device]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/configuring-sriov-device.html
[supported-nics]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/about-sriov.html#supported-devices_about-sriov
[osp-sriov-install]: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/network_functions_virtualization_planning_and_configuration_guide/index#assembly_sriov_parameters
[openstack-machine-sets]: https://docs.openshift.com/container-platform/4.10/machine_management/creating_machinesets/creating-machineset-osp.html
[performance-profile]: https://docs.openshift.com/container-platform/4.10/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html#about_hyperthreading_for_low_latency_and_real_time_applications_cnf-master
[huge-pages]: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/network_functions_virtualization_planning_and_configuration_guide/index#c_ovsdpdk-instance-extra-specs
[operator]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/installing-sriov-operator.html
[device]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/configuring-sriov-device.html
[pods]: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/add-pod.html