---
title: Externally Manage PF
authors:
  - SchSeba
reviewers:
  - zeeke
  - adrianchiris
creation-date: 12-07-2023
last-updated: 12-07-2023
---

# Externally Manage PF

## Summary

Allow the SR-IOV network operator to configure and allocate a subset of virtual functions (VFs) from
a physical function (PF) that is configured outside of the SR-IOV network operator.

## Motivation

The feature is needed to allow the operator to configure only a subset of virtual functions.
This allows a third party component like nmstate, kubernetes-nmstate or NetworkManager to handle the creation
and the usage of the virtual functions on the system. Examples include using a virtual function as the primary
NIC for the k8s SDN network or for a storage network.

Before this change, the SR-IOV network operator was the only component expected to use and configure VFs, which prevented the user
from using some of the VFs for host networking.

### Use Cases

* As a user I want to use a virtual function for the SDN network. For SDN, the network needs to be configured before
  k8s is deployed, and these VFs should be available at system startup before pods start running
* As a user I want to create the virtual functions via nmstate
* As a user I want pods to use virtual functions from a pre-configured PF
* As a user I want to allocate virtual functions to pods from a PF with custom configuration/driver
* As a user I want virtual functions to be configured for the storage subsystem before k8s is deployed and before pods spin up at system startup

### Goals

* Allow the SR-IOV network operator to handle the configuration and pod allocation of some or all virtual functions
  while the PF configuration is managed by an external entity
* Allow the user to allocate the number of virtual functions they want for the system and the subset they want for pods

### Non-Goals

* Supporting switchdev mode (may change in the future if there is a request)
* Supporting the creation of the VFs on boot by the operator; it is possible to use the operator systemd mode for that

## Proposal

Create a sub-flow in the SR-IOV network operator where the user can request a configuration for all or a subset of virtual functions
without any changes at the PF level.

The operator will first validate that the requested PF has the requested number of virtual functions allocated. It
will also validate that the requested MTU is configured as expected on the PF.
If that is not the case, the `sriovNetworkNodeState.status.SyncStatus` field will report `Failed`.

Then the operator will configure the subset of virtual functions with the requested driver and will update the device plugin
configmap with the expected information to create the relevant pools.

Existing sriov network config daemon flow:
1. Apply the `numVfs`
2. Configure the MTU on the PF
3. Copy the administrative MAC address from the VFs
4. Bind the right driver for the VF
5. Restart the sriov network device plugin

Externally managed sriov network config daemon flow:
1. Copy the administrative MAC address from the VFs
2. Bind the right driver for the VF
3. Restart the sriov network device plugin

In both flows:
* In case of InfiniBand link type it will generate a random node and port GUID for the interface.
* In case of RDMA (both for ETH and IB) it will perform an unbind/bind of the VF driver to set the RDMA Node/Port GUID.

### Workflow Description

The user will allocate the virtual functions on the system with any third party tool like nmstate, Kubernetes-nmstate,
systemd scripts, etc.

The user must perform the sriov allocation/configuration before kubelet starts or, more specifically,
before the SR-IOV Network operator configuration daemon starts running on the node.

Then the user will be able to create a policy telling the operator that the PF is externally managed by the user.

If the user wants to create the virtual functions after the SR-IOV Network config daemon is already running on the system, they will need
to disable the webhook. The policy will be in a failed state until the virtual functions needed for the policy exist
on the node. The SR-IOV Network config daemon will continue to reconcile until the virtual functions exist.

#### Policy Example:
```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nic-1
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames: ["ens3f0#5-9"]
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 10
  priority: 99
  resourceName: sriov_nic_1
  externallyManaged: true
```

The PF and VFs 0-4 are externally managed.
For example nmstate will create 10 VFs, but will only consume VF 0 and 4 in its configuration.
Nmstate will also manage the MTU and other parameters of the PF.

#### Another Policy Example:
In this case we allocate all the virtual functions from the PF

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nic-2
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames: ["ens3f0"]
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 10
  priority: 99
  resourceName: sriov_nic_1
  externallyManaged: true
```

The SR-IOV network operator will use all 10 virtual functions created externally by the user.
One of the main use cases for this is when the user wants to apply custom configuration to the PF and VFs, like loading
out-of-tree drivers or other settings that the operator doesn't support.

#### Validation
The SR-IOV network operator validation webhook will check that the requested `numVfs` is equal to what the user allocated.
If not, it will reject the policy creation.

The SR-IOV network operator validation webhook will check that the requested MTU is lower than or equal to what exists on the PF.
If not, it will reject the policy creation.

*Note:* The same validation will be done in the SR-IOV config-daemon container to cover cases where the user doesn't want to deploy
the webhook and to cover scale-up when adding new nodes. If the verification fails in the policy apply stage,
the `sriovNetworkNodeState.status.SyncStatus` field will report a `Failed` status and the error description will
be exposed in `sriovNetworkNodeState.status.LastSyncError`

#### Configuration

The SR-IOV network operator config daemon will reconcile on the SriovNetworkNodeState update and will follow the regular
flow for virtual functions, *SKIPPING* only the virtual function allocation.
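
The difference between the two config daemon flows listed in the Proposal can be sketched as a single step list with the PF-level steps gated on the flag. This is a minimal illustration only: the `Step` type, its constants, and `stepsFor` are hypothetical names introduced here, not identifiers from the operator code base.

```go
package main

import "fmt"

// Step is one action in the config daemon flow described in this document.
// Hypothetical type: the real daemon does not expose its flow as a list.
type Step string

const (
	StepApplyNumVfs      Step = "apply numVfs"
	StepConfigurePFMTU   Step = "configure the MTU on the PF"
	StepCopyAdminMAC     Step = "copy the administrative MAC address from the VFs"
	StepBindVFDriver     Step = "bind the right driver for the VF"
	StepRestartDevPlugin Step = "restart the sriov network device plugin"
)

// stepsFor returns the ordered steps for one interface.
// PF-level steps are included only when the operator owns the PF.
func stepsFor(externallyManaged bool) []Step {
	var steps []Step
	if !externallyManaged {
		steps = append(steps, StepApplyNumVfs, StepConfigurePFMTU)
	}
	return append(steps, StepCopyAdminMAC, StepBindVFDriver, StepRestartDevPlugin)
}

func main() {
	for _, managed := range []bool{false, true} {
		fmt.Printf("externallyManaged=%v\n", managed)
		for i, s := range stepsFor(managed) {
			fmt.Printf("  %d. %s\n", i+1, s)
		}
	}
}
```

Keeping the VF-level steps in one shared tail makes it harder for the two flows to drift apart when a new VF step is added.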

The SR-IOV network operator will update the SR-IOV Network Device Plugin with the pool information

Another change in the operator behavior is that when we delete a policy which had `externallyManaged: true`, the SR-IOV network operator
will *NOT* reset the `numVfs`

### API Extensions

For SriovNetworkNodePolicy

```golang
// SriovNetworkNodePolicySpec defines the desired state of SriovNetworkNodePolicy
type SriovNetworkNodePolicySpec struct {
    ...
+   // Don't create the virtual functions; only bind them to the driver and allocate them to the device plugin. Defaults to false.
+   ExternallyManaged bool `json:"externallyManaged,omitempty"`
}
```

For SriovNetworkNodeState

```golang
type Interface struct {
    ...
+   ExternallyManaged bool `json:"externallyManaged,omitempty"`
}
```

### Implementation Details/Notes/Constraints

#### Webhook
For the webhook we add more validations when the policy contains `ExternallyManaged: true`
* `numVfs` in the policy is equal to or lower than the number of virtual functions on the system
* `MTU` in the policy is equal to or lower than the MTU we discover on the PF
* `LinkType` in the policy equals the link type we discover on the PF

#### Controller/Manager

The changes in the manager for this feature are minimal: we only copy the `ExternallyManaged` boolean from the policy
to the generated `nodeState.Spec`

#### Config Daemon

This is where most of the changes for this feature are implemented.

* Do the same validation as the webhook to check that the PF has everything we need to apply the requested
  policy, by checking the `numVfs`, `MTU` and `LinkType`.
* Skip all the PF configuration like `numVfs`, `MTU` and `LinkType`. The daemon will only perform the virtual function
  driver binding, administrative MAC allocation and MTU.
* In case of InfiniBand link type it will generate a random node and port GUID for the interface
* In case of RDMA (both for ETH and IB) it will perform an unbind/bind of the VF driver to set the RDMA Node/Port GUID.
* Reset the device plugin so kubelet will be able to discover the SR-IOV devices.

*NOTE:* The config-daemon will also save on the node a cache (file) of the last applied policy. This is needed to determine
whether we need to reset the PF configuration (`ExternallyManaged` was false) or not when the policy is removed.

### Upgrade & Downgrade considerations

The feature supports both Upgrade and Downgrade as we are introducing a new field in the API.
Downgrade will cause the operator to treat an externally managed PF as non externally managed and actually configure the PF.
This may cause conflicts in the system.

### Test Plan

* Should not allow creating a policy with externallyManaged true if there are no VFs configured
* Should create a policy if the number of requested VFs is equal to the number configured on the PF
* Should create a policy if the number of requested VFs is equal and not delete them when the policy is removed
* Should reset the virtual functions if externallyManaged is false
* Should allow configuring a policy with externallyManaged true if there are no VFs configured and the webhook is disabled
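
The validation-related cases in the test plan can be illustrated with a small sketch of the three checks listed in the Webhook section. Everything here is an assumption made for illustration: `pfState`, `policyRequest` and `validateExternallyManaged` are hypothetical names, a zero `MTU` or empty `LinkType` is treated as "not requested", and the link type comparison is case-insensitive. The `numVfs` check follows the "equal or lower" wording of the Webhook section.

```go
package main

import (
	"fmt"
	"strings"
)

// pfState is a hypothetical view of what was discovered on the PF.
type pfState struct {
	NumVfs   int
	MTU      int
	LinkType string
}

// policyRequest is a hypothetical subset of the policy fields being validated.
type policyRequest struct {
	NumVfs   int
	MTU      int    // assumption: 0 means the policy did not request an MTU
	LinkType string // assumption: "" means the policy did not request a link type
}

// validateExternallyManaged returns an error when the PF cannot satisfy the policy.
func validateExternallyManaged(req policyRequest, pf pfState) error {
	if req.NumVfs > pf.NumVfs {
		return fmt.Errorf("requested numVfs %d is higher than the %d VFs found on the PF", req.NumVfs, pf.NumVfs)
	}
	if req.MTU != 0 && req.MTU > pf.MTU {
		return fmt.Errorf("requested MTU %d is higher than the PF MTU %d", req.MTU, pf.MTU)
	}
	// Assumption: link types are compared without regard to case.
	if req.LinkType != "" && !strings.EqualFold(req.LinkType, pf.LinkType) {
		return fmt.Errorf("requested link type %q does not match the PF link type %q", req.LinkType, pf.LinkType)
	}
	return nil
}

func main() {
	configured := pfState{NumVfs: 10, MTU: 1500, LinkType: "ETH"}
	empty := pfState{MTU: 1500, LinkType: "ETH"} // no VFs created externally

	fmt.Println("equal numVfs:", validateExternallyManaged(policyRequest{NumVfs: 10}, configured))
	fmt.Println("no VFs configured:", validateExternallyManaged(policyRequest{NumVfs: 10}, empty))
}
```

With the webhook disabled, the same check would run only in the config daemon, so a failing policy would be accepted by the API and reported through `SyncStatus` instead of being rejected at creation.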