---
title: switchdev mode refactoring
authors:
  - ykulazhenkov
reviewers:
creation-date: 14-12-2023
last-updated: 14-12-2023
---

# switchdev and systemd modes refactoring

## Summary

We need to refactor the implementation used for NICs in switchdev mode and align its behavior with the systemd
mode of the operator. The refactoring is required to simplify the development of new switchdev-related
features for the sriov-network-operator.

## Motivation

Currently the **sriov-network-operator** supports two configuration modes:
* `daemon`
* `systemd`

The configuration mode can be changed by setting the `configurationMode` field in the `SriovOperatorConfig` CR.

_**Note**: This setting is global and applies to all sriov-network-operator-daemons in the cluster._

In the `daemon` mode, which historically is the first implemented mode,
the operator will set up NICs with _**SRIOV legacy**_ configuration directly in the **sriov-network-operator-daemon**
component by executing all enabled plugins.

When the operator is in `systemd` mode, the **sriov-network-operator-daemon** component will execute most plugins
in the same way as in the `daemon` mode but will skip the call of the *generic* or *virtual*
plugin (when running in a virtualized environment) and instead will render config for the systemd
service that starts on the next OS boot and calls one of these plugins.
Then, the result of the service execution is handled by the **sriov-network-operator-daemon**.

The `systemd` mode was implemented to support scenarios when, after the host reboot,
we need VFs to be configured before Kubernetes (kubelet) and Pods with workloads are started.
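
The difference between the two modes can be illustrated with a minimal Go sketch. It is not taken from the operator code: the `splitPlugins` helper, the `ConfigurationMode` type, and the `vendor-x` plugin name are assumptions made for illustration. It only shows which plugins the daemon would call directly and which would be left to the systemd service on the next boot.

```go
package main

import "fmt"

// ConfigurationMode holds the two values of SriovOperatorConfig.configurationMode.
type ConfigurationMode string

const (
	DaemonMode  ConfigurationMode = "daemon"
	SystemdMode ConfigurationMode = "systemd"
)

// splitPlugins is a hypothetical helper (not part of the operator code).
// It returns the plugins the daemon calls directly and the plugins whose
// execution is deferred to the systemd service on the next OS boot.
func splitPlugins(mode ConfigurationMode, enabled []string) (direct, deferred []string) {
	for _, name := range enabled {
		// In systemd mode the daemon skips the generic/virtual plugin.
		if mode == SystemdMode && (name == "generic" || name == "virtual") {
			deferred = append(deferred, name)
			continue
		}
		direct = append(direct, name)
	}
	return direct, deferred
}

func main() {
	// "vendor-x" is a made-up plugin name used only for illustration.
	enabled := []string{"generic", "vendor-x"}
	for _, mode := range []ConfigurationMode{DaemonMode, SystemdMode} {
		direct, deferred := splitPlugins(mode, enabled)
		fmt.Printf("%s: daemon runs %v, systemd service runs %v\n", mode, direct, deferred)
	}
}
```

In this sketch the daemon still has to collect the result of the deferred plugin after the reboot, which matches the last step described above.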

To set up NICs with _**switchdev**_ configuration, the operator uses a different flow that ignores
the `configurationMode` setting. Two systemd services (not the same as those used for `systemd` mode) are created on the node.
The first service is executed before the NetworkManager starts, and the second one after.
Both services run bash scripts. The script from the first service is responsible for VF creation and for
switching a NIC to the switchdev eSwitch mode. The script from the second service binds VFs to the required drivers.

If a NIC has _**switchdev**_ configuration, then `configurationMode` of the operator does not affect it.

```mermaid
---
title: Current logic
---
flowchart TD;
    operatorMode["SriovOperatorConfig.configurationMode"];
    nicMode["NIC's eSwitch mode"];
    inDaemon["apply configuration(legacy) in
    sriov-network-operator-daemon"]
    bashSystemd["create two switchdev services(bash scripts)
    to handle configuration(switchdev) on boot"];
    systemdSystemd["create single systemd service(go code)
    to handle configuration(legacy) on boot"]

    nicMode-- switchdev --> bashSystemd;
    nicMode-- legacy --> operatorMode;
    operatorMode-- daemon --> inDaemon;
    operatorMode-- systemd --> systemdSystemd;

```

#### Problems of the current implementation

* it is confusing that `configurationMode` does not affect devices with switchdev configuration.
* system services for switchdev configuration are shell scripts completely independent from the
  main code base and it is hard to extend them with new functionality.
* for switchdev NICs, VF configuration flow (bash-based) has some limitations compared to legacy VF configuration.
* it is impossible to apply switchdev configuration for the NIC without reboot.

### Use Cases

* As a developer I don't want to maintain the code with similar logic in two places:
  switchdev bash scripts and **sriov-network-operator-daemon** code.
* As a developer and a user I want to have only one set of systemd services that handle
  both _**switchdev**_ configurations and `systemd` mode.
* As a user I want `configurationMode` to work the same way for NICs with
  _**legacy**_ and _**switchdev**_ configurations.
* As a user I want to apply _**switchdev**_ configuration for a NIC by the **sriov-network-operator-daemon**
  without reboot (if a reboot is not required by other logic,
  e.g. kernel parameters configuration, FW configuration).

### Goals

* it should be possible to apply _**switchdev**_ configuration in the **sriov-network-operator-daemon** without reboot.
* the code used by `daemon` and `systemd` modes to handle _**switchdev**_ and _**legacy**_
  configurations should be the same Golang code.
* `configurationMode` option should work the same for NICs with _**legacy**_ and _**switchdev**_ configurations.
* the operator should use unified systemd services which will be deployed only
  if the operator works in the `systemd` mode.
* `systemd` mode should be changed to support two-stage configuration:
  before the system network manager (NetworkManager or netplan) and after the system network manager.
  _Note: This is required to support all use-cases supported by the current switchdev implementation._

### Non-Goals

* replace _Externally Manage PF_ feature
* remove all shell scripts from the code

## Proposal

1. Drop existing bash-based implementation which is used for NICs with _**switchdev**_ configuration
2. Modify _generic_ and _virtual (if required)_ plugins to support _**switchdev**_ configuration
3. Modify code related to the _Externally Manage PF_ feature
   to support _**switchdev**_ configuration
4. Modify `systemd` mode flow to handle devices with both _**legacy**_ and _**switchdev**_ configurations
5. Split `systemd` mode system service into two parts:
    - `pre` - executes before NetworkManager/netplan and OVS
    - `after` - executes after NetworkManager/netplan and OVS

```mermaid
---
title: Proposed logic
---
flowchart TD;
    operatorMode["SriovOperatorConfig.configurationMode"];
    inDaemon["apply configurations(legacy and switchdev) in
    sriov-network-operator-daemon"]
    systemdSystemd["create `pre` and `after` systemd services(go code)
    to handle configurations(legacy and switchdev) on boot"]

    operatorMode-- daemon --> inDaemon;
    operatorMode-- systemd --> systemdSystemd;

```

### Workflow Description

Users using only NICs with _**legacy**_ SRIOV configurations will not need to change their workflow.
The operator should behave for these configurations the same way as it does now.

Users using NICs with _**switchdev**_ configurations will need to explicitly set the operator's
`configurationMode` to `systemd` if they expect the configuration of the NIC to happen
early on boot (before Kubernetes starts) to support the hwoffload use-case.

### API Extensions

#### SriovNetworkNodeState CR

`SriovNetworkNodeState.status.interfaces[*].Vfs[*].vdpaType` field should be added.

This field should be used to report the type of the VDPA
device that is configured for the VF.
Empty string means that there is no VDPA device.

Valid values are: `virtio`, `vhost` (same as in `SriovNetworkNodePolicySpec`)

```
type VirtualFunction struct {
    Name       string `json:"name,omitempty"`
    Mac        string `json:"mac,omitempty"`
    Assigned   string `json:"assigned,omitempty"`
    Driver     string `json:"driver,omitempty"`
    PciAddress string `json:"pciAddress"`
    Vendor     string `json:"vendor,omitempty"`
    DeviceID   string `json:"deviceID,omitempty"`
    Vlan       int    `json:"Vlan,omitempty"`
    Mtu        int    `json:"mtu,omitempty"`
    VfID       int    `json:"vfID"`
+   VdpaType   string `json:"vdpaType,omitempty"`
}
```

#### SriovOperatorConfig CR

Change in the operator's behavior: the `configurationMode` option now has an effect
on NICs with _**switchdev**_ configurations.

### Implementation Details/Notes/Constraints

We should consider improving unit-test coverage for modified code parts during the implementation.

After the operator upgrade, we should clean up unneeded files (scripts, system services, config files) that the previous version of the operator created on the host.

### Upgrade & Downgrade considerations

* after upgrading the operator, _**switchdev**_ config will be applied by **sriov-network-operator-daemon** and not by systemd service unless the user changes `configurationMode` setting to `systemd`
* after upgrading the operator, "implicit mixed mode" when _**switchdev**_ NIC configurations are handled by bash scripts (in systemd services)
  and _**legacy**_ NIC configurations are managed by **sriov-network-operator-daemon** will not be supported anymore.

_Note: `configurationMode` is a global setting, so the user will need to decide
which mode to use for the entire cluster_

Upgrade/Downgrade for users using only NICs with _**legacy**_ configurations will not require any actions.
Upgrade/Downgrade for clusters with _**switchdev**_ configurations will require
changing the operator's `configurationMode` option.

### Test Plan

The proposed changes will not introduce new functionality.

After the refactoring, _**switchdev**_ configurations will also be supported in the `daemon` mode.
This is the only thing we may need to develop additional tests for.
All other changes should be validated by running regression testing.

_Note: behavior for _**switchdev**_ configurations will be changed in a non-fully compatible way;
this may require fixing existing tests._
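
The behavior change that existing and new tests have to account for can be summarized with a small Go sketch of the "Current logic" and "Proposed logic" flowcharts. It is an illustration only: the `Handler` type and the `currentHandler` and `proposedHandler` functions are hypothetical names and do not exist in the operator code.

```go
package main

import "fmt"

// Handler names the component that applies a NIC configuration.
type Handler string

const (
	InDaemon         Handler = "sriov-network-operator-daemon"
	BashServices     Handler = "two switchdev systemd services (bash)"
	SingleGoService  Handler = "single systemd service (Go)"
	PreAfterServices Handler = "pre and after systemd services (Go)"
)

// currentHandler follows the "Current logic" flowchart: switchdev NICs
// always go to the bash-based services, whatever configurationMode is.
func currentHandler(eSwitchMode, configurationMode string) Handler {
	if eSwitchMode == "switchdev" {
		return BashServices
	}
	if configurationMode == "systemd" {
		return SingleGoService
	}
	return InDaemon
}

// proposedHandler follows the "Proposed logic" flowchart: only
// configurationMode matters, for legacy and switchdev NICs alike.
func proposedHandler(configurationMode string) Handler {
	if configurationMode == "systemd" {
		return PreAfterServices
	}
	return InDaemon
}

func main() {
	for _, eSwitch := range []string{"legacy", "switchdev"} {
		for _, mode := range []string{"daemon", "systemd"} {
			fmt.Printf("%-9s %-7s current=%q proposed=%q\n",
				eSwitch, mode, currentHandler(eSwitch, mode), proposedHandler(mode))
		}
	}
}
```

The only combination whose handler moves into the daemon is a switchdev NIC with `configurationMode` set to `daemon`, which is the case the new tests would target.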