---
title: switchdev mode refactoring
authors:
  - ykulazhenkov
reviewers:
creation-date: 14-12-2023
last-updated: 14-12-2023
---

# switchdev and systemd modes refactoring

## Summary

We need to refactor the implementation used for NICs in switchdev mode and align its behavior with the `systemd`
mode of the operator. The refactoring is required to simplify the development of new switchdev-related
features for the sriov-network-operator.

## Motivation

Currently, the **sriov-network-operator** supports two configuration modes:
* `daemon`
* `systemd`

The configuration mode can be changed by setting the `configurationMode` field in the `SriovOperatorConfig` CR.
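
As an illustration, the field can be changed with a JSON merge patch against the CR. The sketch below builds such a patch in Go; the `operatorConfigPatch` type and the `buildModePatch` helper are hypothetical and mirror only `spec.configurationMode`, not the operator's real API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// operatorConfigPatch is a hypothetical, minimal mirror of SriovOperatorConfig
// that contains only the field discussed in this document.
type operatorConfigPatch struct {
	Spec struct {
		// ConfigurationMode selects "daemon" or "systemd" for the whole cluster.
		ConfigurationMode string `json:"configurationMode"`
	} `json:"spec"`
}

// buildModePatch returns a JSON merge patch that sets spec.configurationMode.
func buildModePatch(mode string) (string, error) {
	if mode != "daemon" && mode != "systemd" {
		return "", fmt.Errorf("unsupported configurationMode %q", mode)
	}
	var p operatorConfigPatch
	p.Spec.ConfigurationMode = mode
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := buildModePatch("systemd")
	if err != nil {
		panic(err)
	}
	fmt.Println(patch) // {"spec":{"configurationMode":"systemd"}}
}
```

The resulting document could be passed to `kubectl patch` with `--type merge`; because the setting is global, one patch changes the behavior of every daemon in the cluster.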

_**Note**: This setting is global and applies to all sriov-network-operator-daemons in the cluster._

In the `daemon` mode, which historically was the first mode implemented,
the operator sets up NICs with _**SRIOV legacy**_ configuration directly in the **sriov-network-operator-daemon**
component by executing all enabled plugins.

When the operator is in `systemd` mode, the **sriov-network-operator-daemon** component executes most plugins
in the same way as in the `daemon` mode, but skips the call of the *generic* plugin (or the *virtual*
plugin when running in a virtualized environment) and instead renders a config for a systemd
service that starts on the next OS boot and calls one of these plugins.
The result of the service execution is then handled by the **sriov-network-operator-daemon**.

The `systemd` mode was implemented to support scenarios where, after a host reboot,
VFs must be configured before Kubernetes (kubelet) and Pods with workloads are started.

To set up NICs with _**switchdev**_ configuration, the operator uses a different flow that ignores
the `configurationMode` setting. Two systemd services (not the same as the one used for `systemd` mode) are created on the node.
The first service is executed before NetworkManager starts, and the second one after it.
Both services run bash scripts. The script from the first service is responsible for creating VFs and for
switching a NIC to the switchdev eSwitch mode. The script from the second service binds VFs to the required drivers.

If a NIC has _**switchdev**_ configuration, the operator's `configurationMode` does not affect it.
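
The split of work between the two switchdev services described above can be modeled as an ordered list of steps per boot stage. This is a minimal Go sketch of that ordering only; the `stage` type, its constants and the step strings are illustrative and are not taken from the operator (today these steps are implemented in bash).

```go
package main

import "fmt"

// stage says when a switchdev service runs relative to NetworkManager.
type stage string

const (
	beforeNetworkManager stage = "before NetworkManager"
	afterNetworkManager  stage = "after NetworkManager"
)

// switchdevSteps returns, in execution order, the (illustrative) steps that
// the existing switchdev services perform in the given stage.
func switchdevSteps(s stage) []string {
	switch s {
	case beforeNetworkManager:
		// First service: create VFs, then move the NIC's eSwitch to switchdev.
		return []string{"create VFs", "set eSwitch mode to switchdev"}
	case afterNetworkManager:
		// Second service: bind VFs to the required drivers.
		return []string{"bind VFs to required drivers"}
	}
	return nil
}

func main() {
	for _, s := range []stage{beforeNetworkManager, afterNetworkManager} {
		fmt.Printf("%s: %v\n", s, switchdevSteps(s))
	}
}
```

The Proposal section keeps these two stages for the `systemd` mode as the `pre` and `after` services.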

```mermaid
---
title: Current logic
---
flowchart TD;
    operatorMode["SriovOperatorConfig.configurationMode"];
    nicMode["NIC's eSwitch mode"];
    inDaemon["apply configuration(legacy) in
     sriov-network-operator-daemon"]
    bashSystemd["create two switchdev services(bash scripts)
     to handle configuration(switchdev) on boot"];
    systemdSystemd["create single systemd service(go code)
    to handle configuration(legacy) on boot"]

    nicMode-- switchdev --> bashSystemd;
    nicMode-- legacy --> operatorMode;
    operatorMode-- daemon --> inDaemon;
    operatorMode-- systemd --> systemdSystemd;

```

### Problems of the current implementation

* it is confusing that `configurationMode` does not affect devices with switchdev configuration.
* the systemd services for switchdev configuration are shell scripts that are completely independent of the
main code base, and it is hard to extend them with new functionality.
* for switchdev NICs, the bash-based VF configuration flow has some limitations compared to the legacy VF configuration flow.
* it is impossible to apply switchdev configuration to a NIC without a reboot.

### Use Cases

* As a developer, I don't want to maintain code with similar logic in two places:
the switchdev bash scripts and the **sriov-network-operator-daemon** code.
* As a developer and a user, I want only one set of systemd services that handles
both _**switchdev**_ configurations and the `systemd` mode.
* As a user, I want `configurationMode` to work the same way for NICs with
_**legacy**_ and _**switchdev**_ configurations.
* As a user, I want the **sriov-network-operator-daemon** to apply _**switchdev**_ configuration to a NIC
without a reboot (when a reboot is not required by other logic,
e.g. kernel parameters configuration or FW configuration).

### Goals

* it should be possible to apply _**switchdev**_ configuration in the **sriov-network-operator-daemon** without a reboot.
* the code used by the `daemon` and `systemd` modes to handle _**switchdev**_ and _**legacy**_
configurations should be the same Golang code.
* the `configurationMode` option should work the same way for NICs with _**legacy**_ and _**switchdev**_ configurations.
* the operator should use unified systemd services, which will be deployed only
if the operator works in the `systemd` mode.
* the `systemd` mode should be changed to support two-stage configuration:
before the system network manager (NetworkManager or netplan) starts, and after it.
  _Note: This is required to support all use cases supported by the current switchdev implementation._

### Non-Goals

* replace the _Externally Manage PF_ feature
* remove all shell scripts from the code

## Proposal

1. Drop the existing bash-based implementation used for NICs with _**switchdev**_ configuration
2. Modify the _generic_ and _virtual (if required)_ plugins to support _**switchdev**_ configuration
3. Modify the code related to the _Externally Manage PF_ feature
to support _**switchdev**_ configuration
4. Modify the `systemd` mode flow to handle devices with both _**legacy**_ and _**switchdev**_ configurations
5. Split the `systemd` mode system service into two parts:
    - `pre` - executes before NetworkManager/netplan and OVS
    - `after` - executes after NetworkManager/netplan and OVS

```mermaid
---
title: Proposed logic
---
flowchart TD;
    operatorMode["SriovOperatorConfig.configurationMode"];
    inDaemon["apply configurations(legacy and switchdev) in
     sriov-network-operator-daemon"]
    systemdSystemd["create `pre` and `after` systemd services(go code)
    to handle configurations(legacy and switchdev) on boot"]

    operatorMode-- daemon --> inDaemon;
    operatorMode-- systemd --> systemdSystemd;

```
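
The two flowcharts can also be read as dispatch functions. The Go sketch below contrasts them; the `handler` type, its values and both functions are illustrative names, not code from the operator.

```go
package main

import "fmt"

// handler names the component that applies a NIC's configuration.
type handler string

const (
	inDaemon       handler = "sriov-network-operator-daemon"
	bashServices   handler = "two switchdev bash services"
	systemdService handler = "single systemd service (go code)"
	systemdStages  handler = "pre and after systemd services (go code)"
)

// currentHandler follows the "Current logic" chart: the eSwitch mode is checked
// first, and configurationMode only matters for legacy NICs.
func currentHandler(eSwitchMode, configurationMode string) handler {
	if eSwitchMode == "switchdev" {
		return bashServices
	}
	if configurationMode == "systemd" {
		return systemdService
	}
	return inDaemon
}

// proposedHandler follows the "Proposed logic" chart: configurationMode alone
// decides, for legacy and switchdev NICs alike (eSwitchMode is unused).
func proposedHandler(eSwitchMode, configurationMode string) handler {
	if configurationMode == "systemd" {
		return systemdStages
	}
	return inDaemon
}

func main() {
	for _, nic := range []string{"legacy", "switchdev"} {
		for _, mode := range []string{"daemon", "systemd"} {
			fmt.Printf("%-9s %-7s current=%q proposed=%q\n",
				nic, mode, currentHandler(nic, mode), proposedHandler(nic, mode))
		}
	}
}
```

For switchdev NICs the handler now depends on `configurationMode`, which is the behavior change described under Upgrade & Downgrade considerations.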

### Workflow Description

Users who use only NICs with _**legacy**_ SRIOV configurations will not need to change their workflow.
The operator should behave for these configurations the same way as it does now.

Users who have NICs with _**switchdev**_ configurations will need to explicitly set the operator's
`configurationMode` to `systemd` if they expect the NIC to be configured
early on boot (before Kubernetes starts) to support the hwoffload use case.

### API Extensions

#### SriovNetworkNodeState CR

A `SriovNetworkNodeState.status.interfaces[*].Vfs[*].vdpaType` field should be added.

This field should be used to report the type of the VDPA
device that is configured for the VF.
An empty string means that there is no VDPA device.

Valid values are: `virtio`, `vhost` (same as in `SriovNetworkNodePolicySpec`).

```go
type VirtualFunction struct {
	Name       string `json:"name,omitempty"`
	Mac        string `json:"mac,omitempty"`
	Assigned   string `json:"assigned,omitempty"`
	Driver     string `json:"driver,omitempty"`
	PciAddress string `json:"pciAddress"`
	Vendor     string `json:"vendor,omitempty"`
	DeviceID   string `json:"deviceID,omitempty"`
	Vlan       int    `json:"Vlan,omitempty"`
	Mtu        int    `json:"mtu,omitempty"`
	VfID       int    `json:"vfID"`
	VdpaType   string `json:"vdpaType,omitempty"` // new field
}
```

#### SriovOperatorConfig CR

Change in the operator's behavior: the `configurationMode` option now has an effect
on NICs with _**switchdev**_ configurations.

### Implementation Details/Notes/Constraints

We should consider improving unit-test coverage for the modified parts of the code during the implementation.

After the operator upgrade, we should remove from the host the unneeded files (scripts, systemd services, config files) created by the previous version of the operator.

### Upgrade & Downgrade considerations

* after upgrading the operator, _**switchdev**_ configurations will be applied by the **sriov-network-operator-daemon** and not by a systemd service unless the user changes the `configurationMode` setting to `systemd`
* after upgrading the operator, the "implicit mixed mode", in which _**switchdev**_ NIC configurations are handled by bash scripts (in systemd services)
and _**legacy**_ NIC configurations are managed by the **sriov-network-operator-daemon**, will no longer be supported.

_Note: `configurationMode` is a global setting, so the user will need to decide
which mode to use for the entire cluster._

Upgrade/Downgrade for users with only _**legacy**_ configurations will not require any action.
Upgrade/Downgrade for clusters with _**switchdev**_ configurations will require
changing the operator's `configurationMode` option.
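
Before upgrading, an administrator could check which switchdev policies are affected. The Go sketch below assumes a hypothetical, trimmed `policy` view of `SriovNetworkNodePolicy` (only its name and `eSwitchMode`); it flags switchdev policies while the cluster still uses the `daemon` mode, because after the upgrade their configuration will be applied by the **sriov-network-operator-daemon** instead of on boot.

```go
package main

import "fmt"

// policy is a hypothetical, trimmed view of a SriovNetworkNodePolicy.
type policy struct {
	Name        string
	ESwitchMode string // mirrors spec.eSwitchMode: "legacy" (default) or "switchdev"
}

// upgradeWarnings returns one message per switchdev policy whose configuration
// moves from the boot-time bash services to the daemon after the upgrade.
// Nothing is reported when the cluster already uses the systemd mode.
func upgradeWarnings(configurationMode string, policies []policy) []string {
	if configurationMode == "systemd" {
		return nil
	}
	var warnings []string
	for _, p := range policies {
		if p.ESwitchMode == "switchdev" {
			warnings = append(warnings, fmt.Sprintf(
				"policy %q: switchdev config will be applied by the daemon, not on boot; "+
					"set configurationMode to systemd if early configuration is needed", p.Name))
		}
	}
	return warnings
}

func main() {
	policies := []policy{
		{Name: "ovs-hw-offload", ESwitchMode: "switchdev"},
		{Name: "dpdk", ESwitchMode: "legacy"},
	}
	for _, w := range upgradeWarnings("daemon", policies) {
		fmt.Println(w)
	}
}
```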

### Test Plan

The proposed changes will not introduce new functionality.

After the refactoring, _**switchdev**_ configurations will also be supported in the `daemon` mode.
This is the only area for which we may need to develop additional tests.
All other changes should be validated by running regression tests.

_Note: the behavior for **switchdev** configurations will change in a way that is not fully backward compatible;
this may require fixing existing tests._