sigs.k8s.io/cluster-api@v1.7.1/docs/proposals/20200804-windows-support.md

     1  ---
     2  title: Windows kubeadm-based worker nodes support
     3  authors:
     4    - "@jsturtevant"
     5    - "@ksubrmnn"
     6  reviewers:
     7    - "@CecileRobertMichon"
     8    - "@ncdc"
     9    - "@randomvariable"
    10  creation-date: 2020-08-25
    11  last-updated: 2020-09-09
    12  status: implementable
    13  see-also:
    14  ---
    15  
    16  # Windows kubeadm-based worker nodes support
    17  
    18  ## Table of Contents
    19  
    20  <!-- START doctoc generated TOC please keep comment here to allow auto update -->
    21  <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
    22  
    23  - [Glossary](#glossary)
    24  - [Summary](#summary)
    25  - [Motivation](#motivation)
    26    - [Goals](#goals)
    27    - [Non-Goals/Future Work](#non-goalsfuture-work)
    28  - [Proposal](#proposal)
    29    - [Cluster API Bootstrap Provider Kubeadm](#cluster-api-bootstrap-provider-kubeadm)
    30      - [cloud-init and cloudbase-init](#cloud-init-and-cloudbase-init)
    31      - [Image Creation](#image-creation)
    32      - [Kubelet and other component configuration](#kubelet-and-other-component-configuration)
    33      - [netbios names](#netbios-names)
    34    - [Infrastructure provider implementation](#infrastructure-provider-implementation)
    35    - [User Stories](#user-stories)
    36      - [As an operator, I would like to create Windows OS worker nodes with the CAPI API.](#as-an-operator-i-would-like-to-create-windows-os-worker-nodes-with-the-capi-api)
    37      - [As an operator, I would like to manage Windows OS worker nodes with the CAPI API.](#as-an-operator-i-would-like-to-manage-windows-os-worker-nodes-with-the-capi-api)
    38    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    39      - [Signing of the components.](#signing-of-the-components)
    40      - [Known prototypes and prior work:](#known-prototypes-and-prior-work)
    41    - [Security Model](#security-model)
    42    - [Risks and Mitigations](#risks-and-mitigations)
    43  - [Alternatives](#alternatives)
    44  - [Upgrade Strategy](#upgrade-strategy)
    45  - [Additional Details](#additional-details)
    46    - [Test Plan [optional]](#test-plan-optional)
    47    - [Graduation Criteria [optional]](#graduation-criteria-optional)
    48      - [Alpha](#alpha)
    49      - [Beta](#beta)
    50      - [Stable](#stable)
    51    - [Version Skew Strategy](#version-skew-strategy)
    52  - [Implementation History](#implementation-history)
    53  
    54  <!-- END doctoc generated TOC please keep comment here to allow auto update -->
    55  
    56  ## Glossary
    57  
    58  Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).
    59  
    60  ## Summary
    61  
    62  This proposal is for the support of Windows [OS](https://cluster-api.sigs.k8s.io/reference/glossary.html#operating-system) worker nodes in Cluster API and [infrastructure providers](https://cluster-api.sigs.k8s.io/reference/glossary.html#infrastructure-provider) that wish to support 
    63  Windows. Cluster API will support Windows by using kubeadm to add Windows nodes to a [workload cluster](https://cluster-api.sigs.k8s.io/reference/glossary.html#workload-cluster). 
    64  
    65  Windows support has been stable in Kubernetes since 1.14 and is supported in clusters that run Linux for the
    66  Control Plane.  The Worker nodes can be any combination of Windows or Linux. 
    67  
    68  Windows node support has some unique challenges because of the current limitations of Windows Containers. 
    69  Windows containers do not support privileged operations which means that configuration and access to the host
    70  machine must be done at provisioning time.  
    71  
    72  An example of this limitation is how kube-proxy gets configured on Windows nodes. Kube-proxy typically runs as a 
    73  Windows service on the host machine and it cannot be deployed as a DaemonSet as it is on Linux.  
    74  To address limitations of this kind, the community has built tools such as [CSI-Proxy](https://github.com/kubernetes-csi/csi-proxy), a proxy specific to CSI drivers.
    75  This proposal addresses how to approach the configuration of components that are typically deployed as 
    76  DaemonSets when bootstrapping Windows nodes in CAPI.
    77  
    78  ## Motivation
    79  
    80  Kubernetes has supported Windows workloads since the release of Windows support in Kubernetes 1.14. The 
    81  motivation of this proposal is to enable Cluster API users to deploy Windows as part of a mixed OS cluster 
    82  via the Cluster API automation built for platform operators.  This will enable cluster operators to define 
    83  Windows machines in the same consistent and repeatable fashion.
    84  
    85  ### Goals
    86  
    87  - Enable the creation and management of Windows worker nodes on workload clusters by adding support via the Kubeadm bootstrap provider and infrastructure providers
    88  - Provide community guidance and scripts for building base images for Windows nodes
    89  - Re-use of the existing Cluster API Bootstrap Provider Kubeadm and other tools where appropriate
    90  
    91  ### Non-Goals/Future Work
    92  
    93  - Provide a way to run [control plane](https://cluster-api.sigs.k8s.io/reference/glossary.html#control-plane) nodes as Windows
    94  - Support for Windows versions outside of the versions supported by Kubernetes
    95  - Support for Windows nodes on the [management](https://cluster-api.sigs.k8s.io/reference/glossary.html#management-cluster) or [bootstrap clusters](https://cluster-api.sigs.k8s.io/reference/glossary.html#bootstrap-cluster)
    96  - Provide a way to configure Windows nodes with non-Kubeadm based bootstrap providers
    97  
    98  ## Proposal
    99  
   100  ### Cluster API Bootstrap Provider Kubeadm
   101  
   102  #### cloud-init and cloudbase-init
   103  
   104  For Linux, when using the Kubeadm bootstrap provider, the bootstrap script is provided to the infrastructure provider as a cloud-init script. 
   105  The infrastructure provider is responsible for putting the cloud-init script in the right location. 
   106  When the VM is booted, the cloud-init script runs automatically. 
   107  
   108  Cloud-init does not have Windows support. An alternative product is [cloudbase-init](https://github.com/cloudbase/cloudbase-init). 
   109  Cloudbase-init functions in the same way as cloud-init and can consume cloud-init scripts as provided by the Cluster API Bootstrap Provider Kubeadm.  
   110  By using cloudbase-init, Windows can leverage the existing solutions and stay up to date with the latest changes in CABPK.  Refer to the [cloudbase-init documentation](https://cloudbase-init.readthedocs.io/en/latest/intro.html) for features that are supported.
   111  
   112  #### Image Creation
   113  
   114  Using cloudbase-init requires the creation of an image with the tooling installed on it since it is not 
   115  provided out of the box by any cloud providers.  We'll provide packer scripts as part of 
   116  the [image-builder project](https://github.com/kubernetes-sigs/image-builder) that pre-installs 
   117  `cloudbase-init`.  It is important to note that while scripts can be provided to build an image, all images 
   118  built need to adhere to [Windows licensing requirements](https://learn.microsoft.com/windows-server/windows-server-licensing/windows-server-licensing).
   119  
   120  There is prior art for building Windows base images. For example, AKS-Engine has an example implementation for using packer and scripts to do image configuration: https://github.com/Azure/aks-engine/blob/master/vhd/packer/windows-vhd-builder.json.  
   121  Another example is [sig-windows-tools](https://github.com/kubernetes-sigs/sig-windows-tools), which provides scripts for image configuration when using Kubeadm.
   122  
   123  Although the Linux implementation in image-builder uses Ansible for configuration, Windows isn't going to share
   124  the same configuration because [Ansible](https://docs.ansible.com/ansible/latest/user_guide/windows.html) requires [Windows specific modules](https://docs.ansible.com/ansible/2.9/modules/list_of_windows_modules.html) to do the configuration. 
   125  
   126  #### Kubelet and other component configuration
   127  
   128  Due to the lack of privileged containers in Windows, a combination of `PreKubeadmCommands`/`PostKubeadmCommands` 
   129  scripts and wins.exe can be used to configure the nodes. Wins.exe is currently provided as a way to bootstrap nodes along with kubeadm in the [Kubernetes documentation for adding Windows nodes](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/).  
   130  The components from the [preparenode script](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/#joining-a-windows-worker-node) can be used during image creation.
   131  
   132  In the future, when support for [Privileged Containers for Windows containers](https://github.com/kubernetes/enhancements/issues/1981) is merged, we might be able to revisit this proposal 
   133  and use privileged containers in place of wins.exe enabled containers. 
   134  
   135  Each infrastructure provider must provide its own `PreKubeadmCommands`/`PostKubeadmCommands` scripts for any
   136  additional node configuration it requires. During planning for Beta we will be able to identify
   137  common overlapping features that can be added into the base images in image-builder for re-use.
   138  
   139  #### netbios names
   140  
   141  Cluster API currently generates the machine deployment name which can result in long machine names.  This was a concern for NetBIOS on Windows, which requires Windows computer names to be 15 characters or fewer (https://support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and). 
   142  Attempting to set a hostname with more than 15 characters on a Windows machine will result in only the first 15 characters being used.
   143  
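The truncation behavior described above can be illustrated with a minimal sketch. The machine names below are hypothetical examples in the style of generated names, not output from Cluster API, and the helper only models the "first 15 characters" rule stated in the text. It shows why truncation was a concern: distinct long names can collapse to the same short name.

```python
NETBIOS_MAX_LEN = 15  # limit stated in the text above


def netbios_name(hostname: str) -> str:
    """Model the truncation: only the first 15 characters are used."""
    return hostname[:NETBIOS_MAX_LEN]


# Hypothetical machine names; real generated names will differ.
machines = [
    "windows-workers-md-0-7c9f8-abcde",
    "windows-workers-md-0-7c9f8-fghij",
]

truncated = [netbios_name(m) for m in machines]

# True when at least two distinct machine names share a truncated name.
collides = len(set(truncated)) < len(machines)
print(truncated, collides)
```
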
   144  The conclusion of the [issue](https://github.com/kubernetes-sigs/cluster-api/issues/2217) was that NetBIOS name resolution is mostly unused today and has not been required to join an AD domain since Windows 2000. If DNS is properly configured, the long host names generated by Cluster API will be usable.  
   145  
   146  ### Infrastructure provider implementation
   147  
   148  By leveraging cloudbase-init, an infrastructure provider implementation will require only a few changes which include:
   149  
   150  - Making changes to the provider API to enable Windows OS infra machines ([example](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1036/commits/e753a32fccdf6b825b606f12bb1acd6e34a70339#diff-492149096931cdfde8cf61676230879e41cfdb1afeb74784cdf85bca2272a1be))
   151  - Ensuring cloudbase-init is configured properly to read UserData which will contain the cloud-init script.  Users must configure 
   152  [cloudbase-init with a metadata service](https://cloudbase-init.readthedocs.io/en/latest/services.html#configuring-available-services) that has support for [UserData](https://cloudbase-init.readthedocs.io/en/latest/userdata.html) ([example](https://cloudbase-init.readthedocs.io/en/latest/tutorial.html#configuration-file)).
   153  
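As a rough sketch of the second item, the snippet below renders a minimal cloudbase-init configuration that enables one metadata service and the UserData plugin. The `metadata_services` and `plugins` option names and the class paths follow the linked cloudbase-init documentation as understood here and should be checked against it. The choice of the HTTP metadata service is an assumption for illustration only; each infrastructure provider would select the service that matches its platform.

```python
import configparser
import io

# Assumed values for illustration; verify against the cloudbase-init docs.
config = configparser.ConfigParser()
config["DEFAULT"] = {
    # Metadata service that exposes UserData (provider-specific choice).
    "metadata_services": "cloudbaseinit.metadata.services.httpservice.HttpService",
    # Plugin that reads and executes the UserData payload.
    "plugins": "cloudbaseinit.plugins.common.userdata.UserDataPlugin",
}

# Render the configuration to a string instead of writing to the image.
buf = io.StringIO()
config.write(buf)
rendered = buf.getvalue()
print(rendered)
```
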
   154  From the infrastructure provider perspective, there are no known required changes to the CAPI API to support Windows
   155  nodes at this time. If during alpha we identify changes we will open issues pertaining to the changes required.
   156  
   157  ### User Stories
   158  
   159  #### As an operator, I would like to create Windows OS worker nodes with the CAPI API.
   160  
   161  #### As an operator, I would like to manage Windows OS worker nodes with the CAPI API.
   162  
   163  ### Implementation Details/Notes/Constraints
   164  
   165  Due to the lack of privileged containers in Windows there are two options for configuring components such as 
   166  kube-proxy and kubelet.  The above solution using wins is preferred because it makes the move to privileged containers 
   167  straightforward as a drop-in replacement. 
   168  
   169  While this is the best choice for the alpha and the community direction there are some infrastructure providers that may 
   170  not be able to use wins due to signing or security concerns since wins allows the execution of any arbitrary command on 
   171  the host. Pre/post commands can be used as an alternative with additional scripts cached on the image that enable the configuration.
   172  
   173  #### Signing of the components.  
   174  
   175  Some infrastructure providers will require that any scripts and binaries be signed before deployment.  
   176  This will be managed by providing the ability to supply URLs that override external scripts and binaries 
   177  during the image building process. An example of how this could be accomplished is the 
   178  [containerd_url](https://github.com/kubernetes-sigs/image-builder/blob/58a08a1a8241356bab4afb1c6d8d2fbb8ef54bcf/images/capi/packer/config/ansible-args.json) setting in the Linux implementation.  In this case, the 
   179  `containerd_url` could point to a location that contains a package with signed binaries from the infrastructure provider.
   180  
   181  #### Known prototypes and prior work: 
   182  
   183  - https://github.com/adelina-t/cloudbase-init-capz-demo
   184  - https://github.com/benmoss/kubeadm-windows/tree/master/cluster-api-aws
   185  - https://github.com/microsoft/cluster-api-provider-azurestackhci
   186  
   187  ### Security Model
   188  
   189  Wins.exe is currently the [recommended way to use kubeadm](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/).  
   190  Limiting access to only the named pipes that are required is one way to reduce the exposure.  Wins is currently 
   191  required during and after provisioning for running kube-proxy and the CNI daemonset.  
   192  
   193  The security model for Privileged containers is still being discussed in the KEP and is at an early stage.  The security concerns for Privileged containers will be addressed in the Beta phase of this proposal after the Privileged containers KEP progresses.
   194  
   195  The Kubeadm bootstrap token should be able to use multi-part MIME documents for cloudbase-init as done for [Linux in CAPA](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/28d01d064cc2e5b0286ae23b3be7203f18b00447/controllers/awsmachine_controller.go#L601). 
   196  This will require an update to cloudbase-init, which does support [multi-part MIME documents](https://cloudbase-init.readthedocs.io/en/latest/userdata.html#multi-part-content) but is missing the [boothook](https://cloudinit.readthedocs.io/en/latest/topics/format.html?highlight=boothook#cloud-boothook) functionality.  
   197  Support for boothooks can be added to cloudbase-init to meet the AWS provider requirement, which 
   198  would bring parity with the Linux implementation during the Beta phase.
   199  
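To make the shape of such a document concrete, here is a minimal sketch that assembles a multi-part MIME user-data payload with Python's standard library. It is not the CAPA implementation. The part bodies and file names are hypothetical placeholders, and the `text/cloud-boothook` and `text/cloud-config` content types are the ones cloud-init uses. As noted above, cloudbase-init would not yet act on the boothook part.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(parts):
    """Combine (body, subtype, filename) tuples into one multi-part document."""
    doc = MIMEMultipart()
    for body, subtype, filename in parts:
        part = MIMEText(body, _subtype=subtype)
        part.add_header("Content-Disposition", "attachment", filename=filename)
        doc.attach(part)
    return doc.as_string()


# Hypothetical placeholder bodies; a real boothook would fetch the secret.
user_data = build_user_data([
    ("#cloud-boothook\necho fetch-bootstrap-secret-placeholder\n",
     "cloud-boothook", "boothook.txt"),
    ("#cloud-config\nruncmd:\n  - echo join-placeholder\n",
     "cloud-config", "cloud-config.txt"),
])

# Parse the document back to confirm the parts and their content types.
parsed = message_from_string(user_data)
content_types = [
    p.get_content_type() for p in parsed.walk() if not p.is_multipart()
]
print(content_types)
```
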
   200  There is [no known requirement](https://github.com/kubernetes-sigs/cluster-api/issues/2218) for managing the Admin
   201  Kubeconfig or domain passwords in the Windows configuration. Domain passwords should be managed outside the scope of 
   202  CAPI and only Kubeadm bootstrap tokens, which have limited lifetime, should be used for joining the Windows nodes to the cluster. 
   203  The joining of Windows nodes to a Domain Controller can be accomplished through pre/post kubeadm commands.  Future support could be 
   204  added via a separate controller that supports composable bootstrapping which is outside of the scope of this CAEP.  Refer 
   205  to issue [#3761](https://github.com/kubernetes-sigs/cluster-api/issues/3761) for more details.
   206  
   207  ### Risks and Mitigations
   208  
   209  - Privileged containers are not implemented.
   210    - There is an active discussion and [KEP](https://docs.google.com/document/d/12EUtMdWFxhTCfFrqhlBGWV70MkZZPOgxw0X-LTR0VAo/edit#) in place.  At the Beta stage the community can do a checkpoint to determine if the solution fits user needs
   211  - Cloudbase-init is a third party dependency
   212    - This project is under Apache 2.0 License : https://github.com/cloudbase/cloudbase-init which is cleared under the CNCF Allow list: https://github.com/cncf/foundation/blob/master/allowed-third-party-license-policy.md
   213  - Windows image Distribution
   214    - Infrastructure providers can provide the ability to use user-provided images. Images provided by image-promoter are recommended for testing and demonstration purposes; it is recommended that users create their own images. 
   215    - Users using the image scripts must ensure they are following [Windows licensing requirements](https://learn.microsoft.com/windows-server/windows-server-licensing/windows-server-licensing)
   216  - Wins.exe is a third party dependency
   217    - The project is under the Apache 2.0 License
   218  
   219  ## Alternatives
   220  
   221  1. An alternative to using wins.exe and DaemonSets to do the configuration is to download and configure components as services 
   222     using the kubeadm pre/post commands. This would require the infrastructure providers to have the ability to pass 
   223     configuration through the use of these commands which is already done today. During the Alpha phase with the pre/post 
   224     scripts being developed by individual infra providers this will not be an issue.  With the move to Windows privileged
   225     containers in Beta, this becomes a non-issue as wins will no longer be required.
   226  
   227  1. Create a separate bootstrap provider for Windows.  This would require re-implementing a lot of the logic that is 
   228     already in CABPK.  
   229     When bugs are fixed or changes in behavior occur, Windows would risk being out of sync with the Linux implementation.
   230  
   231  1. Modify the CABPK provider to have a different output format than cloud-init for Windows nodes.   With Cloudbase-init 
   232     there are no requirements to change CABPK which makes the adaptation for Windows straightforward. If we were to adapt 
   233     the output format of CABPK there is a potential for introduction of bugs and variation in the logic that would be 
   234     created for Windows nodes.  This would cause the Windows implementation to differ from others which could lead to 
   235     confusion when debugging differences.
   236  
   237  ## Upgrade Strategy
   238  
   239  Nodes that use this pattern will require the infrastructure to be immutable as specified by CAPI documentation.
   240  
   241  ## Additional Details
   242  
   243  ### Test Plan [optional]
   244  
   245  **Note:** *Section not required until targeted at a release.*
   246  
   247  There are no changes to CAPI core proposed at this time. End-to-end tests in the CAPI test suite use [Kind](https://kind.sigs.k8s.io/) and
   248  Windows does not support Docker in Docker, so end-to-end tests cannot be added for Windows-specific behavior.
   249  If changes are required during development unit tests will be required.  
   250  
   251  For infrastructure providers the testing plan is left up to each infrastructure provider. It is recommended to leverage 
   252  the existing upstream Kubernetes Windows tests to show that Windows nodes are operating effectively. 
   253  
   254  ### Graduation Criteria [optional]
   255  
   256  **Note:** *Section not required until targeted at a release.*
   257  
   258  #### Alpha
   259  - `PreKubeadmCommands`/`PostKubeadmCommands` command scripts are created per infrastructure provider if required
   260  - Initially implemented with wins.exe
   261  - Windows Packer scripts to create image with cloudbase-init added to the image builder scripts
   262  
   263  #### Beta
   264  - Pre/Post commands moved to bootstrap provider if identified as re-usable
   265  - Adopt privileged containers (dependent on Privileged containers KEP)
   266  - kubeadm bootstrap token can be kept secret via multi-part mime documents for cloudbase-init. 
   267  
   268  #### Stable
   269  Use of privileged containers. 
   270  
   271  ### Version Skew Strategy 
   272  
   273  The version of support for the Windows operating system is outside the scope of cluster creation.  Please refer 
   274  to the Kubernetes Windows documentation for the latest version skew support, features, and functionality.
   275  
   276  ## Implementation History
   277  
   278  - [X] 01/29/2020: Proposed idea in an issue
   279    - https://github.com/kubernetes-sigs/cluster-api/issues/2218
   280    - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/153
   281  - [X] 08/17/2020: Compile a Google Doc following the CAEP template 
   282    - https://docs.google.com/document/d/14evDl_3RgEFfchmgPzNw6lb1vIttN_Hb9333UnUJ734/edit
   283  - [X] 08/31/2020: First round of feedback from community
   284  - [X] 08/25/2020: Present proposal at a [community meeting]
   285  - [X] 09/09/2020: Open proposal PR
   286  
   287  <!-- Links -->
   288  [community meeting]: https://docs.google.com/document/d/1fQNlqsDkvEggWFi51GVxOglL2P1Bvo2JhZlMhm2d-Co/edit#heading=h.ozawn3ogj91o
   289