sigs.k8s.io/cluster-api-provider-azure@v1.14.3/docs/proposals/20200720-single-controller-multitenancy.md (about)

---
title: Single Controller Multitenancy
authors:
  - "@devigned"
reviewers:
  - "@nader-ziada"
  - "@CecileRobertMichon"
creation-date: 2020-07-20
last-updated: 2020-07-20
status: implementable
see-also:
- https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1674
- https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/586
- https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/977
replaces: []
superseded-by: []
---

# Single Controller Multitenancy

## Table of Contents

- [Single Controller Multitenancy](#single-controller-multitenancy)
  - [Table of Contents](#table-of-contents)
  - [Glossary](#glossary)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Story 1](#story-1---locked-down-with-service-principal-per-subscription)
      - [Story 2](#story-2---locked-down-by-namespace-and-subscription)
      - [Story 3](#story-3---using-an-azure-user-assigned-identity)
      - [Story 4](#story-4---legacy-behavior-preserved)
      - [Story 5](#story-5---software-as-a-service-provider)
  - [Requirements](#requirements)
    - [Functional](#functional)
    - [Non-Functional](#non-functional)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
      - [Proposed Changes](#proposed-changes)
        - [AAD Pod Identity for Attaching Cluster Provisioning Identities](#aad-pod-identity-for-attaching-cluster-provisioning-identities)
        - [Cluster API Provider Azure v1alpha3 types](#cluster-api-provider-azure-v1alpha3-types)
    - [Controller Changes](#controller-changes)
    - [Clusterctl changes](#clusterctl-changes)
      - [Principal Type Credential Provider Behaviour](#principal-type-credential-provider-behaviour)
    - [Security Model](#security-model)
      - [Roles](#roles)
    - [RBAC](#rbac)
      - [Write Permissions](#write-permissions)
    - [Namespace Restrictions](#namespace-restrictions)
    - [CAPZ Controller Requirements](#capz-controller-requirements)
    - [Risks and Mitigations](#risks-and-mitigations)
      - [Caching and handling refresh of credentials](#caching-and-handling-refresh-of-credentials)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Additional Details](#additional-details)
    - [Test Plan](#test-plan)
    - [Graduation Criteria](#graduation-criteria)
      - [Alpha](#alpha)
      - [Beta](#beta)
      - [Stable](#stable)
  - [Implementation History](#implementation-history)

## Glossary

* Principal Type - One of several ways to provide a form of identity that is ultimately resolved to
an Azure Active Directory (AAD) Principal.
* Authorizer - An implementation of the [Azure SDK for Golang's Authorizer Interface](https://github.com/Azure/go-autorest/blob/7ac73d3561eaa034f458f97362b2743e8b3c048e/autorest/authorization.go#L42).
* CAPZ - An abbreviation of Cluster API Provider Azure.

## Summary

The CAPZ operator is able to manage Azure cloud infrastructure within the permission scope of the
AAD principal it is initialized with, usually through environment variables. The CAPZ operator is
provided credentials via the deployment, either explicitly via environment variables or implicitly
via the default SDK credential provider chain, including the Azure Instance Metadata Service (IMDS)
via User Assigned Identities.

In addition, CAPZ uses the environmentally configured identity for the lifetime of the deployment.
This also means that an AzureCluster could be broken if the instance of CAPZ that created it is
later reconfigured with a different set of credentials.

This proposal outlines new capabilities for CAPZ to assume a different AAD Principal, at
runtime, on a per-cluster basis. The proposed changes would be fully backwards compatible and maintain
the existing behavior with no changes to user configuration required.

## Motivation

For large organizations, especially highly-regulated organizations, there is a need to be able to
separate duties at various levels of infrastructure - permissions, networks and accounts.
Azure Role-Based Access Control (RBAC) provides a model which allows admins to grant
identities the least privilege needed to perform activities. Within this model it is appropriate for
tooling running within the 'management' account to manage infrastructure within the 'workload'
accounts. Within the AAD model, the controller can assume the identity of the cluster to perform
cluster management activities. For CAPZ to be most useful within these organizations it will need
to support multi-account models.

Some organizations may also delegate the management of clusters to a third party. In that
case, the boundary between organizations needs to be secured. In AAD, this can be accomplished by
granting a third-party AAD principal RBAC access to the Azure resources required to manage cluster
infrastructure.

Because a single deployment of the CAPZ operator may reconcile many clusters in its lifetime, it is
necessary to modify the CAPZ operator to scope its Azure `Authorizer` to within the reconciliation
process.

Unlike AWS, Azure doesn't provide mechanisms for assuming roles across account boundaries, but rather
allows RBAC rights to be enabled across account boundaries. This is the largest break in this proposal
with regard to the prior art of [the CAPA multitenancy proposal](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/8c0c3db8af44e3c3a1db772145d96154a3d36280/docs/proposal/20200506-single-controller-multitenancy.md).

### Goals

1. To enable reconciliation of AzureCluster resources using a cluster-specified AAD Identity.
2. To maintain backwards compatibility and cause no impact for users who don't intend to make use of
   this capability.

## Proposal

### User Stories

#### Story 1 - Locked down with Service Principal Per Subscription

Alex is an engineer in a large organization which has a strict Azure account architecture. This
architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with an AAD
identity having RBAC rights to provision the infrastructure only in the Subscription. The workload
clusters must run with a System Assigned machine identity. The organization has adopted Cluster API
in order to manage Kubernetes infrastructure, and expects 'management' clusters running the Cluster
API controllers to manage 'workload' clusters in dedicated Azure Subscriptions with an AAD account
which only has access to that Subscription.

The current configuration exists:
* Subscription for each cluster
* AAD Service Principals with Subscription Owner rights for each Subscription
* A management Kubernetes cluster running Cluster API Provider Azure controllers

Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD
Service Principal by creating new Cluster API resources in the management cluster. Each of the
workload cluster machines would run as the System Assigned identity described in the Cluster API
resources. The CAPZ controller in the management cluster uses the Service Principal credentials when
reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.

#### Story 2 - Locked down by Namespace and Subscription

Alex is an engineer in a large organization which has a strict Azure account architecture. This
architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with an AAD
identity having RBAC rights to provision the infrastructure only in the Subscription. The workload
clusters must run with a System Assigned machine identity.

Erin is a security engineer in the same company as Alex. Erin is responsible for provisioning
identities. Erin will create a Service Principal for use by Alex to provision the infrastructure in
Alex's cluster. The identity Erin creates should only be able to be used in a predetermined
Kubernetes namespace where Alex will define the workload cluster. The identity should not be able
to be used by CAPZ to provision workload clusters in other namespaces.

The organization has adopted Cluster API in order to manage Kubernetes infrastructure, and expects
'management' clusters running the Cluster API controllers to manage 'workload' clusters in dedicated
Azure Subscriptions with an AAD account which only has access to that Subscription.

The current configuration exists:
* Subscription for each cluster
* AAD Service Principals with Subscription Owner rights for each Subscription
* A management Kubernetes cluster running Cluster API Provider Azure controllers

Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD
Service Principal by creating new Cluster API resources in the management cluster in the
predetermined namespace. Each of the workload cluster machines would run as the System Assigned
identity described in the Cluster API resources. The CAPZ controller in the management cluster
uses the Service Principal credentials when reconciling the AzureCluster so that it can
create/use/destroy resources in the workload cluster.

Erin can provision an identity in a namespace of limited access and define the allowed namespaces,
which will include the predetermined namespace for the workload cluster.

#### Story 3 - Using an Azure User Assigned Identity

Erin is an engineer working in a large organization. Erin does not want to be responsible for
ensuring Service Principal secrets are rotated on a regular basis. Erin would like to use an
Azure User Assigned Identity to provision workload cluster infrastructure. The User Assigned
Identity will have the RBAC rights needed to provision the infrastructure in Erin's subscription.

The current configuration exists:
* Subscription for the workload cluster
* A User Assigned Identity with Subscription Owner RBAC rights for the Subscription
* A management Kubernetes cluster running Cluster API Provider Azure controllers

Erin can provision a new workload cluster in the specified Subscription with the Azure User
Assigned Identity by creating new Cluster API resources in the management cluster. The CAPZ
controller in the management cluster uses the User Assigned Identity credentials when reconciling
the AzureCluster so that it can create/use/destroy resources in the workload cluster.

#### Story 4 - Legacy Behavior Preserved

Dascha is an engineer in a smaller, less strict organization with a few Azure accounts intended to
build all infrastructure. There is a single Azure Subscription named 'dev', and Dascha wants to
provision a new cluster in this Subscription. An existing Kubernetes cluster is already running the
Cluster API operators and managing resources in the dev Subscription. Dascha can provision a new
cluster by creating Cluster API resources in the existing cluster, omitting the `IdentityRef`
field in the AzureCluster spec. The CAPZ operator will use the Azure credentials provided in its
deployment template.

#### Story 5 - Software as a Service Provider

ACME Industries is offering Kubernetes as a service to other organizations. ACME creates an AAD
Identity for each organization and each organization grants that Identity access to provision
infrastructure in one or more of their Azure Subscriptions. ACME Industries wants to minimise
the memory footprint of managing many clusters, and wants to move to having a single instance of
CAPZ manage infrastructure across multiple organizations.

## Requirements

### Functional

<a name="FR1">FR1.</a> CAPZ MUST support assuming an identity specified by the `IdentityRef` of an `AzureCluster`.

<a name="FR2">FR2.</a> CAPZ MUST support static credentials.

<a name="FR3">FR3.</a> CAPZ MUST prevent privilege escalation that would allow users to create clusters in Azure accounts
  they should not have access to.

<a name="FR4">FR4.</a> CAPZ SHOULD support refreshing credentials when principal data is modified.

<a name="FR5">FR5.</a> CAPZ SHOULD provide validation for principal data submitted by users.

<a name="FR6">FR6.</a> CAPZ MUST support clusterctl move scenarios.

### Non-Functional

<a name="NFR1">NFR1.</a> Each instance of CAPZ SHOULD be able to support 200 clusters using cluster-specified identities.

<a name="NFR2">NFR2.</a> CAPZ MUST call AAD APIs only when necessary to prevent rate limiting.

<a name="NFR3">NFR3.</a> Unit tests MUST exist for all credential provider code.

<a name="NFR4">NFR4.</a> e2e tests SHOULD exist for all credential provider code.

<a name="NFR5">NFR5.</a> Credential provider code COULD be audited by security engineers.

### Implementation Details/Notes/Constraints

The current implementation of CAPZ requests new instances of Azure services per cluster and
sub-cluster resources. The input for these services is a ClusterScope, which provides the identity
the Azure service will operate as.

```go
type ClusterScope struct {
	logr.Logger
	client      client.Client
	patchHelper *patch.Helper

	AzureClients
	Cluster      *clusterv1.Cluster
	AzureCluster *infrav1.AzureCluster
}
```

The ClusterScope contains the information needed to make an authenticated request against an
Azure cloud. The `Authorizer` currently contains the AAD identity information loaded from the CAPZ
controller environment. The `Authorizer` is used to fetch and refresh tokens for all Azure clients.

```go
// AzureClients contains all the Azure clients used by the scopes.
type AzureClients struct {
	SubscriptionID             string
	ResourceManagerEndpoint    string
	ResourceManagerVMDNSSuffix string
	Authorizer                 autorest.Authorizer
}
```

The signature of the function which creates these instances is as follows:

```go
func NewClusterScope(params ClusterScopeParams) (*ClusterScope, error) {
	// ...
	return &ClusterScope{
		// ...
	}, nil
}
```

#### Proposed Changes
The proposed changes below only apply to infrastructure nodes which run the CAPZ controllers.
These changes have no impact on the identities specified on AzureMachines or other workload
infrastructure.

##### AAD Pod Identity for Attaching Cluster Provisioning Identities
[AAD Pod Identity](https://github.com/Azure/aad-pod-identity) enables Kubernetes applications to
access cloud resources securely using AAD identities, such as Service Principals and User Assigned
Identities. To enable CAPZ to authenticate as a User Assigned Identity, CAPZ needs to have access to
the Azure Instance Metadata Service (IMDS) running on the host. This can be accomplished indirectly
by using AAD Pod Identity.

With AAD Pod Identity running within the management cluster, CAPZ can create [AzureIdentityBindings](https://github.com/Azure/aad-pod-identity#5-deploy-azureidentitybinding)
and other related resources to enable CAPZ to bind to Azure Identities.

To use Azure Managed Identities, the infrastructure nodes hosting the CAPZ controller must be hosted
in Azure. Outside of Azure, Service Principal identities are the only available identity type.

##### Cluster API Provider Azure v1alpha3 types

<strong><em>Changed Resources</em></strong>
* `AzureCluster`

<strong><em>New Resources</em></strong>

<em>Cluster scoped resources</em>

* `AzureClusterIdentity` represents the information needed to create an AzureIdentity and
  AzureIdentityBinding [via Pod Identity](#aad-pod-identity-for-attaching-cluster-provisioning-identities). The type should
  also contain a list of strings naming the namespaces in which it is allowed to be used.

<strong><em>Changes to AzureCluster</em></strong>

A new field is added to the `AzureClusterSpec` to reference an `AzureClusterIdentity`. We intend to use
`corev1.LocalObjectReference` in order to ensure that the only objects that can be referenced are
either in the same namespace or are scoped to the entire cluster.

```go
// AzureClusterIdentity is the Schema for the azureclusteridentities API
type AzureClusterIdentity struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   AzureClusterIdentitySpec   `json:"spec,omitempty"`
	Status AzureClusterIdentityStatus `json:"status,omitempty"`
}

type AzureClusterIdentitySpec struct {
	// UserAssignedMSI or ServicePrincipal
	Type IdentityType `json:"type"`
	// User assigned MSI resource id.
	// +optional
	ResourceID string `json:"resourceID,omitempty"`
	// Both User Assigned MSI and SP can use this field.
	ClientID string `json:"clientID"`
	// ClientSecret is a secret reference which should contain either a Service Principal password or certificate secret.
	// +optional
	ClientSecret corev1.SecretReference `json:"clientSecret,omitempty"`
	// Service principal primary tenant id.
	TenantID string `json:"tenantID"`
	// AllowedNamespaces is an array of namespaces that AzureClusters can
	// use this Identity from.
	//
	// An empty list (default) indicates that AzureClusters can use this
	// Identity from any namespace. This field is intentionally not a
	// pointer because the nil behavior (no namespaces) is undesirable here.
	// +optional
	AllowedNamespaces []string `json:"allowedNamespaces"`
}

type AzureClusterSpec struct {
	// ...
	// +optional
	IdentityRef *corev1.ObjectReference `json:"identityRef,omitempty"`
}
```
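
The `IdentityType` used by `AzureClusterIdentitySpec` is not defined in this proposal. The following is a minimal sketch, assuming the two values that appear in the YAML examples (`ServicePrincipal` and `UserAssignedMSI`), of how the principal data could be validated ([FR5](#FR5)). The `identitySpec` struct, the `validateIdentitySpec` function and the rule that a Service Principal needs a secret reference are illustrative assumptions, not part of the proposed API.

```go
package main

import (
	"errors"
	"fmt"
)

// IdentityType names the kind of AAD principal. The string values mirror the
// `type:` field in the YAML examples; the Go constant names are assumptions.
type IdentityType string

const (
	ServicePrincipal IdentityType = "ServicePrincipal"
	UserAssignedMSI  IdentityType = "UserAssignedMSI"
)

// identitySpec is a hypothetical, trimmed-down stand-in for
// AzureClusterIdentitySpec so the sketch compiles without Kubernetes types.
type identitySpec struct {
	Type             IdentityType
	ClientID         string
	TenantID         string
	ClientSecretName string // stands in for ClientSecret.Name
}

// validateIdentitySpec applies illustrative checks only; the proposal does
// not specify the actual validation rules.
func validateIdentitySpec(s identitySpec) error {
	switch s.Type {
	case ServicePrincipal:
		// Assumption: a Service Principal must reference a secret.
		if s.ClientSecretName == "" {
			return errors.New("clientSecret is required for ServicePrincipal")
		}
	case UserAssignedMSI:
		// No secret is needed; the token comes from the metadata service.
	default:
		return fmt.Errorf("unknown identity type %q", s.Type)
	}
	if s.ClientID == "" || s.TenantID == "" {
		return errors.New("clientID and tenantID are required")
	}
	return nil
}

func main() {
	err := validateIdentitySpec(identitySpec{Type: UserAssignedMSI, ClientID: "c", TenantID: "t"})
	fmt.Println("valid:", err == nil)
}
```

Rejecting unknown types up front means a typo in `type` surfaces when the resource is submitted rather than as an authentication failure during reconciliation.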

Example:

<em>Service Principal</em>
```yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureCluster
metadata:
  name: cluster1
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AzureClusterIdentity
    name: sp-identity
    namespace: default
  location: westus2
  networkSpec:
    vnet:
      name: cluster1-vnet
  resourceGroup: cluster1
  subscriptionID: 8000873c-41a6-11eb-8907-4db4e45a2e79
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
metadata:
  name: sp-identity
  namespace: default
spec:
  clientID: 74e870e6-cac6-11ea-87d0-0242ac130003
  clientSecret:
    name: secretName
    namespace: secretNamespace
  tenantID: 6bec3eaa-cac6-11ea-87d0-0242ac130003
  type: ServicePrincipal
```

<em>User Assigned Identity</em>
```yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureCluster
metadata:
  name: cluster1
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AzureClusterIdentity
    name: sp-identity
    namespace: default
  location: westus2
  networkSpec:
    vnet:
      name: cluster1-vnet
  resourceGroup: cluster1
  subscriptionID: 8000873c-41a6-11eb-8907-4db4e45a2e79
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
metadata:
  name: sp-identity
  namespace: default
spec:
  clientID: 74e870e6-cac6-11ea-87d0-0242ac130003
  tenantID: 6bec3eaa-cac6-11ea-87d0-0242ac130003
  type: UserAssignedMSI
```

### Controller Changes

* If `IdentityRef` is specified, the referenced `AzureClusterIdentity` is fetched and used to create an
  `Authorizer`.
* The controller will compare the hash of the credential provider against the same secret's provider
  in a cache ([NFR2](#NFR2)).
* The controller will take the newer of the two and instantiate Azure clients with the selected
  `Authorizer`.
* The controller will set the `AzureClusterIdentity` resource as one of the OwnerReferences of the
  `AzureCluster` if the resource is in the same namespace as the `AzureCluster`.
* The controller should reconcile `AzureClusterIdentity` and create corresponding `AzureIdentity`
  and `AzureIdentityBinding` resources within the AAD Pod Identity namespace to enable the management
  cluster to use the identities specified.
* The controller pod should have a label which the `AzureIdentityBinding` uses as its selector to inform
  AAD Pod Identity.
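
The hash comparison in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the CAPZ implementation: `hashedProvider`, `staticProvider` and `selectProvider` are hypothetical names, and the cache is a plain map with no locking.

```go
package main

import "fmt"

// hashedProvider is a hypothetical stand-in for the proposal's
// AzureIdentityProvider; only Hash matters for this sketch.
type hashedProvider interface {
	Hash() (string, error)
}

// staticProvider is an illustrative provider whose hash is precomputed.
type staticProvider struct{ hash string }

func (p *staticProvider) Hash() (string, error) { return p.hash, nil }

// selectProvider returns the cached provider when its hash matches the
// freshly built one, so an already-acquired token can be reused. When the
// hashes differ (the credential data changed), the fresh provider replaces
// the cached one.
func selectProvider(cache map[string]hashedProvider, key string, fresh hashedProvider) (hashedProvider, error) {
	freshHash, err := fresh.Hash()
	if err != nil {
		return nil, err
	}
	if cached, ok := cache[key]; ok {
		cachedHash, err := cached.Hash()
		if err != nil {
			return nil, err
		}
		if cachedHash == freshHash {
			return cached, nil
		}
	}
	cache[key] = fresh
	return fresh, nil
}

func main() {
	cache := map[string]hashedProvider{}
	first, _ := selectProvider(cache, "default/sp-identity", &staticProvider{hash: "v1"})
	again, _ := selectProvider(cache, "default/sp-identity", &staticProvider{hash: "v1"})
	rotated, _ := selectProvider(cache, "default/sp-identity", &staticProvider{hash: "v2"})
	// The unchanged credential reuses the cached instance; the rotated one does not.
	fmt.Println(first == again, again == rotated) // prints: true false
}
```

Reusing the cached instance is what keeps AAD calls to a minimum ([NFR2](#NFR2)): a new token is only requested when the credential data has actually changed.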

### Clusterctl changes

Today, `clusterctl move` operates by tracking object references within the same namespace. Since we
are now proposing to use cluster-scoped resources, we will need to add requisite support to
clusterctl's object graph to track ownerReferences pointing at cluster-scoped resources, and ensure
they are moved. We will naively not delete cluster-scoped resources during a move, as they may be
referenced across namespaces.

#### Principal Type Credential Provider Behaviour

Implementations for all principal types will implement the go-autorest `Authorizer` interface as well
as an additional function signature to support caching.

```go
// Authorizer is the interface that provides a PrepareDecorator used to supply request
// authorization. Most often, the Authorizer decorator runs last so it has access to the full
// state of the formed HTTP request.
type Authorizer interface {
	WithAuthorization() PrepareDecorator
}

type AzureIdentityProvider interface {
	Authorizer

	// Hash returns a unique hash of the data forming the credentials
	// for this principal
	Hash() (string, error)
}
```

Azure client sessions are structs implementing the `Authorizer` interface. The Azure SDK for Golang
will verify that the token in the `Authorizer` has not expired. If the token has expired, the
`Authorizer` will attempt to fetch a new token from AAD.

The controller will maintain a cache of all the `Authorizers` used by the clusters. In practice,
this should be a single `Authorizer` per cluster. If we were to create clusters which span
AAD tenants, then there may be a need for multiple `Authorizers` per cluster.
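
As an illustration of the `Hash()` method, the sketch below hashes the data forming a Service Principal credential with SHA-256. The `servicePrincipalCreds` type and its field set are assumptions made for this example; the proposal does not prescribe which fields are hashed or which hash function is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// servicePrincipalCreds is a hypothetical holder for the data that forms a
// Service Principal credential. The field set is an assumption.
type servicePrincipalCreds struct {
	TenantID     string
	ClientID     string
	ClientSecret string
}

// Hash returns a hex-encoded SHA-256 digest of the credential data. Each
// field is length-prefixed so that ("ab", "c") and ("a", "bc") produce
// different digests.
func (c servicePrincipalCreds) Hash() (string, error) {
	h := sha256.New()
	for _, field := range []string{c.TenantID, c.ClientID, c.ClientSecret} {
		if _, err := fmt.Fprintf(h, "%d:%s", len(field), field); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	before := servicePrincipalCreds{TenantID: "t", ClientID: "c", ClientSecret: "old"}
	after := before
	after.ClientSecret = "new" // simulate a rotated secret

	h1, _ := before.Hash()
	h2, _ := after.Hash()
	fmt.Println("hash changed after rotation:", h1 != h2)
}
```

Because the digest covers the secret itself, rotating the secret changes the hash even when the identity's name and type stay the same, which is what lets the controller detect stale cache entries.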

### Security Model

The intended RBAC model mirrors that for Service APIs:

#### Roles

For the purposes of this security model, three common roles have been identified:

* **Infrastructure provider**: The infrastructure provider (infra) is responsible for the overall
  environment that the cluster(s) operate in, or is the PaaS provider in a company.

* **Management cluster operator**: The management cluster operator (ops) is responsible for
  administration of the Cluster API management cluster. They manage policies, network access,
  and application permissions.

* **Workload cluster operator**: The workload cluster operator (dev) is responsible for
  management of the cluster relevant to their particular applications.

There are two primary components to the Service APIs security model: RBAC and namespace restrictions.

### RBAC
RBAC (role-based access control) is the standard used for Kubernetes authorization. This allows
users to configure who can perform actions on resources in specific scopes. RBAC can be used to
enable each of the roles defined above. In most cases, it will be desirable to have all resources be
readable by most roles, so this model focuses on write access instead.

#### Write Permissions
| Role                        | AzureServicePrincipal, etc. | Azure RBAC API | Cluster |
| --------------------------- | --------------------------- | -------------- | ------- |
| Infrastructure Provider     | Yes                         | Yes            | Yes     |
| Management Cluster Operator | Yes                         | Yes            | Yes     |
| Workload Cluster Operator   | No                          | No             | Yes     |

### Namespace Restrictions
Some extra configuration options cannot be controlled with RBAC. Instead,
they will be controlled with configuration fields on `AzureClusterIdentity`:

* **allowedNamespaces**: This field is a list of namespaces from which the
  `AzureClusterIdentity` can be used. CAPZ will not support AzureClusters in namespaces outside this list.
  An empty list (default) indicates that AzureClusters can use this Azure Identity from any
  namespace. This field is intentionally not a pointer because the nil behavior (no namespaces) is
  undesirable here.
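
The `allowedNamespaces` rule described above can be sketched as a small predicate. The function name `namespaceAllowed` is hypothetical; only the semantics (an empty list permits every namespace) come from this proposal.

```go
package main

import "fmt"

// namespaceAllowed reports whether an AzureCluster in clusterNamespace may
// use an identity with the given allowedNamespaces list. Following the
// description above, an empty (or nil) list permits every namespace.
func namespaceAllowed(allowedNamespaces []string, clusterNamespace string) bool {
	if len(allowedNamespaces) == 0 {
		return true
	}
	for _, ns := range allowedNamespaces {
		if ns == clusterNamespace {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(namespaceAllowed(nil, "team-a"))                // true
	fmt.Println(namespaceAllowed([]string{"team-a"}, "team-a")) // true
	fmt.Println(namespaceAllowed([]string{"team-a"}, "team-b")) // false
}
```

Treating nil and an empty slice identically matches the note that the field is intentionally not a pointer: there is no way to express "no namespaces at all".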

### CAPZ Controller Requirements
The CAPZ controller will need to:

* Populate condition fields on AzureClusters to indicate whether each is
  compatible with the Azure Identity. For example, if UserAssignedMSI is specified and is not
  available, a condition should be set indicating failure.
* Not apply invalid configuration. For example, if the AzureCluster references an `AzureClusterIdentity`
  from a namespace that the identity does not allow, CAPZ should indicate this through a condition or
  ignore the reference.
* Respond to changes in an `AzureClusterIdentity` spec.

### Risks and Mitigations

#### Caching and handling refresh of credentials

For handling many accounts, the number of calls to AAD must be minimised. To minimise the number of
calls to AAD, the cache should store `Authorizers` by a key consisting of two parts, the credential
type and the credential name, `key={cred-type|cred-name}, value=Authorizer`.
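
A minimal sketch of such a cache is shown below, assuming a mutex-guarded map. The names `authorizerCache`, `cacheKey` and `getOrCreate` are hypothetical, and the empty `authorizer` interface stands in for `autorest.Authorizer` so the example compiles without external dependencies.

```go
package main

import (
	"fmt"
	"sync"
)

// authorizer is a hypothetical stand-in for autorest.Authorizer.
type authorizer interface{}

// authorizerCache stores authorizers under "{cred-type}|{cred-name}" keys.
type authorizerCache struct {
	mu      sync.Mutex
	entries map[string]authorizer
}

func newAuthorizerCache() *authorizerCache {
	return &authorizerCache{entries: map[string]authorizer{}}
}

// cacheKey joins the credential type and name as described above.
func cacheKey(credType, credName string) string {
	return credType + "|" + credName
}

// getOrCreate returns the cached authorizer for the key and calls create
// (where a round trip to AAD would happen) only on a cache miss.
func (c *authorizerCache) getOrCreate(credType, credName string, create func() (authorizer, error)) (authorizer, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := cacheKey(credType, credName)
	if a, ok := c.entries[key]; ok {
		return a, nil
	}
	a, err := create()
	if err != nil {
		return nil, err
	}
	c.entries[key] = a
	return a, nil
}

func main() {
	cache := newAuthorizerCache()
	calls := 0
	create := func() (authorizer, error) { calls++; return struct{}{}, nil }
	for i := 0; i < 3; i++ {
		if _, err := cache.getOrCreate("ServicePrincipal", "sp-identity", create); err != nil {
			panic(err)
		}
	}
	fmt.Println("create calls:", calls) // prints: create calls: 1
}
```

Holding the lock while `create` runs keeps the sketch simple and guarantees a single creation per key, at the cost of serialising concurrent reconciles; a production cache might lock per key instead.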

## Upgrade Strategy

The data changes are additive and optional, so existing AzureCluster specifications will continue
to reconcile as before. These changes will only come into play when specifying an Azure Identity in
the new field in AzureClusterSpec. Upgrades to versions with this new field will not be broken.

## Additional Details

### Test Plan

* Unit tests to validate that the cluster controller can reconcile an AzureCluster when the `IdentityRef`
  field is nil, or specified for each principal type.
* Perform an initial Azure API call and fail pre-flight if the call fails.
* e2e test for each principal type.
* clusterctl e2e test with a move of a self-hosted cluster using an `IdentityRef`.

### Graduation Criteria

#### Alpha

* Support using an Azure Service Principal via the `IdentityRef`.
* Ensure `clusterctl move` works with the mechanism.

#### Beta

* Support using Azure User Assigned Identities
* Documentation describing all identities used in Management and Workload Clusters and their roles
* Full e2e coverage for both Service Principals and User Assigned Identities
* Identity caching to minimize authentication requests

#### Stable

* Two releases since beta.
* Describe cluster identities as the preferred way to provision infrastructure, as opposed to
  environment variables.

## Implementation History

- 2020/07/20: Initial proposal
- 2020/12/17: Initial PR merge https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/977
- 2020/12/18: Proposal update to reflect the adaptations from #977

<!-- Links -->