# Troubleshooting Guide

Common issues users might run into when using Cluster API Provider for Azure. This list is a work in progress. Feel free to open a PR to add to it if you find that useful information is missing.

## Examples of troubleshooting real-world issues

### No Azure resources are getting created

This is likely due to missing or invalid Azure credentials.

Check the CAPZ controller logs on the management cluster:

```bash
kubectl logs deploy/capz-controller-manager -n capz-system manager
```

If you see an error similar to this:

```
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/123/providers/Microsoft.Compute/skus?%24filter=location+eq+%27eastus2%27&api-version=2019-04-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {\"error\":\"invalid_client\",\"error_description\":\"AADSTS7000215: Invalid client secret is provided.
```

Make sure the provided Service Principal client ID and client secret are correct, and that the password has not expired.

### The AzureCluster infrastructure is provisioned but no virtual machines are coming up

Your Azure subscription might have no quota for the requested VM size in the specified Azure location.

Check the CAPZ controller logs on the management cluster:

```bash
kubectl logs deploy/capz-controller-manager -n capz-system manager
```

If you see an error similar to this:

```
"error"="failed to reconcile AzureMachine: failed to create virtual machine: failed to create VM capz-md-0-qkg6m in resource group capz-fkl3tp: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"OperationNotAllowed\" Message=\"Operation could not be completed as it results in exceeding approved standardDSv3Family Cores quota.
```

Follow [these steps](https://learn.microsoft.com/azure/azure-resource-manager/templates/error-resource-quota) to request a quota increase. Alternatively, you can specify a different Azure location and/or VM size during cluster creation.

### A virtual machine is running but the k8s node did not join the cluster

Check the AzureMachine (or AzureMachinePool if using a MachinePool) status:

```bash
kubectl get azuremachines -o wide
```

If you see output like this:

```
NAME                          READY   STATE
default-template-md-0-w78jt   false   Updating
```

this indicates that the bootstrap script has not yet succeeded. Check the AzureMachine `status.conditions` field for more information.

[Take a look at the cloud-init logs](#checking-cloud-init-logs-ubuntu) for further debugging.

### One or more control plane replicas are missing

Take a look at the KubeadmControlPlane controller logs and look for any potential errors:

```bash
kubectl logs deploy/capi-kubeadm-control-plane-controller-manager -n capi-kubeadm-control-plane-system manager
```

In addition, make sure all pods on the workload cluster are healthy, including pods in the `kube-system` namespace.

### Nodes are in NotReady state

Make sure you have installed a CNI on the workload cluster and that all pods on the workload cluster are in a Running state.

### Load Balancer service fails to come up

Check the cloud-controller-manager logs on the workload cluster.
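Whichever controller surfaces the failure, the embedded Azure API error follows the same `Code=\"...\" Message=\"...\"` shape, so a quick filter can tell you at a glance whether you are looking at a quota, auth, or policy problem. A minimal sketch (the sample log line is abbreviated and purely illustrative):

```shell
#!/bin/sh
# Sample error line, shortened from a real quota error; controller logs
# often escape the quotes around the error code (a sketch, not real output).
line='"error"="... Service returned an error. Code=\"OperationNotAllowed\" Message=\"Operation could not be completed ...\""'

# Strip escaping backslashes, then capture whatever sits between Code=" and ".
code=$(printf '%s' "$line" | tr -d '\\' | sed -n 's/.*Code="\([^"]*\)".*/\1/p')
echo "$code"   # prints: OperationNotAllowed
```

The same one-liner works on lines pulled out of `kubectl logs` with `grep Code=`.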
If running the Azure cloud provider in-tree:

```bash
kubectl logs kube-controller-manager-<control-plane-node-name> -n kube-system
```

If running the Azure cloud provider out-of-tree:

```bash
kubectl logs cloud-controller-manager -n kube-system
```

## Watching Kubernetes resources

To watch progression of all Cluster API resources on the management cluster, you can run:

```bash
kubectl get cluster-api
```

## Looking at controller logs

To check the CAPZ controller logs on the management cluster, run:

```bash
kubectl logs deploy/capz-controller-manager -n capz-system manager
```

### Checking cloud-init logs (Ubuntu)

Cloud-init logs can provide more information on any issues that happened when running the bootstrap script.

#### Option 1: Using the Azure Portal

Located in the virtual machine blade (if [enabled](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/boot-diagnostics) for the VM), the boot diagnostics option is under the Support and Troubleshooting section in the Azure portal.

For more information, see [here](https://learn.microsoft.com/azure/virtual-machines/boot-diagnostics#boot-diagnostics-view).

#### Option 2: Using the Azure CLI

```bash
az vm boot-diagnostics get-boot-log --name MyVirtualMachine --resource-group MyResourceGroup
```

For more information, see [here](https://learn.microsoft.com/cli/azure/vm/boot-diagnostics?view=azure-cli-latest).
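If you want the boot log for every VM in the cluster's resource group at once, a small loop over `az vm list` works. This is only a sketch: the resource group name is a placeholder, and it assumes the Azure CLI is logged in and boot diagnostics are enabled on the VMs.

```shell
#!/bin/sh
# Print the boot log for every VM in a resource group (a sketch).
dump_boot_logs() {
  rg=$1
  for vm in $(az vm list -g "$rg" --query '[].name' -o tsv); do
    echo "=== $vm ==="
    az vm boot-diagnostics get-boot-log --name "$vm" --resource-group "$rg"
  done
}

dump_boot_logs capz-fkl3tp   # placeholder resource group name
```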
#### Option 3: With SSH

Using the SSH information provided during cluster creation (environment variable `AZURE_SSH_PUBLIC_KEY_B64`):

##### Connect to the first control plane node - `capi` is the default Linux user created by the deployment

```bash
API_SERVER=$(kubectl get azurecluster capz-cluster -o jsonpath='{.spec.controlPlaneEndpoint.host}')
ssh capi@${API_SERVER}
```

##### List nodes

```bash
kubectl get azuremachines
NAME                               READY   STATE
capz-cluster-control-plane-2jprg   true    Succeeded
capz-cluster-control-plane-ck5wv   true    Succeeded
capz-cluster-control-plane-w4tv6   true    Succeeded
capz-cluster-md-0-s52wb            false   Failed
capz-cluster-md-0-w8xxw            true    Succeeded
```

##### Pick a node name from the output above

```bash
node=$(kubectl get azuremachine capz-cluster-md-0-s52wb -o jsonpath='{.status.addresses[0].address}')
ssh -J capi@${API_SERVER} capi@${node}
```

##### Look at the cloud-init logs

`less /var/log/cloud-init-output.log`

## Automated log collection

As part of CI there is a [log collection tool](https://github.com/kubernetes-sigs/cluster-api-provider-azure/tree/main/test/logger.go) <!-- markdown-link-check-disable-line -->
which you can also leverage to pull all the logs for machines; it dumps logs to `${PWD}/_artifacts` by default. The following works
if your kubeconfig is configured with the management cluster. See the tool for more settings.

```bash
go run -tags e2e ./test/logger.go --name <workload-cluster-name> --namespace <workload-cluster-namespace>
```

There are also some [provided scripts](https://github.com/kubernetes-sigs/cluster-api-provider-azure/tree/main/hack/debugging) that can help automate a few common tasks.
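For quick ad-hoc triage without the full tool, the `status.conditions` check described earlier can be scripted. A sketch, assuming your kubeconfig targets the management cluster and `jq` is installed:

```shell
#!/bin/sh
# List AzureMachines whose Ready condition is False, with the failure reason.
failed_machines() {
  kubectl get azuremachines -o json |
    jq -r '.items[]
      | .metadata.name as $name
      | (.status.conditions // [])[]
      | select(.type == "Ready" and .status == "False")
      | "\($name): \(.reason)"'
}

failed_machines
```

Pointing the same kubeconfig at a specific namespace (`kubectl get azuremachines -n <namespace> -o json`) narrows it to one workload cluster.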