---
title: "Troubleshooting"
linkTitle: "Troubleshooting"
weight: 7
date: 2020-10-04
description: >
  An overview of a list of components to assist in troubleshooting.
---

## Logging

Below is a list of commands you can use to view relevant logs of aad-pod-identity components.

### Isolate errors from logs

You can combine `grep ^E` with the `--since` flag of `kubectl logs` to isolate errors that occurred within a given time window.

```bash
kubectl logs -l component=mic --since=1h | grep ^E
kubectl logs -l component=nmi --since=1h | grep ^E
```

> It is always a good idea to include relevant logs from MIC and NMI when opening a new [issue](https://github.com/Azure/aad-pod-identity/issues).

### Ensure that iptables rule exists

The [NMI](../concepts/nmi) pods inject an iptables rule into each node. The following commands verify that, on a given node, a rule exists that routes all packets with a destination IP of 169.254.169.254 (the IMDS endpoint) to port 2579 of the host network.

```bash
NMI_POD=$(kubectl get pod -l component=nmi -ojsonpath='{.items[?(@.spec.nodeName=="<NodeName>")].metadata.name}')
kubectl exec $NMI_POD -- iptables -t nat -S aad-metadata
```

The expected output should be:

```log
-N aad-metadata
-A aad-metadata ! -s 127.0.0.1/32 -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.240.0.34:2579
-A aad-metadata -j RETURN
```

### Run a pod to validate your identity setup

You can run the following commands to validate your identity setup (assuming you have the proper `AzureIdentity` and `AzureIdentityBinding` deployed):

```bash
kubectl run azure-cli -it --image=mcr.microsoft.com/azure-cli --labels=aadpodidbinding=<selector defined in AzureIdentityBinding> /bin/bash

# within the azure-cli shell
az login --identity --allow-no-subscriptions --debug
```

`az login --identity` uses the Azure identity bound to the `azure-cli` pod to log in to Azure via Azure CLI. If the login succeeds, the output resembles the following:

```log
urllib3.connectionpool : Starting new HTTP connection (1): 169.254.169.254:80
urllib3.connectionpool : http://169.254.169.254:80 "GET /metadata/identity/oauth2/token?resource=https%3A%2F%2Fmanagement.core.windows.net%2F&api-version=2018-02-01 HTTP/1.1" 200 1667
msrestazure.azure_active_directory : MSI: Retrieving a token from http://169.254.169.254/metadata/identity/oauth2/token, with payload {'resource': 'https://management.core.windows.net/', 'api-version': '2018-02-01'}
msrestazure.azure_active_directory : MSI: Token retrieved
...
[
  {
    "environmentName": "AzureCloud",
    "homeTenantId": "<REDACTED>",
    "id": "<REDACTED>",
    "isDefault": true,
    "managedByTenants": [],
    "name": "<REDACTED>",
    "state": "Enabled",
    "tenantId": "<REDACTED>",
    "user": {
      "assignedIdentityInfo": "MSI",
      "name": "systemAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]
```

Based on the logs above, Azure CLI was able to retrieve a token from `http://169.254.169.254:80/metadata/identity/oauth2/token`. Its request is routed to the NMI pod that is running on the same node.
Identify which node the Azure CLI pod is scheduled to by running the following command:

```bash
kubectl get pods -owide

NAME        READY   STATUS    RESTARTS   AGE   IP             NODE                                 NOMINATED NODE   READINESS GATES
azure-cli   1/1     Running   1          12s   10.240.0.117   k8s-agentpool1-95854893-vmss000002   <none>           <none>
```

Take note of the node the pod is scheduled to and the pod's IP address. Check the logs of the NMI pod that is scheduled to the same node. You should see a token request from the azure-cli pod, identified by its pod IP address `10.240.0.117`:

```bash
kubectl logs <nmi pod name>

...
I0821 18:22:50.810806 1 standard.go:72] no clientID or resourceID in request. default/azure-cli has been matched with azure identity default/demo
I0821 18:22:50.810895 1 standard.go:178] matched identityType:0 clientid:7eb6##### REDACTED #####a6a9 resource:https://management.core.windows.net/
I0821 18:22:51.348117 1 server.go:190] status (200) took 537597287 ns for req.method=GET reg.path=/metadata/identity/oauth2/token req.remote=10.240.0.117
...
```

## Common Issues

Common issues or questions that users have run into when using pod identity are detailed below.

### Ignoring azure identity \<podns\>/\<podname\>, error: Invalid resource id: "", must match /subscriptions/\<subid\>/resourcegroups/\<resourcegroup\>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/\<name\>

If you are using MIC v1.6.0+, you need to ensure that the fields of `AzureIdentity` and `AzureIdentityBinding` use the correct capitalization. For more information, please refer to [this section](../#v160-breaking-change).
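A quick way to spot miscapitalized fields is to scan a manifest before applying it. The sketch below is a hypothetical helper, not part of aad-pod-identity. It assumes that the pre-v1.6.0 spelling capitalized the first letter of the field names listed in the pattern; treat that list as an assumption and confirm it against the breaking-change section linked above.

```shell
# Hypothetical helper (not shipped with aad-pod-identity): print the manifest
# lines, with line numbers, whose keys use the capitalization assumed to
# predate v1.6.0. The field list in the pattern is an assumption.
find_legacy_identity_fields() {
  # $1: path to a YAML manifest with AzureIdentity / AzureIdentityBinding objects
  # grep exits 0 when at least one legacy key is found, 1 when none is found.
  grep -nE '^[[:space:]]*(ClientID|ClientPassword|ResourceID|TenantID|AzureIdentity|Selector):' "$1"
}
```

For example, `find_legacy_identity_fields identity.yaml` (where `identity.yaml` is a placeholder file name) lists the lines to rename. Values such as `kind: AzureIdentity` are not reported, because only keys at the start of a line are matched.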

### LinkedAuthorizationFailed

If you received the following error message in MIC:

```log
Code="LinkedAuthorizationFailed" Message="The client '<ClientID>' with object id '<ObjectID>' has permission to perform action 'Microsoft.Compute/<VMType>/write' on scope '<VM/VMSS scope>'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '<UserAssignedIdentityScope>' or the linked scope(s) are invalid."
```

It means that your cluster service principal / managed identity does not have the role assignment required to assign the chosen user-assigned identities to the VM/VMSS. For more information, please follow this [documentation](../getting-started/role-assignment/) to allow your cluster service principal / managed identity to perform identity-related operations.

Past issues:

- https://github.com/Azure/aad-pod-identity/issues/585

### Unable to remove `AzureAssignedIdentity` after MIC pods are deleted

With release `1.6.1`, finalizers have been added to `AzureAssignedIdentity` to ensure that MIC cleans up the identities successfully before they are deleted. However, if the MIC deployment is force deleted before it has finished cleaning up identities from the underlying node, the `AzureAssignedIdentity` is left behind because it still contains a finalizer.
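Before removing any finalizers, it may help to confirm which objects are actually stuck. The sketch below wraps a standard `kubectl get` call with `custom-columns` output; the function name and column headers are illustrative choices, not something defined by aad-pod-identity.

```shell
# Hypothetical helper: list every AzureAssignedIdentity together with its
# deletion timestamp and finalizers. An object that shows a deletion timestamp
# and a non-empty finalizer list is still waiting for cleanup to complete.
list_assigned_identity_finalizers() {
  kubectl get azureassignedidentity -A \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,DELETED_AT:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

Objects that are not being deleted show `<none>` in the `DELETED_AT` column and can be left alone.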

To delete all `AzureAssignedIdentity` objects, run the following commands:

```bash
kubectl get azureassignedidentity -A -o=json | jq '.items[].metadata.finalizers=null' | kubectl apply -f -
kubectl delete azureassignedidentity --all
```

To delete only a specific `AzureAssignedIdentity`, run the following commands:

```bash
# A single object is returned directly rather than wrapped in a list, so the jq path has no .items[]
kubectl get azureassignedidentity <name> -n <namespace> -o=json | jq '.metadata.finalizers=null' | kubectl apply -f -
kubectl delete azureassignedidentity <name> -n <namespace>
```

Past issues:

- https://github.com/Azure/aad-pod-identity/issues/644

### Token requests calls fail with i/o timeout

If you received the following or a similar error in your application:

```log
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/subId/resourceGroups/rg/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": dial tcp 169.254.169.254:80: i/o timeout'
```

It means that a network policy is blocking egress traffic to `169.254.169.254` from the host. NMI pods run on `hostNetwork` and listen on `127.0.0.1:2579`. Please ensure that there is a network policy that allows traffic to `127.0.0.1:2579`.
Example `GlobalNetworkPolicy` configuration for Calico:

```yaml
kind: GlobalNetworkPolicy
apiVersion: crd.projectcalico.org/v1
metadata:
  name: egress-localhost
spec:
  types:
  - Egress
  egress:
  - action: Allow
    protocol: TCP
    destination:
      nets:
      - 127.0.0.1/32
      ports: [2579]
```

Past issues:

- https://github.com/Azure/aad-pod-identity/issues/716
- https://github.com/Azure/aad-pod-identity/issues/821

### Spark jobs failed to acquire tokens

Spark jobs that use AAD Pod Identity to acquire tokens should add the following configurations (assuming `AzureIdentity` and `AzureIdentityBinding` are deployed beforehand):

```bash
...
--conf spark.kubernetes.driver.label.aadpodidbinding=<AzureIdentityBinding selector> \
--conf spark.kubernetes.executor.label.aadpodidbinding=<AzureIdentityBinding selector> \
...
```

Past issues:

- https://github.com/Azure/aad-pod-identity/issues/947
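For the Spark configuration above, one way to confirm that the settings took effect is to check that the driver and executor pods carry the `aadpodidbinding` label once the job is running. The sketch below is a hypothetical helper around a standard `kubectl get pods` label query; the function name and the `default` namespace fallback are assumptions made for illustration.

```shell
# Hypothetical helper: list the pods that carry the aadpodidbinding label
# with the given selector value.
list_pods_with_binding() {
  # $1: selector value defined in the AzureIdentityBinding
  # $2: namespace the Spark job runs in (assumed to be "default" when omitted)
  kubectl get pods -n "${2:-default}" -l "aadpodidbinding=$1" -o wide
}
```

If the Spark driver or executor pods are missing from the output of `list_pods_with_binding <AzureIdentityBinding selector>`, the corresponding `--conf` flag was probably not passed to the job.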