---
title: "Troubleshooting"
linkTitle: "Troubleshooting"
weight: 7
date: 2020-10-04
description: >
  An overview of a list of components to assist in troubleshooting.
---

## Logging

Below is a list of commands you can use to view relevant logs of aad-pod-identity components.

### Isolate errors from logs

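You can combine `grep ^E` with the `--since` flag of `kubectl logs` to isolate errors logged within a given time window, for example the last hour:

```bash
# --tail=-1 is needed because kubectl shows only the last 10 lines per pod when a label selector is used
kubectl logs -l component=mic --since=1h --tail=-1 | grep ^E
kubectl logs -l component=nmi --since=1h --tail=-1 | grep ^E
```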
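
As a slightly stricter alternative to `grep ^E`, the sketch below matches the klog line prefix (a severity letter followed by the `MMDD` date), so unrelated lines that merely begin with `E` are not reported. The function name `klog_errors` is hypothetical and not part of aad-pod-identity; it only filters text on standard input.

```bash
# klog_errors (hypothetical helper): keep only klog error lines from stdin.
# klog prefixes each line with a severity letter (I, W, E, F) followed by
# the MMDD date, so an error line starts with "E" and four digits.
klog_errors() {
  # grep exits with 1 when nothing matches; treat "no errors found" as success
  # while still propagating real grep failures (exit status 2).
  grep -E '^E[0-9]{4} ' || [ "$?" -eq 1 ]
}
```

For example: `kubectl logs -l component=mic --since=1h --tail=-1 | klog_errors`.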

> It is always a good idea to include relevant logs from MIC and NMI when opening a new [issue](https://github.com/Azure/aad-pod-identity/issues).

### Ensure that iptables rule exists

To verify that the [NMI](../concepts/nmi) pods have injected the correct iptables rule on a given node, run the following commands. They list the `aad-metadata` chain, which should contain a rule that routes all packets with a destination IP of 169.254.169.254 (the IMDS endpoint) to port 2579 of the host network.

```bash
NMI_POD=$(kubectl get pod -l component=nmi -ojsonpath='{.items[?(@.spec.nodeName=="<NodeName>")].metadata.name}')
kubectl exec "$NMI_POD" -- iptables -t nat -S aad-metadata
```

The output should resemble the following (the `--to-destination` address may differ in your cluster):

```log
-N aad-metadata
-A aad-metadata ! -s 127.0.0.1/32 -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.240.0.34:2579
-A aad-metadata -j RETURN
```
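
To check for the rule without reading the output by eye, a small filter can be piped after the `iptables` command. This is a minimal sketch: `has_nmi_dnat_rule` is a hypothetical helper name, and the pattern assumes the rule has the same shape as the expected output shown in this section.

```bash
# has_nmi_dnat_rule (hypothetical helper): read the output of
# `iptables -t nat -S aad-metadata` on stdin and succeed only if it contains
# a DNAT rule for 169.254.169.254 port 80 that targets port 2579.
has_nmi_dnat_rule() {
  # -e is used because the pattern itself starts with a dash.
  grep -Eq -e '-d 169\.254\.169\.254/32 .*--dport 80 .*-j DNAT --to-destination [0-9.]+:2579([[:space:]]|$)'
}
```

For example: `kubectl exec "$NMI_POD" -- iptables -t nat -S aad-metadata | has_nmi_dnat_rule && echo "rule present"`.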

### Run a pod to validate your identity setup

You can run the following commands to validate your identity setup (assuming you have the proper `AzureIdentity` and `AzureIdentityBinding` deployed):

```bash
kubectl run azure-cli -it --image=mcr.microsoft.com/azure-cli --labels=aadpodidbinding=<selector defined in AzureIdentityBinding> /bin/bash

# within the azure-cli shell
az login --identity --allow-no-subscriptions --debug
```

`az login --identity` uses the Azure identity bound to the `azure-cli` pod to log in to Azure via Azure CLI. If the login succeeds, the output resembles the following:

```log
urllib3.connectionpool : Starting new HTTP connection (1): 169.254.169.254:80
urllib3.connectionpool : http://169.254.169.254:80 "GET /metadata/identity/oauth2/token?resource=https%3A%2F%2Fmanagement.core.windows.net%2F&api-version=2018-02-01 HTTP/1.1" 200 1667
msrestazure.azure_active_directory : MSI: Retrieving a token from http://169.254.169.254/metadata/identity/oauth2/token, with payload {'resource': 'https://management.core.windows.net/', 'api-version': '2018-02-01'}
msrestazure.azure_active_directory : MSI: Token retrieved
...
[
  {
    "environmentName": "AzureCloud",
    "homeTenantId": "<REDACTED>",
    "id": "<REDACTED>",
    "isDefault": true,
    "managedByTenants": [],
    "name": "<REDACTED>",
    "state": "Enabled",
    "tenantId": "<REDACTED>",
    "user": {
      "assignedIdentityInfo": "MSI",
      "name": "systemAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]
```

Based on the logs above, Azure CLI was able to retrieve a token from `http://169.254.169.254:80/metadata/identity/oauth2/token`. Its request is routed to the NMI pod running on the same node. To identify which node the Azure CLI pod is scheduled to, run the following command:

```bash
kubectl get pods -owide

NAME                                    READY   STATUS    RESTARTS   AGE   IP             NODE                                 NOMINATED NODE   READINESS GATES
azure-cli                               1/1     Running   1          12s   10.240.0.117   k8s-agentpool1-95854893-vmss000002   <none>           <none>
```

Take note of the node the pod is scheduled to and the pod's IP address. Then check the logs of the NMI pod scheduled to the same node. You should see a token request from the azure-cli pod, identified by its pod IP address `10.240.0.117`:

```bash
kubectl logs <nmi pod name>

...
I0821 18:22:50.810806       1 standard.go:72] no clientID or resourceID in request. default/azure-cli has been matched with azure identity default/demo
I0821 18:22:50.810895       1 standard.go:178] matched identityType:0 clientid:7eb6##### REDACTED #####a6a9 resource:https://management.core.windows.net/
I0821 18:22:51.348117       1 server.go:190] status (200) took 537597287 ns for req.method=GET reg.path=/metadata/identity/oauth2/token req.remote=10.240.0.117
...
```
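
On a busy node the NMI log contains requests from many pods. The sketch below keeps only the request lines whose `req.remote` field equals a given pod IP. `nmi_requests_from` is a hypothetical helper name, and it assumes the log format shown above, where `req.remote=<pod IP>` appears as a whitespace-separated field.

```bash
# nmi_requests_from (hypothetical helper): print NMI request log lines from
# stdin whose req.remote field equals the pod IP given as the first argument.
nmi_requests_from() {
  # Compare whole whitespace-separated fields so that, for example,
  # 10.240.0.11 does not also match 10.240.0.117.
  awk -v want="req.remote=$1" '{
    for (i = 1; i <= NF; i++) if ($i == want) { print; next }
  }'
}
```

For example: `kubectl logs <nmi pod name> | nmi_requests_from 10.240.0.117`.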

## Common Issues

Common issues or questions that users have run into when using pod identity are detailed below.

### Ignoring azure identity \<podns\>/\<podname\>, error: Invalid resource id: "", must match /subscriptions/\<subid\>/resourcegroups/\<resourcegroup\>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/\<name\>

If you are using MIC v1.6.0+, you will need to ensure the correct capitalization of `AzureIdentity` and `AzureIdentityBinding` fields. For more information, please refer to [this section](../#v160-breaking-change).

### LinkedAuthorizationFailed

If you see the following error message in the MIC logs:

```log
Code="LinkedAuthorizationFailed" Message="The client '<ClientID>' with object id '<ObjectID>' has permission to perform action 'Microsoft.Compute/<VMType>/write' on scope '<VM/VMSS scope>'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '<UserAssignedIdentityScope>' or the linked scope(s) are invalid."
```

It means that your cluster service principal / managed identity does not have the correct role assignment to assign the chosen user-assigned identities to the VM/VMSS. For more information, please follow this [documentation](../getting-started/role-assignment/) to allow your cluster service principal / managed identity to perform identity-related operations.

Past issues:

- https://github.com/Azure/aad-pod-identity/issues/585

### Unable to remove `AzureAssignedIdentity` after MIC pods are deleted

With release `1.6.1`, finalizers have been added to `AzureAssignedIdentity` to ensure the identities are successfully cleaned up by MIC before they're deleted. However, in scenarios where the MIC deployment is force-deleted before it has completed the cleanup of identities from the underlying node, the `AzureAssignedIdentity` will be left behind as it contains a finalizer.

To delete all `AzureAssignedIdentity`, run the following command:

```bash
kubectl get azureassignedidentity -A -o=json | jq '.items[].metadata.finalizers=null' | kubectl apply -f -
kubectl delete azureassignedidentity --all
```

To delete only a specific `AzureAssignedIdentity`, run the following command:

```bash
# a single object has no .items wrapper, so the filter targets .metadata directly
kubectl get azureassignedidentity <name> -n <namespace> -o=json | jq '.metadata.finalizers=null' | kubectl apply -f -
kubectl delete azureassignedidentity <name> -n <namespace>
```

Past issues:
- https://github.com/Azure/aad-pod-identity/issues/644

### Token requests calls fail with i/o timeout

If you see the following or a similar error in your application:

```log
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/subId/resourceGroups/rg/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": dial tcp 169.254.169.254:80: i/o timeout'
```

It means there is a network policy blocking egress traffic to `169.254.169.254` from the host. NMI pods run on `hostNetwork` and listen on `127.0.0.1:2579`, so ensure there is a network policy that allows traffic to `127.0.0.1:2579`. Example `GlobalNetworkPolicy` configuration for Calico:

```yaml
kind: GlobalNetworkPolicy
apiVersion: crd.projectcalico.org/v1
metadata:
  name: egress-localhost
spec:
  types:
    - Egress
  egress:
    - action: Allow
      protocol: TCP
      destination:
        nets:
          - 127.0.0.1
        ports: [2579]
```

Past issues:
- https://github.com/Azure/aad-pod-identity/issues/716
- https://github.com/Azure/aad-pod-identity/issues/821

### Spark jobs failed to acquire tokens

Spark jobs that use AAD Pod Identity as a way to acquire tokens should add the following configurations (assuming `AzureIdentity` and `AzureIdentityBinding` are deployed beforehand):

```bash
...
--conf spark.kubernetes.driver.label.aadpodidbinding=<AzureIdentityBinding selector> \
--conf spark.kubernetes.executor.label.aadpodidbinding=<AzureIdentityBinding selector> \
...
```

Past issues:
- https://github.com/Azure/aad-pod-identity/issues/947