# Istio CNI Node Agent

The Istio CNI Node Agent is responsible for several things:

- Installing an Istio CNI plugin binary on each node's filesystem, updating that node's CNI config (in e.g. `/etc/cni/net.d`), and watching the config and binary paths to reinstall if things are modified.
- In sidecar mode, the CNI plugin can configure sidecar networking for pods when they are scheduled by the container runtime, using iptables. Having the CNI plugin handle the netns setup replaces the previous Istio approach of injecting a `NET_ADMIN`-privileged `initContainer`, `istio-init`, into pods along with the `istio-proxy` sidecar. This removes the need for a privileged `NET_ADMIN` container in users' application pods.
- In ambient mode, the CNI plugin does not configure any networking, and is only responsible for synchronously pushing new pod events back up to an ambient watch server which runs as part of the Istio CNI node agent. The ambient server finds the pod netns and configures networking inside that pod via iptables. The ambient server additionally watches enabled namespaces, and enrolls already-started-but-newly-enrolled pods in a similar fashion.

## Privileges required

Regardless of mode, the Istio CNI Node Agent requires privileged node permissions, and will require allow-listing in constrained environments that block privileged workloads by default. If using sidecar repair mode or ambient mode, the node agent additionally needs permission to enter pod network namespaces and perform networking configuration in them. If either sidecar repair or ambient mode is enabled, on startup the container will drop all Linux capabilities (`drop:ALL`) and re-add only the ones sidecar repair/ambient explicitly require to function, namely:

- CAP_SYS_ADMIN
- CAP_NET_ADMIN
- CAP_NET_RAW
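
Under these settings, the container's security context ends up looking roughly like the following (a sketch; the actual manifest is rendered by the istio-cni Helm chart):

```yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - SYS_ADMIN
      - NET_ADMIN
      - NET_RAW
```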

## Ambient mode details

Fundamentally, this component is responsible for the following:

- Setting up redirection for newly-started (or newly-added, previously-started) application pods such that traffic from application pods is forwarded to the local node's ztunnel pod.
- Configuring the required iptables rules, sockets, and packet routing miscellanea within the `ztunnel` and application pod network namespaces to make that happen.

This component accomplishes that in the following ways:

1. By installing a separate, very basic "CNI plugin" binary onto the node to forward low-level pod lifecycle events (CmdAdd/CmdDel/etc) over a socket from whatever node-level CNI subsystem is in use to this node agent for processing.
1. By running as a node-level daemonset that:
    - listens for these UDS events from the CNI plugin (which fire when new pods are spawned in an ambient-enabled namespace), and adds those pods to the ambient mesh.
    - watches k8s resources for existing pods, so that pods that have already been started can be moved in or out of the ambient mesh.
    - sends UDS events to ztunnel via a socket whenever a pod is enabled for ambient mesh (whether from the CNI plugin or the node watcher), instructing ztunnel to create the "tube" socket.
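
The plugin-to-agent handoff can be sketched as follows - a minimal illustration of forwarding a CmdAdd event over a Unix domain socket (the socket path and event shape here are hypothetical, not Istio's actual wire format):

```python
import json
import os
import socket
import tempfile
import threading

# Hypothetical socket path; the real node agent uses its own well-known path.
SOCK = os.path.join(tempfile.mkdtemp(), "pluginevent.sock")

received = []

def node_agent(server: socket.socket) -> None:
    """Node-agent side: accept one CNI event and record it."""
    conn, _ = server.accept()
    with conn:
        data = b""
        while chunk := conn.recv(4096):
            data += chunk
        received.append(json.loads(data.decode()))

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK)
server.listen(1)
agent = threading.Thread(target=node_agent, args=(server,))
agent.start()

# CNI plugin side: synchronously push the pod event up to the node agent.
event = {"op": "CmdAdd", "pod": "myapp-0", "namespace": "default",
         "netns": "/var/run/netns/cni-0000"}
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as plugin:
    plugin.connect(SOCK)
    plugin.sendall(json.dumps(event).encode())

agent.join()
print(received[0]["op"])  # CmdAdd
```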

The ambient CNI agent is the only place where ambient network config and pod redirection machinery happens.
In ambient mode, the CNI plugin is effectively just a shim to catch pod creation events and notify the CNI agent early enough to set up network redirection before the pod is fully started. This is necessary because the CNI plugin is effectively the first thing to see a scheduled pod - before the K8S control plane will see things like the pod IP or networking info, the CNI will - but the CNI plugin alone is not sufficient to handle all pod events (already-started pod updates, rebuilding current state on CNI restart) that the node agent cares about.

## Reference

### Design details

Broadly, `istio-cni` accomplishes ambient redirection by instructing ztunnel to set up sockets within the application pod network namespace, where:

- one end of the socket is in the application pod
- and the other end is in ztunnel's pod

and setting up iptables rules to funnel traffic through that socket "tube" to ztunnel and back.
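
Conceptually, the in-pod redirection has the following shape (an illustrative sketch only - the actual chains, marks, and redirect mechanisms are generated at runtime and differ in detail; 15001, 15006, and 15008 are ztunnel's outbound, plaintext-inbound, and HBONE-inbound ports):

```
# nat table inside the application pod netns (sketch, not the real ruleset)
-A PREROUTING -p tcp -j ISTIO_PRERT
-A ISTIO_PRERT -p tcp -j REDIRECT --to-ports 15008     # inbound  -> ztunnel
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -p tcp -j REDIRECT --to-ports 15001    # outbound -> ztunnel
```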

This effectively behaves as if ztunnel were an in-pod sidecar, without actually requiring the injection of ztunnel as a sidecar into the pod manifest, or mutating the application pod in any way.

Additionally, it does not require any network rules/routing/config in the host network namespace, which greatly increases ambient mode compatibility with 3rd-party CNIs. In virtually all cases, this "in-pod" ambient CNI is exactly as compatible with 3rd-party CNIs as sidecars are/were.

### Notable Env Vars

| Env Var              | Default                                    | Purpose |
|----------------------|--------------------------------------------|---------|
| HOST_PROBE_SNAT_IP   | "169.254.7.127"                            | Applied to SNAT host probe packets, so they can be identified/skipped pod-side. Any link-local address in the 169.254.0.0/16 block can be used. |
| HOST_PROBE_SNAT_IPV6 | "fd16:9254:7127:1337:ffff:ffff:ffff:ffff"  | IPv6 link-local ranges are designed to be collision-resistant by default, so this should probably never need to be overridden. |

## Sidecar Mode Implementation Details

Istio CNI injection is currently based on the same Pod annotations used in init-container/inject mode.

### Selection API

- plugin config "exclude namespaces" applies first
- ambient is enabled if:
    - namespace label "istio.io/dataplane-mode" == "ambient", and/or pod label "istio.io/dataplane-mode" == "ambient"
    - "sidecar.istio.io/status" annotation is not present on the pod (it is created by sidecar injection)
    - pod label "istio.io/dataplane-mode" is not "none"
- sidecar interception is enabled if:
    - "istio-init" container is not present in the pod.
    - istio-proxy container exists and
        - does not have the DISABLE_ENVOY environment variable (which triggers proxyless mode)
        - has first 2 args "proxy" and "sidecar" - or fewer than 2 args, or first arg not "proxy".
        - "sidecar.istio.io/inject" is not "false"
        - "sidecar.istio.io/status" exists
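
The ambient selection rules above can be sketched as a predicate (a simplified illustration; the authoritative logic lives in the plugin and also consults plugin config and container args):

```python
def ambient_enabled(ns_labels: dict, pod_labels: dict, pod_annotations: dict) -> bool:
    """Sketch of the ambient selection rules described above."""
    # The pod-level label takes precedence over the namespace-level label.
    mode = pod_labels.get("istio.io/dataplane-mode",
                          ns_labels.get("istio.io/dataplane-mode"))
    if mode == "none":                                # pod explicitly opted out
        return False
    if "sidecar.istio.io/status" in pod_annotations:  # already has an injected sidecar
        return False
    return mode == "ambient"

print(ambient_enabled({"istio.io/dataplane-mode": "ambient"}, {}, {}))   # True
print(ambient_enabled({}, {"istio.io/dataplane-mode": "none"}, {}))      # False
print(ambient_enabled({"istio.io/dataplane-mode": "ambient"}, {},
                      {"sidecar.istio.io/status": "{...}"}))             # False
```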

### Redirect API

The annotation-based control is currently only supported in 'sidecar' mode. See plugin/redirect.go for details.

- redirectMode allows TPROXY to be set instead of the default REDIRECT; TPROXY requires envoy to have extra permissions.
- includeIPCidr, excludeIPCidr
- includeInboundPorts, excludeInboundPorts
- includeOutboundPorts, excludeOutboundPorts
- excludeInterfaces
- kubevirtInterfaces
- ISTIO_META_DNS_CAPTURE env variable on the proxy - enables DNS redirect
- INVALID_DROP env var on the proxy - changes iptables behavior from reset to drop
- automatically excluded inbound ports: 15020, 15021, 15090

The code automatically detects the proxyUID and proxyGID from RunAsUser/RunAsGroup and excludes them from interception, defaulting to 1337.
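
For example, a pod could tune interception with annotations along these lines (an illustrative sketch; see plugin/redirect.go for the authoritative key names and defaults):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.istio.io/interceptionMode: TPROXY   # instead of the default REDIRECT
    traffic.sidecar.istio.io/includeInboundPorts: "8080,9090"
    traffic.sidecar.istio.io/excludeOutboundPorts: "3306"
```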

### Overview

- [istio-cni Helm chart](../manifests/charts/istio-cni/templates)
    - `install-cni` daemonset - its main function is to install and maintain the node CNI, but it is also a proper server that interacts with K8S, watching Pods for recovery.
    - `istio-cni-config` configmap with the CNI plugin config to add to the CNI plugin chained config
    - creates the service account `istio-cni` with a `ClusterRoleBinding` to allow gets on pods' info and delete/modifications for recovery.

- `install-cni` container
    - copies `istio-cni` and `istio-iptables` to `/opt/cni/bin`
    - creates a kubeconfig for the service account the pod runs under
    - periodically copies the K8S JWT token for istio-cni onto the host so the plugin can connect to K8S.
    - injects the CNI plugin config into the CNI config file
        - the CNI installer will look for the config file under the mounted CNI net dir based on file name extensions (`.conf`, `.conflist`)
        - the file name can be explicitly set by the `CNI_CONF_NAME` env var
        - the program inserts `CNI_NETWORK_CONFIG` into the `plugins` list in `/etc/cni/net.d/${CNI_CONF_NAME}`
    - the actual code is in pkg/install - including a readiness probe and monitoring.
    - it also sets up a UDS socket for istio-cni to send logs to this container.
    - based on config, it may run the 'repair' controller that detects pods where Istio setup failed (or pods created in corner cases) and restarts them.
    - if ambient is enabled, it also runs an ambient controller, watching Pods and Namespaces
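
For example, given a chained config such as `10-calico.conflist`, after injection the `plugins` list would end with an entry along these lines (values are illustrative):

```json
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    { "type": "calico" },
    {
      "type": "istio-cni",
      "log_level": "info",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/ZZZ-istio-cni-kubeconfig",
        "exclude_namespaces": ["istio-system"]
      }
    }
  ]
}
```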

- `istio-cni`
    - CNI plugin executable copied to `/opt/cni/bin`
    - currently implemented for k8s only
    - on pod add, determines whether the pod should have its netns set up to redirect to the Istio proxy. See [CmdAdd](#cmdadd-sidecar-workflow) for detailed logic.
        - it connects to K8S using the kubeconfig and JWT token copied from install-cni to get the Pod and Namespace. Since this is a short-running command, each invocation creates a new connection.
        - if so, calls `istio-iptables` with params to set up the pod netns
        - if ambient, sets up the ambient logic.

- `istio-iptables`
    - sets up iptables to redirect a list of ports to the port envoy will listen on
    - shares code with the istio-init container
    - it will generate an iptables-save config, based on annotations/labels and other settings, and apply it.
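
The generated ruleset follows the familiar sidecar redirect scheme - conceptually it looks like this (a sketch, not the exact output):

```
# nat table (sketch): inbound traffic to Envoy's 15006, outbound to 15001,
# skipping traffic from the proxy's own UID (default 1337)
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
```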

### CmdAdd Sidecar Workflow

`CmdAdd` is triggered when a new pod is created. It runs on the node, in a chain of CNI plugins - Istio is
run after the main CNI plugin sets up the pod IP and networking.

1. Check the k8s pod namespace against the exclusion list (plugin config)
    - Config must exclude the namespace that the Istio control plane is installed in (TODO: this may change, excluding at pod level is sufficient and we may want Istiod and other Istio components to use ambient too)
    - If excluded, ignore the pod and return prevResult
1. Set up redirect rules for the pod:
    - Get the port list from the pod definition, as well as annotations.
    - Set up iptables with the required port list: `nsenter --net=<k8s pod netns> /opt/cni/bin/istio-iptables ...`. The following conditions will prevent the redirect rules from being set up in the pod:
        - Pod has annotation `sidecar.istio.io/inject` set to `false`, or has no `sidecar.istio.io/status` annotation
        - Pod has an `istio-init` initContainer - this indicates a pod running its own injection setup.
1. Return prevResult
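
The workflow above can be sketched as follows (simplified; the real plugin parses typed CNI structures, and `setup_redirect` here is a stand-in for the nsenter/istio-iptables invocation):

```python
def setup_redirect(pod: dict) -> None:
    """Stand-in for `nsenter --net=<netns> /opt/cni/bin/istio-iptables ...`."""
    pod["redirected"] = True

def cmd_add(pod: dict, exclude_namespaces: list, prev_result: dict) -> dict:
    """Sketch of the CmdAdd decision flow described above."""
    ann = pod.get("annotations", {})
    if pod["namespace"] in exclude_namespaces:
        return prev_result  # excluded namespace: pass through untouched
    if ann.get("sidecar.istio.io/inject") == "false" or \
            "sidecar.istio.io/status" not in ann:
        return prev_result  # no injected sidecar: nothing to redirect
    if "istio-init" in pod.get("init_containers", []):
        return prev_result  # pod runs its own injection setup
    setup_redirect(pod)
    return prev_result      # always hand the chained prevResult back

pod = {"namespace": "default",
       "annotations": {"sidecar.istio.io/status": "{...}"},
       "init_containers": []}
result = cmd_add(pod, ["kube-system", "istio-system"], {"ips": ["10.0.0.5"]})
print(pod.get("redirected"), result)  # True {'ips': ['10.0.0.5']}
```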

## Troubleshooting

### Collecting Logs

#### Using `istioctl`/helm

- Set: `values.global.logging.level="cni:debug,ambient:debug"`
- Inspect the pod logs of an `istio-cni` Daemonset pod on a specific node.

#### From a specific node syslog

The CNI plugins are executed by threads in the `kubelet` process.  The CNI plugin logs end up in the syslog
under the `kubelet` process. On systems with `journalctl` the following is an example command line
to view the last 1000 `kubelet` logs via the `less` utility to allow for `vi`-style searching:

```console
$ journalctl -t kubelet -n 1000 | less
```

#### GKE via Stackdriver Log Viewer

Each GKE cluster will have many categories of logs collected by Stackdriver.  Logs can be monitored via
the project's [log viewer](https://cloud.google.com/logging/docs/view/overview) and/or the `gcloud logging read`
capability.

The following example grabs the last 10 `kubelet` logs containing the string "cmdAdd" in the log message.

```console
$ gcloud logging read "resource.type=k8s_node AND jsonPayload.SYSLOG_IDENTIFIER=kubelet AND jsonPayload.MESSAGE:cmdAdd" --limit 10 --format json
```

## Other Reference

The framework for this implementation of the CNI plugin is based on the
[containernetworking sample plugin](https://github.com/containernetworking/plugins/tree/main/plugins/sample).

The details for the deployment & installation of this plugin were largely lifted directly from the
[Calico CNI plugin](https://github.com/projectcalico/cni-plugin).

Specifically:

- The CNI installation script is containerized and deployed as a daemonset in k8s.  The relevant Calico k8s manifests were used as the model for the istio-cni plugin's manifest:
    - [daemonset and configmap](https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/calico.yaml) - search for the `calico-node` Daemonset and its `install-cni` container deployment
    - [RBAC](https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/rbac.yaml) - this creates the service account the CNI plugin is configured to use to access the kube-api-server