# Troubleshooting Bootstrap Failures

Unfortunately, there will always be some cases where OpenShift fails to install properly. In these cases, it is helpful to understand the likely failure modes as well as how to troubleshoot the failure.

## Gathering bootstrap failure logs

### Using the installer provisioned workflow

When users use the installer to create the OpenShift cluster, the installer has all the information it needs to automatically capture the logs from the bootstrap host in case of failure.

#### Authenticating to bootstrap host for ipi

The installer uses the user's environment to discover the credentials for connecting to the bootstrap host over SSH. The installer uses one of the following methods:

1. Use the user's already configured `SSH_AGENT`. If the user has an ssh-agent set up, the installer uses it for SSH authentication.

2. Use the user's home directory, `~/.ssh` on Linux hosts, to load all the SSH private keys and use those for SSH authentication.

   a. The installer also configures the bootstrap host with a *generated* SSH key, and this private key is used for SSH authentication if none of the user's keys are trusted.
   The installer only configures the bootstrap host to trust the generated key, so in that case the log bundle contains only the logs from the bootstrap host and not from the control-plane hosts.

### Using the user provisioned workflow

When users create the infrastructure for the OpenShift cluster themselves and the cluster fails to bootstrap, they can use the `gather bootstrap` subcommand to gather the logs from the bootstrap host.
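
Before gathering logs in either workflow, it can help to check which keys the discovery described in the ipi authentication section is likely to find. The sketch below is a hypothetical helper, not part of `openshift-install`. It assumes the agent is reachable through `SSH_AUTH_SOCK` (the standard OpenSSH variable) and treats any file in the SSH directory with a `BEGIN ... PRIVATE KEY` header as a candidate private key, which only approximates what the installer does.

```sh
# Hypothetical helper (not part of openshift-install): approximates the key
# discovery order described above so it can be checked by hand.
# Assumption: the agent socket is advertised via SSH_AUTH_SOCK.
list_ssh_key_candidates() {
  ssh_dir="${1:-$HOME/.ssh}"

  # 1. Keys held by a running ssh-agent, if one is reachable.
  if [ -n "${SSH_AUTH_SOCK:-}" ] && command -v ssh-add >/dev/null 2>&1; then
    echo "ssh-agent keys:"
    ssh-add -l || true   # exits non-zero when the agent holds no keys
  else
    echo "ssh-agent: not available"
  fi

  # 2. Private keys in the SSH directory, recognized by their PEM/OpenSSH header.
  echo "private keys in $ssh_dir:"
  found=0
  for f in "$ssh_dir"/*; do
    [ -f "$f" ] || continue   # skips subdirectories and an unmatched glob
    if grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$f" 2>/dev/null; then
      echo "  $f"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "  (none found)"
}
```

Running `list_ssh_key_candidates` with no argument inspects `~/.ssh`; a directory can also be passed explicitly. If a key is listed but the handshake still fails, that key is probably not trusted by the host, and the right key can be passed with `--key`.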

```console
$ openshift-install gather bootstrap --help
Gather debugging data for a failing-to-bootstrap control plane

Usage:
  openshift-install gather bootstrap [flags]

Flags:
      --bootstrap string     Hostname or IP of the bootstrap host
  -h, --help                 help for bootstrap
      --key stringArray      Path to SSH private keys that should be used for authentication. If no key was provided, SSH private keys from user's environment will be used
      --master stringArray   Hostnames or IPs of all control plane hosts
```

An example invocation for a cluster with three control-plane machines would be:

```sh
openshift-install gather bootstrap --bootstrap ${BOOTSTRAP_HOST_IP} \
  --master ${CONTROL_PLANE_1_HOST_IP} \
  --master ${CONTROL_PLANE_2_HOST_IP} \
  --master ${CONTROL_PLANE_3_HOST_IP}
```

#### Authenticating to bootstrap host for upi

When explicitly using the `gather bootstrap` subcommand, users can either rely on the installer's discovery mechanism as detailed [above](#authenticating-to-bootstrap-host-for-ipi) or provide the keys using the `--key` flag.

An example invocation that passes two keys explicitly, for a cluster with three control-plane machines, would be:

```sh
openshift-install gather bootstrap --key ${KEY_1} --key ${KEY_2} \
  --bootstrap ${BOOTSTRAP_HOST_IP} \
  --master ${CONTROL_PLANE_1_HOST_IP} \
  --master ${CONTROL_PLANE_2_HOST_IP} \
  --master ${CONTROL_PLANE_3_HOST_IP}
```

## Understanding the bootstrap failure log bundle

Here's what a log bundle looks like:

```console
.
├── bootstrap
├── control-plane
├── failed-units.txt
├── rendered-assets
├── resources
└── unit-status

5 directories, 1 file
```

### file: failed-units.txt

The `failed-units.txt` file contains a list of all the **failed** systemd units on the bootstrap host.
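
A first pass over an extracted bundle can be scripted. The following is a minimal sketch that assumes only the top-level layout shown above; the function name `check_bundle_layout` is hypothetical and nothing like it ships with the installer. It reports any missing top-level entries, prints the bootstrap host's failed units, and returns non-zero if something is missing.

```sh
# Hypothetical helper (not part of openshift-install): verifies the top-level
# layout of an extracted log bundle and prints the bootstrap host's failed units.
check_bundle_layout() {
  bundle="${1:-.}"
  missing=0

  # Directories expected at the top of the bundle.
  for entry in bootstrap control-plane rendered-assets resources unit-status; do
    if [ ! -d "$bundle/$entry" ]; then
      echo "missing directory: $entry"
      missing=1
    fi
  done

  # The list of failed systemd units on the bootstrap host.
  if [ -f "$bundle/failed-units.txt" ]; then
    echo "failed units on the bootstrap host:"
    cat "$bundle/failed-units.txt"
  else
    echo "missing file: failed-units.txt"
    missing=1
  fi

  return "$missing"
}
```

For example, `check_bundle_layout .` from inside the extracted bundle. A non-zero status is a hint that the gather itself was incomplete rather than a diagnosis of the bootstrap failure.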

### directory: unit-status

The unit-status directory contains the details of each failed systemd unit listed in [failed-units](#file-failed-units-txt).

### directory: bootstrap

The bootstrap directory contains all the important logs and files from the bootstrap host. It has three subdirectories:

```console
bootstrap
├── containers
├── journals
└── pods

3 directories, 0 files
```

#### directory: bootstrap/containers

The containers directory contains the descriptions and logs of all the containers that the kubelet created through CRI-O for the static pods.
This includes all the operators, or their operands, that run on the bootstrap host in special bootstrap modes, for example the machine-config-server container or the bootstrap-kube-controlplane pods.

For each container, the directory has two files:

* `<human readable id>.log`, which contains the log of the container.
* `<human readable id>.inspect`, which contains information about the container such as the image, volume mounts, and arguments.

#### directory: bootstrap/journals

The journals directory contains the logs of *important* systemd units. These units are:

* `release-image.log`, the release-image unit is responsible for pulling the Release Image to the bootstrap host.
* `crio-configure.log` and `crio.log`, these units are responsible for configuring CRI-O on the bootstrap host and for running the CRI-O daemon, respectively.
* `kubelet.log`, the kubelet service runs the kubelet on the bootstrap host. The kubelet on the bootstrap host is responsible for running the static pods for etcd, bootstrap-kube-controlplane, and various other operators in bootstrap mode.
* `approve-csr.log`, the approve-csr unit is responsible for allowing control-plane machines to join the OpenShift cluster.
  This unit acts as the in-cluster approver while bootstrapping is in progress.
* `bootkube.log`, the bootkube service is the unit that performs the bootstrapping of OpenShift clusters using all the operators. This service is responsible for running all the required steps to bootstrap the API and then waiting for success.

Other services that are important on some platforms, such as OpenStack, may also have logs in this directory.

#### directory: bootstrap/pods

The pods directory contains the information and logs from all the render commands for various operators run by the bootkube unit.

For each container, the directory has two files:

* `<human readable id>.log`, which contains the log of the container.
* `<human readable id>.inspect`, which contains information about the container such as the image, volume mounts, and arguments.

### directory: resources

The resources directory contains various Kubernetes objects that are present in the cluster. These resources are pulled using the bootstrap API running on the bootstrap host.

### directory: rendered-assets

The rendered-assets directory contains all the files and directories created by the bootkube unit using the various operator render commands. This directory is a snapshot of the `/opt/openshift` directory on the bootstrap host.

### directory: control-plane

The control-plane directory contains logs for each control-plane host, with one subdirectory per host, usually named after the host's IP address.
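
Because each control-plane host gets its own subdirectory, per-host files can be walked with a short loop. The sketch below is a hypothetical helper (`tail_control_plane_kubelet` is not part of the installer); it assumes each host directory contains a `journals/kubelet.log`, as listed under `control-plane/name/journals` below, and prints the last lines of each host's kubelet journal.

```sh
# Hypothetical helper (not part of openshift-install): prints the tail of the
# kubelet journal for every control-plane host directory in a bundle.
tail_control_plane_kubelet() {
  bundle="${1:-.}"
  lines="${2:-20}"   # number of trailing lines to show per host

  # The trailing slash makes the glob match directories only.
  for host_dir in "$bundle"/control-plane/*/; do
    [ -d "$host_dir" ] || continue   # glob matched nothing
    host=$(basename "$host_dir")
    echo "== $host"
    log="${host_dir}journals/kubelet.log"
    if [ -s "$log" ]; then
      tail -n "$lines" "$log"
    else
      echo "(no kubelet.log)"   # file is missing or empty
    fi
  done
}
```

For example, `tail_control_plane_kubelet . 50` shows the last 50 kubelet lines per host. A host that reports `(no kubelet.log)` may be a case of the missing control-plane logs described under Common Failures.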

```console
control-plane
├── 10.0.128.114
│   ├── containers
│   ├── failed-units.txt
│   ├── journals
│   └── unit-status
├── 10.0.142.138
│   ├── containers
│   ├── failed-units.txt
│   ├── journals
│   └── unit-status
└── 10.0.148.48
    ├── containers
    ├── failed-units.txt
    ├── journals
    └── unit-status

12 directories, 3 files
```

#### directory: control-plane/name/containers

The containers directory contains the descriptions and logs of all the containers that the kubelet created through CRI-O on the control-plane host. The files are the same as in the [containers directory](#directory-bootstrap-containers) on the bootstrap host.

#### directory: control-plane/name/journals

The journals directory contains the logs of **important** units on the control-plane hosts. These units are:

* `crio.log`
* `kubelet.log`
* `machine-config-daemon-host.log` and `pivot.log`, these files contain the logs of RHCOS pivot-related actions on the control-plane host.

## Common Failures

Here are some common failures that users can troubleshoot using the bootstrap failure log bundle.

### Unable to pull the bootstrap failure logs

1. `Attempted to gather debug logs after installation failure: failed to create SSH client: failed to initialize the SSH agent: no keys found for SSH agent`
   The installer tried to create a new SSH agent, but no keys were found in the user's home directory, usually `~/.ssh` on Linux. The user can use the `--key` flag to provide the private key for SSH to gather the bootstrap failure logs.

2. `failed to create SSH client: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain`
   The keys provided to the installer from the `SSH_AGENT`, or the keys loaded from the user's home directory, are not authorized to SSH to the bootstrap host.
   The user can use the `--key` flag to provide the private key for SSH to gather the bootstrap failure logs.

### Unable to pull Release Image

When the pull secret provided to the installer does not have the correct permissions to pull the Release Image, `bootstrap/journals/release-image.log` should contain the debugging logs.

For example:

```txt
-- Logs begin at Fri 2020-04-24 17:08:15 UTC, end at Fri 2020-04-24 17:33:16 UTC. --
Apr 24 17:08:46 ci-op-2cbvx-bootstrap.c.openshift-gce-devel-ci.internal systemd[1]: Starting Download the OpenShift Release Image...
Apr 24 17:08:46 ci-op-2cbvx-bootstrap.c.openshift-gce-devel-ci.internal release-image-download.sh[1688]: Pulling registry.svc.ci.openshift.org/ci-op-8dv01g3m/release@sha256:50b07a8b4529d8fd2ac6c23ecc311034a3b86cada41c948baaced8c6a46077bc...
Apr 24 17:08:49 ci-op-2cbvx-bootstrap.c.openshift-gce-devel-ci.internal podman[1698]: 2020-04-24 17:08:49.307961668 +0000 UTC m=+1.119158273 system refresh
Apr 24 17:08:49 ci-op-2cbvx-bootstrap.c.openshift-gce-devel-ci.internal release-image-download.sh[1688]: Error: error pulling image "registry.svc.ci.openshift.org/ci-op-8dv01g3m/release@sha256:50b07a8b4529d8fd2ac6c23ecc311034a3b86cada41c948baaced8c6a46077bc": unable to pull registry.svc.ci.openshift.org/ci-op-8dv01g3m/release@sha256:50b07a8b4529d8fd2ac6c23ecc311034a3b86cada41c948baaced8c6a46077bc: unable to pull image: Error initializing source docker://registry.svc.ci.openshift.org/ci-op-8dv01g3m/release@sha256:50b07a8b4529d8fd2ac6c23ecc311034a3b86cada41c948baaced8c6a46077bc: Error reading manifest sha256:50b07a8b4529d8fd2ac6c23ecc311034a3b86cada41c948baaced8c6a46077bc in registry.svc.ci.openshift.org/ci-op-8dv01g3m/release: unauthorized: authentication required
```

### Bootkube logs are empty

For cases where the bootkube logs in `bootstrap/journals/bootkube.log` are empty, like:

```txt
-- Logs begin at Fri 2020-04-24 17:08:15 UTC, end at Fri 2020-04-24 17:33:16 UTC. --
-- No entries --
```

There is a high likelihood that the Release Image cannot be downloaded, and more details can be found in [release-image.log](#unable-to-pull-release-image).

### Control-plane logs missing from log bundle

When the control-plane logs are missing from the log bundle, for example:

```console
$ tree control-plane -L 2
control-plane
├── 10.0.0.4
├── 10.0.0.5
└── 10.0.0.6

3 directories, 0 files
```

Troubleshooting this requires the logs of the installer run that gathered the log bundle, which are available in `.openshift_install.log`.
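
The checks from the Common Failures section can be combined into a quick triage pass over an extracted bundle. This is a minimal sketch under stated assumptions, not an installer feature. `triage_bundle` is a hypothetical name, the matched strings are taken from the example logs above, and searching `.openshift_install.log` for an empty host's IP address is only a heuristic for locating the relevant gather errors.

```sh
# Hypothetical helper (not part of openshift-install): runs the checks from
# "Common Failures" against an extracted log bundle.
triage_bundle() {
  bundle="${1:-.}"
  install_log="${2:-}"   # optional path to .openshift_install.log
  journals="$bundle/bootstrap/journals"

  # An empty bootkube journal likely means the Release Image was never pulled.
  if grep -q -e '-- No entries --' "$journals/bootkube.log" 2>/dev/null; then
    echo "bootkube.log is empty; inspecting release-image.log"
    grep -e 'Error' "$journals/release-image.log" 2>/dev/null || true
  fi

  # An authentication failure while pulling points at the pull secret.
  if grep -q -e 'unauthorized: authentication required' "$journals/release-image.log" 2>/dev/null; then
    echo "release image pull was unauthorized; check the pull secret"
  fi

  # Control-plane host directories that contain nothing at all.
  for host_dir in "$bundle"/control-plane/*/; do
    [ -d "$host_dir" ] || continue
    if [ -z "$(ls -A "$host_dir")" ]; then
      host=$(basename "$host_dir")
      echo "no logs gathered for control-plane host $host"
      # Heuristic: show installer log lines that mention this host's name/IP.
      if [ -n "$install_log" ] && [ -f "$install_log" ]; then
        grep -F -e "$host" "$install_log" || true
      fi
    fi
  done
  return 0
}
```

For example, `triage_bundle path/to/bundle path/to/.openshift_install.log`. The function only surfaces the symptoms described above; the log files themselves remain the authoritative source.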