# OpenShift 4 installer on OpenStack troubleshooting

Unfortunately, there will always be some cases where OpenShift fails to install properly. In these events, it is helpful to understand the likely failure modes as well as how to troubleshoot the failure.

This document discusses some troubleshooting options for OpenStack-based
deployments. For general tips on troubleshooting the installer, see the [Installer Troubleshooting](../troubleshooting.md) guide.

## View instance logs

With the OpenStack CLI tools installed, run:

`openstack console log show <instance>`

## Machine is in ERROR state

This can happen when the machine's instance was accidentally destroyed and the cluster API provider cannot recreate it.

You can check the status of the machines with:

```sh
oc get machines -n openshift-machine-api
```

If the broken machine is a master, follow the instructions in the [disaster recovery documentation](https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-1-infra-recovery.html).

For workers, delete the machine manually with:

```sh
oc delete machine -n openshift-machine-api <machine_name>
```

The operation can take up to 5 minutes, during which the machine is gracefully removed and all of its resources are returned to the pool.

A new worker machine for the cluster will soon be created automatically by the [machine-api-operator](https://github.com/openshift/machine-api-operator).

> **Note**
> In future versions of OpenShift, all broken machines will be automatically deleted and recovered by the machine-api-operator.

## SSH access to the instances

Get the IP address of the node on the private network:

```sh
openstack server list | grep master
| 0dcd756b-ad80-42f1-987a-1451b1ae95ba | cluster-wbzrr-master-1 | ACTIVE | cluster-wbzrr-openshift=172.24.0.21 | rhcos | m1.s2.xlarge |
| 3b455e43-729b-4e64-b3bd-1d4da9996f27 | cluster-wbzrr-master-2 | ACTIVE | cluster-wbzrr-openshift=172.24.0.18 | rhcos | m1.s2.xlarge |
| 775898c3-ecc2-41a4-b98b-a4cd5ae56fd0 | cluster-wbzrr-master-0 | ACTIVE | cluster-wbzrr-openshift=172.24.0.12 | rhcos | m1.s2.xlarge |
```

Then connect to it, using the master currently holding the API VIP (and hence the API FIP) as a jumpbox:

```sh
ssh -J core@${FIP} core@<host>
```
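If you do not have the API FIP at hand, one way to recover it is to resolve the `api.<cluster_name>.<base_domain>` record created for the cluster, or to look through the floating IPs known to OpenStack. This is only a sketch; the host name below is a placeholder for your own cluster's domain:

```sh
# Resolve the API endpoint configured at install time (replace with your cluster's domain)
dig +short api.cluster.example.com

# Alternatively, list the floating IPs and match one against the API VIP
openstack floating ip list
```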
## Cluster destruction if its metadata has been lost

When deploying a cluster, the installer generates metadata in the asset directory that is then used to destroy the cluster. If the metadata was accidentally deleted, destroying the cluster fails with an error:

```txt
FATAL Failed while preparing to destroy cluster: open clustername/metadata.json: no such file or directory
```

To avoid this error and successfully destroy the cluster, you need to restore the `metadata.json` file in a temporary asset directory. To do this, you only need to know the infrastructure ID (infra ID) of the cluster you want to destroy.

First, create a temporary directory where the `metadata.json` file will be located. The name and location can be anything, but to avoid possible conflicts, we recommend using the `mktemp` command.

```sh
export TMP_DIR=$(mktemp -d -t shiftstack-XXXXXXXXXX)
```

The next step is to restore the `metadata.json` file.

```sh
export INFRA_ID=clustername-eiu38 # infra ID of the cluster you want to destroy
echo "{\"infraID\":\"$INFRA_ID\",\"openstack\":{\"identifier\":{\"openshiftClusterID\":\"$INFRA_ID\"}}}" > $TMP_DIR/metadata.json
```

Now you have a working asset directory and can destroy the cluster by executing the following command:

```sh
openshift-install destroy cluster --dir $TMP_DIR
```
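If you want to double-check that nothing was left behind after the destroy run, you can grep the remaining OpenStack resources for the infra ID. This is only a quick sketch: it assumes `$INFRA_ID` is still exported and covers just the most common resource types; empty output means those resources are gone.

```sh
# Resources created by the installer are named after the infra ID
openstack server list | grep "$INFRA_ID"
openstack network list | grep "$INFRA_ID"
openstack port list | grep "$INFRA_ID"
```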