---
title: Disaster recovery
---

etcd is designed to withstand machine failures. An etcd cluster automatically recovers from temporary failures (e.g., machine reboots) and tolerates up to *(N-1)/2* permanent failures, rounded down, for a cluster of N members. When a member permanently fails, whether due to hardware failure or disk corruption, it loses access to the cluster. If the cluster permanently loses more than *(N-1)/2* members, it fails disastrously and irrevocably loses quorum. Once quorum is lost, the cluster cannot reach consensus and therefore cannot continue accepting updates.

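
As a quick check of the arithmetic above, the following sketch computes the majority quorum and the number of tolerated permanent failures for a few cluster sizes. The helper names `quorum_size` and `failure_tolerance` are illustrative only and are not part of etcd.

```sh
# Illustrative helpers (not part of etcd): majority quorum arithmetic.
# Shell integer division rounds down, which matches floor((N-1)/2).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

failure_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the figures for some common cluster sizes.
for n in 1 3 4 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

Note that a four-member cluster tolerates one failure, the same as a three-member cluster, because its quorum rises to three.
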
To recover from disastrous failure, etcd v3 provides snapshot and restore facilities to recreate the cluster without v3 key data loss. To recover v2 keys, refer to the [v2 admin guide][v2_recover].

[v2_recover]: ../v2/admin_guide.md#disaster-recovery

## Snapshotting the keyspace

Recovering a cluster first requires a snapshot of the keyspace from an etcd member. A snapshot may be taken either from a live member with the `etcdctl snapshot save` command or by copying the `member/snap/db` file from an etcd data directory. For example, the following command snapshots the keyspace served by `$ENDPOINT` to the file `snapshot.db`:

```sh
$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db
```

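
After saving, it can be useful to inspect the snapshot file before relying on it. The sketch below wraps the save in a small function and then prints the snapshot's hash, revision, key count, and size with `etcdctl snapshot status`. The function name `save_and_inspect` is hypothetical, and `ENDPOINT` is assumed to be set as in the example above.

```sh
# Sketch: save a snapshot, then print its hash, revision, key count, and size.
# save_and_inspect is a hypothetical helper; ENDPOINT must already be set.
save_and_inspect() {
  snapshot_file=$1
  ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINT" snapshot save "$snapshot_file" &&
    ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$snapshot_file"
}

# Usage: save_and_inspect snapshot.db
```

The status step runs only if the save succeeds, so a failed save is not masked by output from a stale file.
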
## Restoring a cluster

To restore a cluster, all that is needed is a single snapshot "db" file. A cluster restore with `etcdctl snapshot restore` creates new etcd data directories; all members should restore using the same snapshot. Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID), so the member loses its former identity. This metadata overwrite prevents the new member from inadvertently joining an existing cluster. Therefore, to start a cluster from a snapshot, the restore must start a new logical cluster.

Snapshot integrity may optionally be verified at restore time. If the snapshot was taken with `etcdctl snapshot save`, it has an integrity hash that is checked by `etcdctl snapshot restore`. If the snapshot was copied from the data directory, there is no integrity hash and it can only be restored with `--skip-hash-check`.

A restore initializes a new member of a new cluster, with a fresh cluster configuration using `etcd`'s cluster configuration flags, but preserves the contents of the etcd keyspace. Continuing with the `snapshot.db` file saved above, the following creates new etcd data directories (`m1.etcd`, `m2.etcd`, `m3.etcd`) for a three-member cluster:

```sh
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name m1 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host1:2380
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name m2 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host2:2380
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name m3 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host3:2380
```

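
The three restore commands differ only in the member name and its peer URL, while `--initial-cluster` and `--initial-cluster-token` are identical for every member. As a rough sketch, the commands can be generated from a single member list so that the shared values cannot drift apart. The `MEMBERS` variable and the `restore_commands` function are hypothetical names; the function only prints the commands and does not run them.

```sh
# Sketch: print one `etcdctl snapshot restore` command per member (dry run).
# MEMBERS holds "name=peer-url" pairs matching the example above.
MEMBERS="m1=http://host1:2380 m2=http://host2:2380 m3=http://host3:2380"

# Every member must be given the same --initial-cluster value.
INITIAL_CLUSTER=$(echo "$MEMBERS" | tr ' ' ',')

restore_commands() {
  for member in $MEMBERS; do
    name=${member%%=*}      # text before the first '='
    peer_url=${member#*=}   # text after the first '='
    echo "ETCDCTL_API=3 etcdctl snapshot restore snapshot.db" \
      "--name $name" \
      "--initial-cluster $INITIAL_CLUSTER" \
      "--initial-cluster-token etcd-cluster-1" \
      "--initial-advertise-peer-urls $peer_url"
  done
}

restore_commands
```

Each printed line can be reviewed and then run on the host where that member's data directory should be created.
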
Next, start `etcd` with the new data directories. Because `--data-dir` is not given, each member uses its default data directory, `<name>.etcd`, which matches the directory created by the restore:

```sh
$ etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 &
$ etcd \
  --name m2 \
  --listen-client-urls http://host2:2379 \
  --advertise-client-urls http://host2:2379 \
  --listen-peer-urls http://host2:2380 &
$ etcd \
  --name m3 \
  --listen-client-urls http://host3:2379 \
  --advertise-client-urls http://host3:2379 \
  --listen-peer-urls http://host3:2380 &
```

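
Once the members are running, one way to confirm that the restore worked is to query each client URL and list the new membership. This is a minimal sketch: the `check_restored_cluster` function name is hypothetical, and `ENDPOINTS` assumes the client URLs used in the example above.

```sh
# Sketch: check that each restored member answers, then list the membership.
# ENDPOINTS uses the client URLs from the example above.
ENDPOINTS=http://host1:2379,http://host2:2379,http://host3:2379

# check_restored_cluster is a hypothetical helper name.
check_restored_cluster() {
  ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINTS" endpoint health &&
    ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINTS" --write-out=table member list
}

# Usage: check_restored_cluster
```

The member list should show new member IDs, since a restore gives every member a fresh identity.
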
Now the restored etcd cluster should be available and serving the keyspace given by the snapshot.

## Restoring a cluster from membership mis-reconfiguration with wrong URLs

Previously, etcd panicked on [membership mis-reconfiguration with wrong URLs](https://github.com/etcd-io/etcd/issues/9173); v3.2.15 and later return an [error early on the client side](https://github.com/etcd-io/etcd/pull/9174) before the etcd server panics.

The recommended way to recover is to restore from a [snapshot](#snapshotting-the-keyspace). `--force-new-cluster` can be used to overwrite cluster membership while keeping existing application data, but it is strongly discouraged because it will panic if other members from the previous cluster are still alive. Make sure to save snapshots periodically.
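
A periodic snapshot could be as simple as a small script invoked by a scheduler such as cron. The sketch below writes each snapshot to a timestamped file. The function names, the backup directory layout, and the file naming scheme are all assumptions, and `ENDPOINT` must be set to a live member's client URL.

```sh
# Sketch: take a timestamped snapshot; intended to be run on a schedule.
# snapshot_path and take_snapshot are hypothetical helper names.
snapshot_path() {
  # $1 = backup directory, $2 = timestamp such as 20240101T000000Z
  echo "$1/snapshot-$2.db"
}

take_snapshot() {
  backup_dir=$1
  target=$(snapshot_path "$backup_dir" "$(date -u +%Y%m%dT%H%M%SZ)")
  mkdir -p "$backup_dir" &&
    ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINT" snapshot save "$target"
}

# Usage: ENDPOINT=http://host1:2379 take_snapshot /path/to/backups
```

Using UTC timestamps in the file name keeps the snapshots sortable by name regardless of the host's time zone.
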