# Discovery Protocol

Starting a new etcd cluster can be painful because each machine needs to know of at least one live machine in the cluster. If you are bringing up a new cluster all at once, say with AWS CloudFormation, you also need to coordinate which machine will be the initial cluster leader. The discovery protocol uses an existing, running etcd cluster to bootstrap a second etcd cluster.

To use this feature, add the command-line flag `-discovery` to your etcd arguments. The examples below use `http://example.com/v2/keys/_etcd/registry` as the URL prefix.

## The Protocol

By convention, the etcd discovery protocol uses the key prefix `_etcd/registry`. The full URL to the keyspace is then `http://example.com/v2/keys/_etcd/registry`.

### Creating a New Cluster

Generate a unique token to identify the new cluster. It is used as a key prefix in the following steps. An easy way to generate one is with `uuidgen`:

```
UUID=$(uuidgen)
```

### Bringing up Machines

Now that you have your cluster ID, you can start bringing up machines. Every machine follows this protocol internally in etcd when it is given the `-discovery` flag.
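
As a rough illustration of what each machine is given, the sketch below builds a per-cluster discovery URL and prints an etcd command line. It assumes the token is appended to the URL prefix in the flag value, which is inferred from the request URLs used in the following steps rather than stated here. `discovery_url` and `etcd_command` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; the prefix is the example one used in this document.
DISCOVERY_PREFIX="http://example.com/v2/keys/_etcd/registry"

# Build the per-cluster discovery URL from a token (assumed layout: prefix/token).
discovery_url() {
    printf '%s/%s\n' "$DISCOVERY_PREFIX" "$1"
}

# Print (not run) the etcd command line for a machine name and a token.
etcd_command() {
    printf 'etcd -name %s -discovery %s\n' "$1" "$(discovery_url "$2")"
}
```

For example, `etcd_command machine1 "$UUID"` prints the command that would start `machine1` against the cluster identified by `$UUID`.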

### Registering your Machine

The first thing etcd must do is register your machine. It does this by writing the machine's peer address to a key named after the machine (from the `-name` argument), with a long TTL of 604800 seconds (one week).

```
curl -X PUT "http://example.com/v2/keys/_etcd/registry/${UUID}/${etcd_machine_name}?ttl=604800" -d value="${peer_addr}"
```
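
The registration request can be wrapped in a small shell function. This is a sketch, not etcd's internal implementation: `registration_url` and `register_machine` are hypothetical names, the prefix is this document's example host, and the TTL constant only spells out one week in seconds.

```shell
#!/bin/sh
DISCOVERY_PREFIX="http://example.com/v2/keys/_etcd/registry"
# One week in seconds: 7 days * 24 h * 60 min * 60 s = 604800.
REGISTRATION_TTL=$((7 * 24 * 60 * 60))

# Build the registration URL for a cluster token ($1) and machine name ($2).
registration_url() {
    printf '%s/%s/%s?ttl=%s\n' "$DISCOVERY_PREFIX" "$1" "$2" "$REGISTRATION_TTL"
}

# PUT the peer address ($3) under the machine's key. Returns non-zero on
# transport errors and, because of --fail, on HTTP error statuses.
register_machine() {
    curl --silent --fail -X PUT "$(registration_url "$1" "$2")" \
        --data-urlencode "value=$3" >/dev/null
}
```

A call such as `register_machine "$UUID" peer1 127.0.0.1:7001` would then perform the same request as the `curl` command above.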

### Discovering Peers

Now that this etcd machine is registered, it must discover its peers.

The tricky part of starting a new cluster is that one machine must assume the initial role of leader and will have no peers. To find out whether another machine has already started the cluster, etcd tries to create the `_state` key with the value "started". The `prevExist=false` condition makes the request succeed only if the key does not already exist:

```
curl -X PUT "http://example.com/v2/keys/_etcd/registry/${UUID}/_state?prevExist=false" -d value=started
```

If this returns a `200 OK` response, this machine is the initial leader and should start with no peers configured. If it instead returns `412 Precondition Failed`, another machine has already started the cluster, and this machine needs to find all of the registered peers:

```
curl -X GET "http://example.com/v2/keys/_etcd/registry/${UUID}?recursive=true"
```

```
{
    "action": "get",
    "node": {
        "createdIndex": 11,
        "dir": true,
        "key": "/_etcd/registry/9D4258A5-A1D3-4074-8837-31C1E091131D",
        "modifiedIndex": 11,
        "nodes": [
            {
                "createdIndex": 16,
                "expiration": "2014-02-03T13:19:57.631253589-08:00",
                "key": "/_etcd/registry/9D4258A5-A1D3-4074-8837-31C1E091131D/peer1",
                "modifiedIndex": 16,
                "ttl": 604765,
                "value": "127.0.0.1:7001"
            },
            {
                "createdIndex": 17,
                "expiration": "2014-02-03T13:19:57.631253589-08:00",
                "key": "/_etcd/registry/9D4258A5-A1D3-4074-8837-31C1E091131D/peer2",
                "modifiedIndex": 17,
                "ttl": 604765,
                "value": "127.0.0.1:7002"
            }
        ]
    }
}
```

Using this information, the machine can connect to the rest of the peers in the cluster.
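
The decision logic in this section can be sketched as follows. The sketch assumes `curl` and `jq` are installed, treats any 2xx status as a successful create (the text above names `200 OK`), and skips the `_state` key when listing peers on the assumption that it appears in the recursive listing next to the registrations. All function names are hypothetical.

```shell
#!/bin/sh
DISCOVERY_PREFIX="http://example.com/v2/keys/_etcd/registry"

# Map the HTTP status of the _state create request to a role.
role_for_status() {
    case "$1" in
        2??) echo leader ;;    # key created: this machine starts the cluster
        412) echo follower ;;  # key already existed: another machine started it
        *)   echo error ;;     # anything else is unexpected
    esac
}

# Try to create _state for the token in $1; print only the HTTP status code.
claim_state() {
    curl --silent --output /dev/null --write-out '%{http_code}' -X PUT \
        "$DISCOVERY_PREFIX/$1/_state?prevExist=false" -d value=started
}

# Read a recursive listing on stdin; print one registered peer address per line.
peer_addrs() {
    jq -r '.node.nodes[]?
           | select(.value != null)
           | select(.key | endswith("/_state") | not)
           | .value'
}

# Print "leader", or the peer list for a follower; return non-zero on error.
discover() {
    role=$(role_for_status "$(claim_state "$1")")
    case "$role" in
        leader)   echo leader ;;
        follower) curl --silent "$DISCOVERY_PREFIX/$1?recursive=true" | peer_addrs ;;
        *)        return 1 ;;
    esac
}
```

Note that the listing also contains the calling machine's own registration, so a caller would presumably filter out its own address before connecting.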

### Heartbeating

At this point etcd starts heartbeating to your registration URL. The
protocol uses a heartbeat so that permanently deleted nodes are eventually
removed from the cluster's discovery information.

The heartbeat interval is about once per day and the TTL is one week. This
should give a sufficiently wide window to protect against a temporary outage
of the discovery service, yet still provide adequate cleanup.
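
The interval and TTL above imply roughly how many heartbeats can be missed before a registration expires. The sketch below spells out that arithmetic and shows one possible heartbeat loop. It assumes a heartbeat simply repeats the registration PUT with a fresh TTL, which this document does not state explicitly, and the function names are hypothetical.

```shell
#!/bin/sh
HEARTBEAT_INTERVAL=$((24 * 60 * 60))    # about once per day, in seconds
REGISTRATION_TTL=$((7 * 24 * 60 * 60))  # one week, in seconds

# Number of consecutive heartbeats that can be missed before the TTL runs out.
missed_heartbeats_tolerated() {
    echo $(( REGISTRATION_TTL / HEARTBEAT_INTERVAL - 1 ))
}

# Refresh a registration forever, ignoring transient failures.
# $1: registration URL without the ttl parameter, $2: peer address.
heartbeat_loop() {
    while :; do
        curl --silent --fail -X PUT "$1?ttl=$REGISTRATION_TTL" \
            --data-urlencode "value=$2" >/dev/null || true
        sleep "$HEARTBEAT_INTERVAL"
    done
}
```

With these values, `missed_heartbeats_tolerated` prints `6`: a discovery service outage of several days does not cost a live machine its registration, while a machine that is gone for good disappears about a week after its last heartbeat.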