---
layout: "docs"
page_title: "Creating a Nomad Cluster"
sidebar_current: "docs-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Creating a cluster

Nomad models infrastructure as regions and datacenters. Regions may contain
multiple datacenters. Servers are assigned to regions; they manage all state
for the region and make scheduling decisions within it. Clients are
registered to a single datacenter and region.

[![Regional Architecture](/assets/images/nomad-architecture-region.png)](/assets/images/nomad-architecture-region.png)

This page will explain how to bootstrap a production grade Nomad region, both
with and without Consul, and how to federate multiple regions together.

[![Global Architecture](/assets/images/nomad-architecture-global.png)](/assets/images/nomad-architecture-global.png)

Bootstrapping Nomad is significantly easier when a Consul cluster is already
in place. Nomad's topology is slightly richer than Consul's because it
supports not only datacenters but also regions, so let's start with how
Consul should be deployed in relation to Nomad.

For more details on the architecture of Nomad and how it models infrastructure,
see the [Architecture page](/docs/internals/architecture.html).

## Deploying Consul Clusters

A Nomad cluster gains the ability to bootstrap itself, as well as to provide
service and health check registration to applications, when Consul is deployed
alongside Nomad.

Consul models infrastructure as datacenters, and multiple Consul datacenters
can be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guide for both
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single datacenter and
[connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).
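
As a minimal sketch, a Consul server in one such datacenter might carry a
configuration like the following, using Consul's standard `retry_join_wan`
option. The datacenter name and WAN join address are hypothetical placeholders
for your own environment:

```
{
  "datacenter": "us-east-1",
  "server": true,
  "bootstrap_expect": 3,
  "retry_join_wan": ["consul.us-west-1.example.com"]
}
```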

## Bootstrapping a Nomad cluster

Nomad supports merging multiple configuration files together on startup. This is
done to enable generating a base configuration that can be shared by Nomad
servers and clients. A suggested base configuration is:

```
# Name the region. If omitted, the default "global" region will be used.
region = "europe"

# Persist data to a location that will survive a machine reboot.
data_dir = "/opt/nomad/"

# Bind to all addresses so that the Nomad agent is available both on loopback
# and externally.
bind_addr = "0.0.0.0"

# Advertise an accessible IP address so the server is reachable by other servers
# and clients. The IPs can be materialized by Terraform or be replaced by an
# init script.
advertise {
    http = "${self.ipv4_address}:4646"
    rpc = "${self.ipv4_address}:4647"
    serf = "${self.ipv4_address}:4648"
}

# Ship metrics to monitor the health of the cluster and to see task resource
# usage.
telemetry {
    statsite_address = "${var.statsite}"
    disable_hostname = true
}

# Enable debug endpoints.
enable_debug = true
```

### With Consul

If a local Consul cluster is bootstrapped before Nomad, then on startup Nomad
servers will register with Consul and discover the other servers. With their
set of peers, they will automatically form quorum, respecting the
`bootstrap_expect` field. Thus, to form a three-server region, the below
configuration can be used in conjunction with the base config:

```
server {
    enabled = true
    bootstrap_expect = 3
}
```

And an equally simple configuration can be used for clients:

```
# Replace with the relevant datacenter.
datacenter = "dc1"

client {
    enabled = true
}
```

As you can see, the above configurations make no mention of the other servers
to join or of any Consul configuration. That is because, by default, the
following is merged with the configuration file:

```
consul {
    # The address of the Consul agent.
    address = "127.0.0.1:8500"

    # The service names used to register the server and client with Consul.
    server_service_name = "nomad"
    client_service_name = "nomad-client"

    # Enables automatically registering the services.
    auto_advertise = true

    # Enables the server and client to bootstrap using Consul.
    server_auto_join = true
    client_auto_join = true
}
```

Since the `consul` block is merged by default, bootstrapping a cluster becomes
as easy as running the following on each of the three servers:

```
$ nomad agent -config base.hcl -config server.hcl
```

And on every client in the cluster, the following should be run:

```
$ nomad agent -config base.hcl -config client.hcl
```

With the above configurations and commands, the Nomad agents will automatically
register themselves with Consul and discover the other Nomad servers. If the
agent is a server, it will join the quorum, and if it is a client, it will
register itself and join the cluster.
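
Once the agents are running, the cluster can be sanity-checked from any node.
The exact output varies by version, but the following commands list the server
peers and the registered clients:

```
# List the servers that have joined the region and show the current leader.
$ nomad server-members

# List the client nodes that have registered with the servers.
$ nomad node-status
```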

Please refer to the [`consul` options in the agent configuration
documentation](/docs/agent/config.html#consul_options) for the complete set of
configuration options.

### Without Consul

When bootstrapping without Consul, Nomad servers and clients must be started
knowing the address of at least one Nomad server.

To join the Nomad servers, we can either encode the address in the server
configs as follows:

```
server {
    enabled = true
    bootstrap_expect = 3
    retry_join = ["<known-address>"]
}
```

Alternatively, the address can be supplied after the servers have all been
started by running the [`server-join` command](/docs/commands/server-join.html)
on each server individually to cluster them. Each server only needs to join
one other server and can then rely on the gossip protocol to discover the rest.

```
$ nomad server-join <known-address>
```

On the client side, the addresses of the servers are expected to be specified
via the client configuration:

```
client {
    enabled = true
    servers = ["10.10.11.2:4647", "10.10.11.3:4647", "10.10.11.4:4647"]
}
```

If servers are added to or removed from the cluster, the information will be
pushed to the client. This means that only one server needs to be specified,
because after initial contact the full set of servers in the client's region
will be pushed to the client.

The port corresponds to the RPC port. If no port is specified with the IP
address, the default RPC port of `4647` is assumed.
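
For example, using the same hypothetical addresses as above, the following
single-entry configuration is sufficient:

```
client {
    enabled = true

    # "10.10.11.2" is equivalent to "10.10.11.2:4647"; the default RPC
    # port is assumed. After initial contact, the client learns the
    # remaining servers in the region automatically.
    servers = ["10.10.11.2"]
}
```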

The same commands can be used to start the servers and clients as shown in the
bootstrapping with Consul section.

### Federating a cluster

Nomad clusters across multiple regions can be federated, allowing users to
submit jobs or interact with the HTTP API targeting any region, from any
server.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, simply issue a join command to a server in the remote
region:

```
$ nomad server-join 10.10.11.8:4648
```

Servers across regions discover other servers in the cluster via the gossip
protocol, so it is enough to join just one known server.

If the Consul clusters in the different Nomad regions are federated, and Consul
`server_auto_join` is enabled, then federation occurs automatically.
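
If the Consul clusters themselves are not yet federated, joining them over the
WAN is a one-time operation run against a Consul server in each region, as
described in the Consul WAN guide linked above:

```
$ consul join -wan <remote-consul-server>
```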

## Network Topology

### Nomad Servers

Nomad servers are expected to have sub-10-millisecond network latencies between
each other to ensure liveness and high throughput scheduling. Nomad servers can
be spread across multiple datacenters to achieve high availability, provided
they have low latency connections between them.

For example, on AWS every region comprises multiple availability zones with
very low latency links between them, so every zone can be modeled as a Nomad
datacenter, and every zone can run a single Nomad server, which can be
connected to form a quorum and a region.
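
As a sketch, with hypothetical region and zone names, a server in one zone of
such a deployment might set:

```
# One Nomad datacenter per AWS availability zone, all zones grouped
# into a single region. The names below are placeholders.
region     = "us-east"
datacenter = "us-east-1a"
```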

Nomad servers use Raft for state replication, and because Raft is strongly
consistent it needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually three to five. A
cluster of three servers can withstand one failure, and a cluster of five can
withstand two. Adding more servers to the quorum increases the time to
replicate state and hence decreases throughput, so we do not recommend more
than seven servers in a region.
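
The underlying arithmetic is the standard majority-quorum calculation:

```
quorum(n)    = floor(n / 2) + 1
tolerance(n) = n - quorum(n)

n = 3: quorum = 2, tolerates 1 failure
n = 5: quorum = 3, tolerates 2 failures
n = 7: quorum = 4, tolerates 3 failures
```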

### Nomad Clients

Nomad clients do not have the same latency requirements as servers, since they
do not participate in Raft. Thus clients can have 100+ millisecond latency to
their servers. This allows a single set of Nomad servers to service clients
spread geographically across a continent, or even the world, in the case of a
single "global" region with many datacenters.

## Production Considerations

### Nomad Servers

Depending on the number of jobs the cluster will be managing and the rate at
which jobs are submitted, the Nomad servers may need to be run on large machine
instances. We suggest having 8+ cores, 32 GB+ of memory, 80 GB+ of disk, and
significant network bandwidth. The core count and network recommendations
ensure high throughput, as Nomad relies heavily on network communication and
the servers manage all the nodes in the region while performing scheduling. The
memory and disk requirements are due to the fact that Nomad stores all state in
memory and will store two snapshots of this data onto disk. Thus disk should be
at least twice the memory available to the server when deploying a high load
cluster.

### Nomad Clients

Nomad clients support reserving resources on the node that should not be used
by Nomad. This should be used to target a specific resource utilization per
node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.
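
A minimal sketch of such a reservation, with illustrative values (CPU in MHz,
memory and disk in MB), is:

```
client {
    enabled = true

    # Reserve capacity for Consul, the OS, and other non-Nomad processes.
    reserved {
        cpu            = 500
        memory         = 512
        disk           = 1024
        reserved_ports = "22,8500-8600"
    }
}
```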

Please see the [`reserved` config](/docs/agent/config.html#reserved) for more detail.