---
layout: "docs"
page_title: "Creating a Nomad Cluster"
sidebar_current: "docs-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Creating a cluster

Nomad models infrastructure as regions and datacenters. Regions may contain
multiple datacenters. Servers are assigned to regions, manage all state for
the region, and make scheduling decisions within that region. Clients are
registered to a single datacenter and region.

![Nomad Region Architecture](/assets/images/nomad-architecture-region.png)

This page will explain how to bootstrap a production grade Nomad region, both
with and without Consul, and how to federate multiple regions together.

![Nomad Global Architecture](/assets/images/nomad-architecture-global.png)

Bootstrapping Nomad is significantly easier when there is already a Consul
cluster in place. Since Nomad's topology is slightly richer than Consul's,
supporting not only datacenters but also regions, let's start with how Consul
should be deployed in relation to Nomad.

For more details on the architecture of Nomad and how it models infrastructure
see the [Architecture page](/docs/internals/architecture.html).

## Deploying Consul Clusters

A Nomad cluster gains the ability to bootstrap itself as well as provide service
and health check registration to applications when Consul is deployed alongside
Nomad.

Consul models infrastructure as datacenters, and multiple Consul datacenters can
be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guides for both
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single datacenter and
[connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).

## Bootstrapping a Nomad cluster

Nomad supports merging multiple configuration files together on startup. This is
done to enable generating a base configuration that can be shared by Nomad
servers and clients. A suggested base configuration is:

```
# Name the region. If omitted, the default "global" region will be used.
region = "europe"

# Persist data to a location that will survive a machine reboot.
data_dir = "/opt/nomad/"

# Bind to all addresses so that the Nomad agent is available both on loopback
# and externally.
bind_addr = "0.0.0.0"

# Advertise an accessible IP address so the server is reachable by other servers
# and clients. The IPs can be materialized by Terraform or be replaced by an
# init script.
advertise {
  http = "${self.ipv4_address}:4646"
  rpc  = "${self.ipv4_address}:4647"
  serf = "${self.ipv4_address}:4648"
}

# Ship metrics to monitor the health of the cluster and to see task resource
# usage.
telemetry {
  statsite_address = "${var.statsite}"
  disable_hostname = true
}

# Enable debug endpoints.
enable_debug = true
```
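As one illustrative way to materialize the advertise addresses, a small init
script could substitute the machine's private IP into a template of the base
config before the agent starts. The metadata endpoint and file paths below are
assumptions made purely for the sake of the example:

```
#!/bin/sh
# Example only: render base.hcl from a template by filling in this machine's
# private IPv4 address. The EC2 metadata endpoint and the paths used here are
# assumptions; adjust them for your environment.
IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
sed "s/\${self.ipv4_address}/${IP}/g" /etc/nomad.d/base.hcl.tpl > /etc/nomad.d/base.hcl
```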
### With Consul

If a local Consul cluster is bootstrapped before Nomad, on startup the Nomad
servers will register with Consul and discover the other servers. With their set
of peers, they will automatically form quorum, respecting the `bootstrap_expect`
field. Thus, to form a three server region, the below configuration can be used
in conjunction with the base config:

```
server {
  enabled          = true
  bootstrap_expect = 3
}
```

And an equally simple configuration can be used for clients:

```
# Replace with the relevant datacenter.
datacenter = "dc1"

client {
  enabled = true
}
```

As you can see, the above configurations have no mention of the other servers to
join or any Consul configuration. That is because by default, the following is
merged with the configuration file:

```
consul {
  # The address of the Consul agent.
  address = "127.0.0.1:8500"

  # The service names used to register the server and client with Consul.
  server_service_name = "nomad"
  client_service_name = "nomad-client"

  # Enables automatically registering the services.
  auto_advertise = true

  # Enables the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
```

Since the `consul` block is merged by default, bootstrapping a cluster becomes
as easy as running the following on each of the three servers:

```
$ nomad agent -config base.hcl -config server.hcl
```

And on every client in the cluster, the following should be run:

```
$ nomad agent -config base.hcl -config client.hcl
```

With the above configurations and commands the Nomad agents will automatically
register themselves with Consul and discover other Nomad servers. If the agent
is a server, it will join the quorum, and if it is a client, it will register
itself and join the cluster.

Please refer to the [Consul documentation](/docs/agent/config.html#consul_options)
for the complete set of configuration options.

### Without Consul

When bootstrapping without Consul, Nomad servers and clients must be started
knowing the address of at least one Nomad server.

To join the Nomad servers, we can either encode the address in the server
configs as such:

```
server {
  enabled          = true
  bootstrap_expect = 3

  # Addresses of known Nomad servers to retry joining on startup.
  retry_join = ["<known-address>"]
}
```

Alternatively, the address can be supplied after the servers have all been
started by running the [`server-join` command](/docs/commands/server-join.html)
on the servers individually to cluster them. All servers can join just one other
server, and then rely on the gossip protocol to discover the rest.

```
$ nomad server-join <known-address>
```

On the client side, the addresses of the servers are expected to be specified
via the client configuration.

```
client {
  enabled = true
  servers = ["10.10.11.2:4647", "10.10.11.3:4647", "10.10.11.4:4647"]
}
```

If servers are added or removed from the cluster, the information will be pushed
to the client. This means that only one server needs to be specified, because
after initial contact the full set of servers in the client's region will be
pushed to the client.

The port corresponds to the RPC port. If no port is specified with the IP
address, the default RPC port of `4647` is assumed.

The same commands can be used to start the servers and clients as shown in the
bootstrapping with Consul section.
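Once the agents are running, whether bootstrapped with or without Consul, the
state of the region can be checked from any node. As a quick sketch, the
following commands list the servers that have formed quorum and the clients that
have registered:

```
# List the Nomad servers known to the gossip pool.
$ nomad server-members

# List the client nodes that have registered with the servers.
$ nomad node-status
```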
### Federating a cluster

Nomad clusters across multiple regions can be federated, allowing users to
submit jobs or interact with the HTTP API targeting any region from any server.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, simply issue a join command to a server in the remote
region:

```
$ nomad server-join 10.10.11.8:4648
```

Servers across regions discover other servers in the cluster via the gossip
protocol, so it is enough to join one known server.

If the Consul clusters in the different Nomad regions are federated, and Consul
`server_auto_join` is enabled, then federation occurs automatically.
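Once regions are federated, any region can be reached from any server. As an
illustrative sketch, assuming a federated region named `europe`, the jobs
registered in that region can be listed from a server in another region using
the general `-region` flag:

```
# List the jobs registered in the remote "europe" region. The region name is
# an example and should match your configuration.
$ nomad status -region=europe
```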
## Network Topology

### Nomad Servers

Nomad servers are expected to have sub 10 millisecond network latencies between
each other to ensure liveness and high throughput scheduling. Nomad servers can
be spread across multiple datacenters if they have low latency connections
between them, to achieve high availability.

For example, on AWS every region comprises multiple availability zones with very
low latency links between them, so every zone can be modeled as a Nomad
datacenter, and every zone can have a single Nomad server; these can be
connected to form a quorum and a region.

Nomad servers use Raft for state replication, and since Raft is highly
consistent it needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually three to five. A
cluster of three servers can withstand the failure of one server, and a cluster
of five can withstand two failures. Adding more servers to the quorum adds more
time to replicate state and hence decreases throughput, so we don't recommend
having more than seven servers in a region.

### Nomad Clients

Nomad clients do not have the same latency requirements as servers since they
are not participating in Raft. Thus clients can have 100+ millisecond latency to
their servers. This allows a single set of Nomad servers to service clients
spread geographically over a continent, or even the world in the case of a
single "global" region with many datacenters.

## Production Considerations

### Nomad Servers

Depending on the number of jobs the cluster will be managing and the rate at
which jobs are submitted, the Nomad servers may need to be run on large machine
instances. We suggest having 8+ cores, 32 GB+ of memory, 80 GB+ of disk and
significant network bandwidth. The core count and network recommendations are to
ensure high throughput, as Nomad relies heavily on network communication and the
servers are managing all the nodes in the region and performing scheduling. The
memory and disk requirements are due to the fact that Nomad stores all state in
memory and will store two snapshots of this data onto disk. Thus disk should be
at least 2 times the memory available to the server when deploying a high load
cluster.

### Nomad Clients

Nomad clients support reserving resources on the node that should not be used by
Nomad. This should be used to target a specific resource utilization per node
and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.

Please see the [`reservation` config](/docs/agent/config.html#reserved) for more
detail.
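As an illustrative sketch, a client that should keep some CPU, memory, disk, and
ports free for Consul and the operating system might use a configuration like
the following; the values shown are examples only and should be tuned per node:

```
client {
  enabled = true

  # Resources Nomad should not allocate to tasks. These values are examples
  # and should be sized for the node and the services running outside Nomad.
  reserved {
    cpu            = 500
    memory         = 512
    disk           = 1024
    reserved_ports = "22,8500-8600"
  }
}
```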