---
layout: "intro"
page_title: "Running Nomad"
sidebar_current: "getting-started-running"
description: |-
  Learn about the Nomad agent, and the lifecycle of running and stopping.
---

# Running Nomad

Nomad relies on a long-running agent on every machine in the cluster.
The agent can run in either server or client mode. Each region must
have at least one server, though a cluster of 3 or 5 servers is recommended.
A single server deployment is _**highly**_ discouraged as data loss is inevitable
in a failure scenario.

All other agents run in client mode. A client is a very lightweight
process that registers the host machine, performs heartbeating, and runs any tasks
that are assigned to it by the servers. The agent must be run on every node that
is part of the cluster so that the servers can assign work to those machines.

## Starting the Agent

For simplicity, we will run a single Nomad agent in development mode. This mode
is used to quickly start an agent that acts as both a client and a server, to test
job configurations or prototype interactions. It should _**not**_ be used in
production as it does not persist state.

```
vagrant@nomad:~$ sudo nomad agent -dev

==> Starting Nomad agent...
==> Nomad agent configuration:

                 Atlas: <disabled>
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true

==> Nomad agent started! Log data will stream in below:

    [INFO] serf: EventMemberJoin: nomad.global 127.0.0.1
    [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]
    [INFO] client: using alloc directory /tmp/NomadClient599911093
    [INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state
    [INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)
    [WARN] fingerprint.network: Ethtool not found, checking /sys/net speed file
    [WARN] raft: Heartbeat timeout reached, starting election
    [INFO] raft: Node at 127.0.0.1:4647 [Candidate] entering Candidate state
    [DEBUG] raft: Votes needed: 1
    [DEBUG] raft: Vote granted. Tally: 1
    [INFO] raft: Election won. Tally: 1
    [INFO] raft: Node at 127.0.0.1:4647 [Leader] entering Leader state
    [INFO] raft: Disabling EnableSingleNode (bootstrap)
    [DEBUG] raft: Node 127.0.0.1:4647 updated peer set (2): [127.0.0.1:4647]
    [INFO] nomad: cluster leadership acquired
    [DEBUG] client: applied fingerprints [arch cpu host memory storage network]
    [DEBUG] client: available drivers [docker exec java]
    [DEBUG] client: node registration complete
    [DEBUG] client: updated allocations at index 1 (0 allocs)
    [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
    [DEBUG] client: state updated to ready
```

As you can see, the Nomad agent has started and has output some log
data. From the log data, you can see that our agent is running in both
client and server mode, and has claimed leadership of the cluster.
Additionally, the local client has been registered and marked as ready.

-> **Note:** Typically any agent running in client mode must be run with root-level
privileges. Nomad makes use of operating system primitives for resource isolation
which require elevated permissions. The agent will function as non-root, but
certain task drivers will not be available.
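
When scripting around a development agent, it can help to check the log stream
for the two events called out above: the server acquiring leadership and the
client becoming ready. The following is a minimal sketch in Python, assuming log
lines shaped like the sample output (`[LEVEL] subsystem: message`). That shape is
inferred from the sample, not a documented format, and `parse_log_line` and
`agent_is_ready` are hypothetical helpers, not part of Nomad.

```python
import re

# Matches lines shaped like "[INFO] nomad: cluster leadership acquired".
# Assumption: this pattern is inferred from the sample output above.
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s+(?P<subsystem>[\w.]+):\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return (level, subsystem, message), or None for non-log lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("level"), match.group("subsystem"), match.group("message")


def agent_is_ready(lines):
    """Return True once the log shows both leadership and a ready client."""
    leader = False
    ready = False
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue  # banner lines such as "==> Starting Nomad agent..."
        _, subsystem, message = parsed
        if subsystem == "nomad" and message == "cluster leadership acquired":
            leader = True
        elif subsystem == "client" and message == "state updated to ready":
            ready = True
    return leader and ready
```

Feeding the captured output of `nomad agent -dev` to `agent_is_ready` line by
line gives a simple readiness check for local experiments.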

## Cluster Nodes

If you run [`nomad node-status`](/docs/commands/node-status.html) in another terminal, you
can see the registered nodes of the Nomad cluster:

```text
$ vagrant ssh
...

$ nomad node-status
ID        Datacenter  Name   Class   Drain  Status
171a583b  dc1         nomad  <none>  false  ready
```

The output shows our Node ID, which is a randomly generated UUID,
its datacenter, node name, node class, drain mode, and current status.
We can see that our node is in the ready state, and task draining is
currently off.

The agent is also running in server mode, which means it is part of
the [gossip protocol](/docs/internals/gossip.html) used to connect all
the server instances together. We can view the members of the gossip
ring using the [`server-members`](/docs/commands/server-members.html) command:

```text
$ nomad server-members
Name          Address    Port  Status  Protocol  Build     Datacenter  Region
nomad.global  127.0.0.1  4648  alive   2         0.3.0dev  dc1         global
```

The output shows our own agent, the address it is running on, its
health state, some version information, and the datacenter and region.
Additional metadata can be viewed by providing the `-detailed` flag.

## <a name="stopping"></a>Stopping the Agent

You can use `Ctrl-C` (the interrupt signal) to halt the agent.
By default, all signals will cause the agent to forcefully shut down.
The agent [can be configured](/docs/agent/config.html) to gracefully
leave on either the interrupt or terminate signals.
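
The signal behavior described above can be summarized as a small decision
table. The sketch below is an illustrative model in Python, not Nomad's
implementation. It assumes the two boolean agent options are named
`leave_on_interrupt` and `leave_on_terminate` (check the configuration
documentation for your version), and `shutdown_mode` is a hypothetical helper.

```python
import signal


def shutdown_mode(signum, leave_on_interrupt=False, leave_on_terminate=False):
    """Return "graceful" or "forceful" for a received signal.

    Models the behavior described above: every signal forces a shutdown
    unless the matching option opts in to a graceful leave.
    """
    if signum == signal.SIGINT and leave_on_interrupt:
        return "graceful"
    if signum == signal.SIGTERM and leave_on_terminate:
        return "graceful"
    return "forceful"
```

With both options left at their defaults, `shutdown_mode(signal.SIGINT)`
returns `"forceful"`, which corresponds to the `Shutdown without a Leave`
warning in the output that follows.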

After interrupting the agent, you should see it leave the cluster
and shut down:

```
^C==> Caught signal: interrupt
    [DEBUG] http: Shutting down http server
    [INFO] agent: requesting shutdown
    [INFO] client: shutting down
    [INFO] nomad: shutting down server
    [WARN] serf: Shutdown without a Leave
    [INFO] agent: shutdown complete
```

By gracefully leaving, Nomad clients update their status to prevent
further tasks from being scheduled and to start migrating any tasks that are
already assigned. Nomad servers notify their peers they intend to leave.
When a server leaves, replication to that server stops. If a server fails,
replication continues to be attempted until the node recovers. Nomad will
automatically try to reconnect to _failed_ nodes, allowing it to recover from
certain network conditions, while _left_ nodes are no longer contacted.

If an agent is operating as a server, a graceful leave is important to avoid
causing a potential availability outage affecting the
[consensus protocol](/docs/internals/consensus.html). If a server does
forcefully exit and will not be returning into service, the
[`server-force-leave` command](/docs/commands/server-force-leave.html) should
be used to force the server from a _failed_ to a _left_ state.

## Next Steps

The development Nomad agent is up and running. Let's try to [run a job](jobs.html)!