---
layout: intro
page_title: Running Nomad
sidebar_title: Running Nomad
description: 'Learn about the Nomad agent, and the lifecycle of running and stopping.'
---

# Running Nomad

Nomad relies on a long-running agent on every machine in the cluster.
The agent can run in either server or client mode. Each region must
have at least one server, though a cluster of 3 or 5 servers is recommended.
A single-server deployment is _**highly**_ discouraged, as data loss is inevitable
in a failure scenario.

All other agents run in client mode. A Nomad client is a very lightweight
process that registers the host machine, performs heartbeating, and runs the tasks
that are assigned to it by the servers. The agent must be run on every node that
is part of the cluster so that the servers can assign work to those machines.

## Starting the Agent

For simplicity, we will run a single Nomad agent in development mode. This mode
quickly starts an agent that acts as both a client and a server, which is useful
for testing job configurations or prototyping interactions. It should _**not**_ be
used in production, as it does not persist state.

```shell-session
$ sudo nomad agent -dev

==> Starting Nomad agent...
==> Nomad agent configuration:

                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true

==> Nomad agent started! Log data will stream in below:

    [INFO] serf: EventMemberJoin: nomad.global 127.0.0.1
    [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]
    [INFO] client: using alloc directory /tmp/NomadClient599911093
    [INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state
    [INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)
    [WARN] fingerprint.network: Ethtool not found, checking /sys/net speed file
    [WARN] raft: Heartbeat timeout reached, starting election
    [INFO] raft: Node at 127.0.0.1:4647 [Candidate] entering Candidate state
    [DEBUG] raft: Votes needed: 1
    [DEBUG] raft: Vote granted. Tally: 1
    [INFO] raft: Election won. Tally: 1
    [INFO] raft: Node at 127.0.0.1:4647 [Leader] entering Leader state
    [INFO] raft: Disabling EnableSingleNode (bootstrap)
    [DEBUG] raft: Node 127.0.0.1:4647 updated peer set (2): [127.0.0.1:4647]
    [INFO] nomad: cluster leadership acquired
    [DEBUG] client: applied fingerprints [arch cpu host memory storage network]
    [DEBUG] client: available drivers [docker exec java]
    [DEBUG] client: node registration complete
    [DEBUG] client: updated allocations at index 1 (0 allocs)
    [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
    [DEBUG] client: state updated to ready
```

As you can see, the Nomad agent has started and has output some log
data. From the log data, you can see that our agent is running in both
client and server mode, and has claimed leadership of the cluster.
Additionally, the local client has been registered and marked as ready.

-> **Note:** Typically, any agent running in client mode must be run with root-level
privileges. Nomad makes use of operating system primitives for resource isolation
that require elevated permissions. The agent will function as non-root, but
certain task drivers will not be available.
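If you script this tutorial rather than watching the terminal, one approach is to start the dev agent in the background, send its output to a file, and poll that file for the log messages shown above. The sketch below is a hypothetical helper, not part of Nomad: the function name `wait_for_log` and the file name `nomad-dev.log` are assumptions, and the patterns are copied from the sample output, whose wording may differ between Nomad versions, so adjust them to what your agent actually prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Nomad command).
# wait_for_log FILE PATTERN [TIMEOUT]
# Polls FILE once per second until a line matches PATTERN (a grep regex).
# Returns 0 on a match, or 1 if TIMEOUT seconds (default 30) pass without one.
wait_for_log() {
  local file="$1" pattern="$2" timeout="${3:-30}"
  local i=0
  while [ "$i" -lt "$timeout" ]; do
    if grep -q -e "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example usage against a real agent (hypothetical log file name; assumes sudo
# will not need to prompt for a password while the command is backgrounded):
# sudo nomad agent -dev > nomad-dev.log 2>&1 &
# wait_for_log nomad-dev.log 'cluster leadership acquired' \
#   && wait_for_log nomad-dev.log 'state updated to ready' \
#   && echo "dev agent is ready"
```

Polling a file keeps the helper independent of the Nomad CLI; once the agent is up, querying it directly with the commands in the next section is the more robust check.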
## Cluster Nodes

If you run [`nomad node status`](/docs/commands/node/status) in another
terminal, you can see the registered nodes of the Nomad cluster:

```shell-session
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
171a583b  dc1  nomad  <none>  false  eligible     ready
```

The output shows our node ID (a randomly generated UUID, shown here in shortened
form), its datacenter, node name, node class, drain mode, scheduling eligibility,
and current status. We can see that our node is in the ready state, and task
draining is currently off.

The agent is also running in server mode, which means it is part of
the [gossip protocol](/docs/internals/gossip) used to connect all
the server instances together. We can view the members of the gossip
ring using the [`server members`](/docs/commands/server/members) command:

```shell-session
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  127.0.0.1  4648  alive   true    2         0.7.0  dc1         global
```

The output shows our own agent, the address it is running on, its
health state, whether it is the leader, some version information, and the
datacenter and region. Additional metadata can be viewed by providing the
`-detailed` flag.

## Stopping the Agent ((#stopping))

You can use `Ctrl-C` (the interrupt signal) to halt the agent.
By default, all signals will cause the agent to forcefully shut down.
The agent [can be configured](/docs/configuration#leave_on_terminate) to
gracefully leave on either the interrupt or terminate signals.
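When the agent runs in the background, for example after being started by a script, there is no terminal to press `Ctrl-C` in. The sketch below sends the same interrupt signal with `kill` and waits for the process to exit; pass `TERM` to exercise the terminate path instead. The function name `stop_agent` and its timeout parameter are hypothetical, and whether the agent leaves the cluster gracefully still depends on its `leave_on_interrupt` and `leave_on_terminate` settings, not on this script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Nomad command).
# stop_agent PID [SIGNAL] [TIMEOUT]
# Sends SIGNAL (default INT, the signal Ctrl-C delivers) to PID, then checks
# every 0.1s whether the process still exists. Returns 0 once it has exited,
# or 1 if the signal could not be sent or TIMEOUT seconds (default 10) pass.
stop_agent() {
  local pid="$1" sig="${2:-INT}" timeout="${3:-10}"
  local i=0
  kill -s "$sig" "$pid" || return 1
  while [ "$i" -lt "$((timeout * 10))" ]; do
    # kill -0 delivers no signal; it only tests whether the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
    sleep 0.1
    i=$((i + 1))
  done
  return 1
}

# Example usage: NOMAD_PID is a hypothetical variable holding the agent's PID.
# If the agent was started with sudo, run this as root as well.
# stop_agent "$NOMAD_PID" INT
```

Polling with `kill -0` rather than using the shell's `wait` builtin means the helper also works for an agent that was not started by the current shell.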
After interrupting the agent, you should see it shut down. With the default
configuration it does not leave the cluster gracefully, which is what the
`Shutdown without a Leave` warning reports:

```
^C==> Caught signal: interrupt
    [DEBUG] http: Shutting down http server
    [INFO] agent: requesting shutdown
    [INFO] client: shutting down
    [INFO] nomad: shutting down server
    [WARN] serf: Shutdown without a Leave
    [INFO] agent: shutdown complete
```

By gracefully leaving, Nomad clients update their status to prevent
further tasks from being scheduled and to start migrating any tasks that are
already assigned. Nomad servers notify their peers that they intend to leave.
When a server leaves, replication to that server stops. If a server fails,
replication continues to be attempted until the node recovers. Nomad
automatically tries to reconnect to _failed_ nodes, which allows it to recover from
certain network conditions, while _left_ nodes are no longer contacted.

If an agent is operating as a server, [`leave_on_terminate`](/docs/configuration#leave_on_terminate)
should only be set if the server will never rejoin the cluster. The default value
of `false` for both `leave_on_terminate` and `leave_on_interrupt` works well for most
scenarios. If Nomad servers are part of an auto scaling group where new servers are
brought up to replace failed servers, using graceful leave avoids a potential
availability outage affecting the [consensus protocol](/docs/internals/consensus).
As of Nomad 0.8, Nomad includes Autopilot, which automatically removes failed or dead
servers. This allows the operator to skip setting `leave_on_terminate`.

If a server does forcefully exit and will not be returning to service, the
[`server force-leave` command](/docs/commands/server/force-leave) should
be used to force the server from a _failed_ to a _left_ state.
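The guidance about _failed_ and _left_ servers follows from how Raft-style consensus counts votes: the cluster needs a majority (quorum) of the servers in its peer set, and a server that has failed but not left still counts toward that set. The sketch below is a simplified model for illustration, not a Nomad tool, and the function names are hypothetical. It computes the quorum for a peer set and how many more server failures the cluster can absorb.

```shell
#!/usr/bin/env bash
# Simplified model of Raft quorum arithmetic (hypothetical helpers).

# quorum PEERS: majority of a peer set of PEERS servers, floor(PEERS/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# failures_tolerated PEERS ALIVE: how many more live servers can fail before the
# cluster loses quorum. A negative result means quorum is already lost.
failures_tolerated() {
  echo $(( $2 - $(quorum "$1") ))
}

# Why 3 or 5 servers: a single server tolerates no failures at all.
for n in 1 3 5; do
  echo "$n server(s): quorum $(quorum "$n"), tolerates $(failures_tolerated "$n" "$n") failure(s)"
done

# One of 3 servers fails and a replacement joins, but the failed server never
# leaves: the peer set grows to 4 while only 3 servers are alive.
echo "stale failed server: tolerates $(failures_tolerated 4 3) more failure(s)"
# Once force-leave (or Autopilot) removes it, the peer set is back to 3.
echo "after cleanup:       tolerates $(failures_tolerated 3 3) more failure(s)"
```

In this model a stale failed entry leaves the cluster one failure away from losing quorum, which is why removing servers that will not return, by graceful leave, `server force-leave`, or Autopilot, matters.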
## Next Steps

If you stopped the development Nomad agent as instructed above, start it again
with `sudo nomad agent -dev`, and then let's try to [run a job](/intro/getting-started/jobs)!