---
layout: docs
page_title: Nomad Agent
sidebar_title: Set Server & Client Nodes
description: |-
  The Nomad agent is a long running process which can be used either in
  a client or server mode.
---

# Setting Nodes with Nomad Agent

The Nomad agent is a long-running process which runs on every machine that
is part of the Nomad cluster. The behavior of the agent depends on whether it is
running in client or server mode. Clients are responsible for running tasks,
while servers are responsible for managing the cluster.

Client mode agents are relatively simple. They make use of fingerprinting
to determine the capabilities and resources of the host machine, as well as
determining what [drivers](/docs/drivers) are available. Clients
register with servers to provide the node information, heartbeat to provide
liveness, and run any tasks assigned to them.

Servers take on the responsibility of being part of the
[consensus protocol](/docs/internals/consensus) and [gossip protocol](/docs/internals/gossip).
The consensus protocol, powered by Raft, allows the servers to perform
leader election and state replication. The gossip protocol allows for simple
clustering of servers and multi-region federation. Because server nodes are
more resource intensive than client nodes, they should usually be run on
dedicated instances.

Client nodes make up the majority of the cluster, and are very lightweight as
they interface with the server nodes and maintain very little state of their
own. Each cluster usually has 3 or 5 server mode agents and potentially
thousands of clients.

## Running an Agent

The agent is started with the [`nomad agent` command](/docs/commands/agent). This
command blocks, running forever or until told to quit.
The agent command takes a variety
of configuration options, but most have sane defaults.

When running `nomad agent`, you should see output similar to this:

```shell-session
$ nomad agent -dev
==> Starting Nomad agent...
==> Nomad agent configuration:

                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: true

==> Nomad agent started! Log data will stream in below:

    [INFO] serf: EventMemberJoin: server-1.node.global 127.0.0.1
    [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]
...
```

There are several important messages that `nomad agent` outputs:

- **Client**: This indicates whether the agent has enabled client mode.
  Client nodes fingerprint their host environment, register with servers,
  and run tasks.

- **Log Level**: This indicates the configured log level. Only messages with
  an equal or higher severity will be logged. This can be tuned to increase
  verbosity for debugging, or reduced to avoid noisy logging.

- **Region**: This is the region and datacenter in which the agent is configured
  to run. Nomad has first-class support for multi-datacenter and multi-region
  configurations. The `-region` and `-dc` flags can be used to set the region
  and datacenter. The default is the `global` region in `dc1`.

- **Server**: This indicates whether the agent has enabled server mode.
  Server nodes have the extra burden of participating in the consensus protocol,
  storing cluster state, and making scheduling decisions.

## Stopping an Agent

An agent can be stopped in two ways: gracefully or forcefully. By default,
any signal to an agent (interrupt, terminate, kill) will cause the agent
to forcefully stop. Graceful termination can be configured by setting
`leave_on_interrupt` or `leave_on_terminate` to respond to the
respective signals.
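As a sketch of how graceful termination might be enabled, the agent's configuration file could set the `leave_on_interrupt` and `leave_on_terminate` options mentioned above (the values shown here are illustrative; both options default to false):

```hcl
# Gracefully leave the cluster on SIGINT (e.g. Ctrl-C in a terminal).
leave_on_interrupt = true

# Gracefully leave the cluster on SIGTERM (e.g. sent by a process manager).
leave_on_terminate = true
```

With a configuration like this, sending the corresponding signal causes the agent to notify the cluster and leave rather than stop abruptly; a kill signal (SIGKILL) can never be handled and always results in a forceful stop.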
When gracefully exiting, clients will update their status to terminal on
the servers so that tasks can be migrated to healthy agents. Servers
will notify their intention to leave the cluster, which allows them to
leave the [consensus](/docs/internals/consensus) peer set.

It is especially important that a server node be allowed to leave gracefully
so that there is minimal impact on availability as the server leaves
the consensus peer set. If a server does not gracefully leave, and will not
return to service, the [`server force-leave` command](/docs/commands/server/force-leave)
should be used to eject it from the consensus peer set.

## Lifecycle

Every agent in the Nomad cluster goes through a lifecycle. Understanding
this lifecycle is useful for building a mental model of an agent's interactions
with a cluster and how the cluster treats a node.

When a client agent is first started, it fingerprints the host machine to
identify its attributes, capabilities, and [task drivers](/docs/drivers).
These are reported to the servers during an initial registration. The addresses
of known servers are provided to the agent via configuration, potentially using
DNS for resolution. Using [Consul](https://www.consul.io) provides a way to avoid hard
coding addresses and instead resolve them on demand.

While a client is running, it heartbeats with servers to maintain liveness.
If the heartbeats fail, the servers assume the client node has failed, stop
assigning it new tasks, and migrate its existing tasks. It is impossible to
distinguish between a network failure and an agent crash, so both cases are
handled the same. Once the network recovers or a crashed agent restarts, the
node status will be updated and normal operation resumed.

To prevent an accumulation of nodes in a terminal state, Nomad performs periodic
garbage collection of nodes.
By default, if a node is in a failed or 'down'
state for over 24 hours it will be garbage collected from the system.

Servers are slightly more complex, as they perform additional functions. They
participate in a [gossip protocol](/docs/internals/gossip) both to cluster
within a region and to support multi-region configurations. When a server is
first started, it does not know the addresses of other servers in the cluster.
To discover its peers, it must _join_ the cluster. This is done with the
[`server join` command](/docs/commands/server/join) or by providing the
proper configuration on start. Once a node joins, this information is gossiped
to the entire cluster, meaning all nodes will eventually be aware of each other.

When a server _leaves_, it specifies its intent to do so, and the cluster marks that
node as having _left_. If the server has _left_, replication to it will stop and it
is removed from the consensus peer set. If the server has _failed_, replication
will attempt to make progress to recover from a software or network failure.

## Permissions

Nomad servers should be run with the lowest possible permissions. Nomad clients
must be run as root, because the OS isolation mechanisms they use require root
privileges. In all cases, it is recommended that you create a `nomad` user with the
minimal set of required privileges.
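Putting the sections above together, the following is a minimal sketch of a server agent configuration. The field names follow Nomad's agent configuration format, but the path, peer addresses, and server count are illustrative assumptions, not recommendations:

```hcl
# Illustrative server agent configuration; adjust paths and addresses
# for your environment.
data_dir = "/opt/nomad/data"

region     = "global" # the default region
datacenter = "dc1"    # the default datacenter

server {
  enabled          = true
  bootstrap_expect = 3 # wait for three servers before electing a leader

  # Join peers on start instead of running `nomad server join` manually.
  server_join {
    retry_join = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
  }
}
```

A client agent would instead set `client { enabled = true }` and, per the note on permissions, must run as root, while a server agent can run as an unprivileged `nomad` user.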