github.com/outbrain/consul@v1.4.5/website/source/docs/agent/basics.html.md

---
layout: "docs"
page_title: "Agent"
sidebar_current: "docs-agent-running"
description: |-
  The Consul agent is the core process of Consul. The agent maintains membership information, registers services, runs checks, responds to queries, and more. The agent must run on every node that is part of a Consul cluster.
---

# Consul Agent

The Consul agent is the core process of Consul. The agent maintains membership
information, registers services, runs checks, responds to queries,
and more. The agent must run on every node that is part of a Consul cluster.

Any agent may run in one of two modes: client or server. A server
node takes on the additional responsibility of being part of the [consensus quorum](/docs/internals/consensus.html).
Server nodes take part in Raft and provide strong consistency and availability in
the case of failure. Because of this higher burden, server nodes should usually
run on dedicated instances: they are more resource intensive than client
nodes. Client nodes make up the majority of the cluster and are very lightweight,
as they interface with the server nodes for most operations and maintain very little state
of their own.

## Running an Agent

The agent is started with the [`consul agent`](/docs/commands/agent.html) command. This
command blocks, running forever or until told to quit. The agent command takes a variety
of [configuration options](/docs/agent/options.html#command-line-options), but most have sane defaults.

When running [`consul agent`](/docs/commands/agent.html), you should see output similar to this:

```text
$ consul agent -data-dir=/tmp/consul
==> Starting Consul agent...
==> Consul agent running!
       Node name: 'Armons-MacBook-Air'
      Datacenter: 'dc1'
          Server: false (bootstrap: false)
     Client Addr: 127.0.0.1 (HTTP: 8500, DNS: 8600)
    Cluster Addr: 192.168.1.43 (LAN: 8301, WAN: 8302)

==> Log data will now stream in as it occurs:

    [INFO] serf: EventMemberJoin: Armons-MacBook-Air.local 192.168.1.43
...
```

There are several important messages that [`consul agent`](/docs/commands/agent.html) outputs:

* **Node name**: This is a unique name for the agent. By default, this
  is the hostname of the machine, but you may customize it using the
  [`-node`](/docs/agent/options.html#_node) flag.

* **Datacenter**: This is the datacenter in which the agent is configured to run.
  Consul has first-class support for multiple datacenters; however, to work efficiently,
  each node must be configured to report its datacenter. The [`-datacenter`](/docs/agent/options.html#_datacenter)
  flag can be used to set the datacenter. For single-DC configurations, the agent
  will default to "dc1".

* **Server**: This indicates whether the agent is running in server or client mode.
  Server nodes have the extra burden of participating in the consensus quorum,
  storing cluster state, and handling queries. Additionally, a server may be
  in ["bootstrap"](/docs/agent/options.html#_bootstrap_expect) mode. Multiple servers
  cannot be in bootstrap mode, as that would put the cluster in an inconsistent state.

* **Client Addr**: This is the address used for client interfaces to the agent.
  This includes the ports for the HTTP and DNS interfaces. By default, this binds only
  to localhost. If you change this address or port, you'll have to specify `-http-addr`
  whenever you run commands such as [`consul members`](/docs/commands/members.html) to
  indicate how to reach the agent. Other applications can also use the HTTP address and port
  [to control Consul](/api/index.html).

* **Cluster Addr**: This is the address and set of ports used for communication between
  Consul agents in a cluster. Not all Consul agents in a cluster have to
  use the same port, but this address **MUST** be reachable by all other nodes.
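
As a rough illustration of how the flags mentioned in the list above fit together, the following Python sketch builds argument lists for `consul agent` and `consul members`. The helper names `agent_command` and `members_command` are hypothetical, and the example values (`web-1`, `10.0.0.5:8500`) are placeholders; only the flags `-data-dir`, `-node`, `-datacenter`, and `-http-addr` come from this page.

```python
from typing import List, Optional


def agent_command(data_dir: str,
                  node: Optional[str] = None,
                  datacenter: Optional[str] = None) -> List[str]:
    """Build an argv list for `consul agent` (hypothetical helper)."""
    argv = ["consul", "agent", f"-data-dir={data_dir}"]
    if node is not None:
        # Overrides the default node name, which is the machine's hostname.
        argv.append(f"-node={node}")
    if datacenter is not None:
        # Without this flag the agent reports "dc1".
        argv.append(f"-datacenter={datacenter}")
    return argv


def members_command(http_addr: Optional[str] = None) -> List[str]:
    """Build an argv list for `consul members` (hypothetical helper)."""
    argv = ["consul", "members"]
    if http_addr is not None:
        # Needed only when the agent's client address or port was changed.
        argv.append(f"-http-addr={http_addr}")
    return argv


if __name__ == "__main__":
    print(" ".join(agent_command("/tmp/consul", node="web-1", datacenter="dc1")))
    print(" ".join(members_command("10.0.0.5:8500")))
```

The resulting lists can be passed to `subprocess.run` on a machine where the `consul` binary is installed.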

When running under `systemd` on Linux, Consul notifies systemd by sending
`READY=1` to the `$NOTIFY_SOCKET` when a LAN join has completed. For
this to work, either the `join` or `retry_join` option has to be set, and the
service definition file has to have `Type=notify` set.
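
To make the readiness notification concrete, here is a minimal Python sketch of the general mechanism: a datagram containing `READY=1` is written to the Unix socket named by `$NOTIFY_SOCKET`. This is not Consul's own implementation (Consul is written in Go), and the function name `notify_ready` is made up for illustration.

```python
import os
import socket


def notify_ready() -> bool:
    """Send READY=1 to the socket named by $NOTIFY_SOCKET, if there is one.

    Returns False when the variable is unset, for example when the process
    was not started by systemd from a unit with Type=notify.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract socket, which is addressed
        # with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"READY=1", path)
    return True
```

A service that calls something like this only after its join has succeeded lets systemd delay dependent units until the agent is actually part of the cluster.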

## Stopping an Agent

An agent can be stopped in two ways: gracefully or forcefully. To gracefully
halt an agent, send the process an interrupt signal (usually
`Ctrl-C` from a terminal or running `kill -INT consul_pid`). When gracefully exiting, the agent first notifies
the cluster that it intends to leave. This way, the other cluster members
mark the node as having _left_.

Alternatively, you can force kill the agent by sending it a kill signal.
When force killed, the agent ends immediately. The rest of the cluster will
eventually (usually within seconds) detect that the node has died and
mark it as _failed_.

It is especially important that a server node be allowed to leave gracefully
so that there will be a minimal impact on availability as the server leaves
the consensus quorum.

For client agents, the difference between a node _failing_ and a node _leaving_
may not be important for your use case. For example, for a web server and load
balancer setup, both result in the same outcome: the web node is removed
from the load balancer pool.
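
The two ways of stopping an agent can be combined in a supervising script. The sketch below, in Python, first sends the interrupt signal and falls back to a kill signal only if the process has not exited in time. The function name `stop_agent`, the `grace_seconds` parameter, and the escalation policy itself are assumptions made for this example, not behavior described on this page.

```python
import signal
import subprocess


def stop_agent(proc: subprocess.Popen, grace_seconds: float = 10.0) -> str:
    """Stop a child process gracefully if possible, forcefully otherwise.

    Returns "graceful" if the process exited after the interrupt signal,
    or "forced" if it had to be killed.
    """
    # Equivalent to `kill -INT <pid>`: asks the agent to leave the cluster.
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Last resort: the rest of the cluster will see the node as failed.
        proc.kill()
        proc.wait()
        return "forced"
```

For server nodes in particular, a generous `grace_seconds` value is the safer choice, since a forced stop removes the server from the quorum without warning.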

## Lifecycle

Every agent in the Consul cluster goes through a lifecycle. Understanding
this lifecycle is useful for building a mental model of an agent's interactions
with a cluster and how the cluster treats a node.

When an agent is first started, it does not know about any other node in the cluster.
To discover its peers, it must _join_ the cluster. This is done with the
[`join`](/docs/commands/join.html)
command or by providing the proper configuration to auto-join on start. Once a node
joins, this information is gossiped to the entire cluster, meaning all nodes will
eventually be aware of each other. If the agent is a server, existing servers will
begin replicating to the new node.

In the case of a network failure, some nodes may be unreachable by other nodes.
In this case, unreachable nodes are marked as _failed_. It is impossible to distinguish
between a network failure and an agent crash, so both cases are handled the same.
Once a node is marked as failed, this information is updated in the service catalog.

-> **Note:** There is some nuance here since this update is only possible if the servers can still [form a quorum](/docs/internals/consensus.html). Once the network recovers or a crashed agent restarts, the cluster will repair itself and unmark a node as failed. The health check in the catalog will also be updated to reflect this.

When a node _leaves_, it specifies its intent to do so, and the cluster
marks that node as having _left_. Unlike the _failed_ case, all of the
services provided by a node are immediately deregistered. If the agent was
a server, replication to it will stop.

To prevent an accumulation of dead nodes (nodes in either _failed_ or _left_
states), Consul will automatically remove dead nodes from the catalog. This
process is called _reaping_. Reaping currently happens on a configurable
interval that defaults to 72 hours (changing the reap interval is *not* recommended due to
its consequences during outage situations). Reaping is similar to leaving,
causing all associated services to be deregistered.
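
The lifecycle described in this section can be summarized as a small state machine. The Python sketch below is a simplified mental model, not Consul's implementation: the state names, event names, and helper functions are invented for illustration, and it covers only the transitions this page describes.

```python
from typing import Dict, Tuple

REAP_INTERVAL_HOURS = 72  # the reap interval quoted in the text

# (current state, event) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("new", "join"): "alive",        # node joins and is gossiped to the cluster
    ("alive", "fail"): "failed",     # crash or network failure, indistinguishable
    ("failed", "recover"): "alive",  # network recovers or the agent restarts
    ("alive", "leave"): "left",      # graceful exit with stated intent
    ("failed", "reap"): "reaped",    # dead node removed from the catalog
    ("left", "reap"): "reaped",
}

# States in which the node's services are still registered in the catalog:
# leaving and reaping deregister services, failing does not.
SERVICES_REGISTERED = {"alive", "failed"}


def next_state(state: str, event: str) -> str:
    """Return the state that follows `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def should_reap(state: str, hours_dead: float) -> bool:
    """A dead node (failed or left) becomes reapable after the interval."""
    return state in ("failed", "left") and hours_dead >= REAP_INTERVAL_HOURS
```

For example, `next_state("alive", "fail")` yields `"failed"`, a state in which services remain registered, whereas `next_state("alive", "leave")` yields `"left"`, where they do not.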