github.com/outbrain/consul@v1.4.5/website/source/docs/internals/anti-entropy.html.md (about)

---
layout: "docs"
page_title: "Anti-Entropy"
sidebar_current: "docs-internals-anti-entropy"
description: >
  This section details the process and use of anti-entropy in Consul.
---

# Anti-Entropy

Consul uses an advanced method of maintaining service and health information.
This page details how services and checks are registered, how the catalog is
populated, and how health status information is updated as it changes.

~> **Advanced Topic!** This page covers technical details of
the internals of Consul. You don't need to know these details to effectively
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.

### Components

It is important to first understand the moving pieces involved in services and
health checks: the [agent](#agent) and the [catalog](#catalog). These are
described conceptually below to make anti-entropy easier to understand.

<a name="agent"></a>
#### Agent

Each Consul agent maintains its own set of service and check registrations as
well as health information. The agents are responsible for executing their own
health checks and updating their local state.

Services and checks within the context of an agent have a rich set of
configuration options available. This is because the agent is responsible for
generating information about its services and their health through the use of
[health checks](/docs/agent/checks.html).

<a name="catalog"></a>
#### Catalog

Consul's service discovery is backed by a service catalog. This catalog is
formed by aggregating information submitted by the agents. The catalog maintains
the high-level view of the cluster, including which services are available,
which nodes run those services, health information, and more. The catalog is
used to expose this information via the various interfaces Consul provides,
including DNS and HTTP.

Services and checks within the context of the catalog have a much more limited
set of fields when compared with the agent. This is because the catalog is only
responsible for recording and returning information *about* services, nodes, and
health.

The catalog is maintained only by server nodes. This is because the catalog is
replicated via the [Raft log](/docs/internals/consensus.html) to provide a
consolidated and consistent view of the cluster.

<a name="anti-entropy"></a>
### Anti-Entropy

Entropy is the tendency of systems to become increasingly disordered. Consul's
anti-entropy mechanisms are designed to counter this tendency, to keep the
state of the cluster ordered even through failures of its components.

Consul has a clear separation between the global service catalog and the agent's
local state as discussed above. The anti-entropy mechanism reconciles these two
views of the world: anti-entropy is a synchronization of the local agent state and
the catalog. For example, when a user registers a new service or check with the
agent, the agent in turn notifies the catalog that the new service or check
exists. Similarly, when a service or check is deleted from the agent, it is
removed from the catalog as well.

Anti-entropy is also used to update availability information. As agents run
their health checks, the status of a check may change, in which case the new
status is synced to the catalog. Using this information, the catalog can respond
intelligently to queries about its nodes and services based on their
availability.

During this synchronization, the catalog is also checked for correctness. If
any services or checks exist in the catalog that the agent is not aware of, they
will be automatically removed to make the catalog reflect the proper set of
services and health information for that agent. Consul treats the state of the
agent as authoritative; if there are any differences between the agent
and catalog view, the agent-local view will always be used.
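
The agent-authoritative reconciliation described above can be pictured with a
minimal Go sketch. It is not Consul's implementation: the `State` type and the
`reconcile` helper are hypothetical names introduced only for illustration, and
each entry is reduced to an ID mapped to a status string.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a service or check ID to its current status.
// Hypothetical type for this sketch; real registrations carry more fields.
type State map[string]string

// reconcile compares the agent-local view with the catalog's view of the same
// node. Entries that are missing from the catalog or differ from the local
// view are pushed; entries the agent does not know about are removed.
// The local view always wins.
func reconcile(local, catalog State) (push, remove []string) {
	for id, want := range local {
		if got, ok := catalog[id]; !ok || got != want {
			push = append(push, id)
		}
	}
	for id := range catalog {
		if _, ok := local[id]; !ok {
			remove = append(remove, id)
		}
	}
	// Sort for stable output; map iteration order is random in Go.
	sort.Strings(push)
	sort.Strings(remove)
	return push, remove
}

func main() {
	local := State{"web": "passing", "redis": "critical"}
	catalog := State{"web": "passing", "redis": "passing", "old-api": "passing"}

	push, remove := reconcile(local, catalog)
	fmt.Println("push:", push)     // push: [redis]
	fmt.Println("remove:", remove) // remove: [old-api]
}
```

Here `redis` is pushed because its status changed locally, and `old-api` is
removed because the agent has no record of it.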

### Periodic Synchronization

In addition to running when changes to the agent occur, anti-entropy is also a
long-running process which periodically wakes up to sync service and check
status to the catalog. This ensures that the catalog closely matches the agent's
true state. This also allows Consul to re-populate the service catalog even in
the case of complete data loss.

To avoid saturation, the amount of time between periodic anti-entropy runs will
vary based on cluster size. The table below defines the relationship between
cluster size and sync interval:

<table class="table table-bordered table-striped">
  <tr>
    <th>Cluster Size</th>
    <th>Periodic Sync Interval</th>
  </tr>
  <tr>
    <td>1 - 128</td>
    <td>1 minute</td>
  </tr>
  <tr>
    <td>129 - 256</td>
    <td>2 minutes</td>
  </tr>
  <tr>
    <td>257 - 512</td>
    <td>3 minutes</td>
  </tr>
  <tr>
    <td>513 - 1024</td>
    <td>4 minutes</td>
  </tr>
  <tr>
    <td>...</td>
    <td>...</td>
  </tr>
</table>

The intervals above are approximate. Each Consul agent will choose a randomly
staggered start time within the interval window to avoid a thundering herd.

### Best-Effort Sync

Anti-entropy can fail in a number of cases, including misconfiguration of the
agent or its operating environment, I/O problems (full disk, filesystem
permissions, etc.), and networking problems (the agent cannot communicate with
a server). Because of this, the agent attempts to sync in best-effort
fashion.

If an error is encountered during an anti-entropy run, the error is logged and
the agent continues to run. The anti-entropy mechanism is run periodically to
automatically recover from these types of transient failures.

### Enable Tag Override

Synchronization of service registration can be partially modified to
allow external agents to change the tags for a service. This can be
useful in situations where an external monitoring service needs to be
the source of truth for tag information. For example, the Redis
database and its monitoring service Redis Sentinel have this kind of
relationship. Redis instances are responsible for much of their
configuration, but Sentinels determine whether the Redis instance is a
primary or a secondary. Using the Consul service configuration item
[enable_tag_override](/docs/agent/services.html), you can instruct the
Consul agent on which the Redis database is running to NOT update the
tags during anti-entropy synchronization. For more information, see the
[Services](/docs/agent/services.html#enable-tag-override-and-anti-entropy) page.
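
The effect on synchronization can be sketched in Go as below. The `Service`
struct and `tagsToSync` helper are simplified, hypothetical stand-ins rather
than Consul's own types. The sketch also assumes that the agent's tags are
still used when the service is not yet present in the catalog.

```go
package main

import "fmt"

// Service is a hypothetical, simplified registration used only in this sketch.
type Service struct {
	ID                string
	Tags              []string
	EnableTagOverride bool
}

// tagsToSync returns the tags that a sync would leave in the catalog.
// With EnableTagOverride set, tags already in the catalog (possibly changed
// by an external agent) are kept; otherwise the agent's tags win.
// catalog is nil when the service has not been registered in the catalog yet.
func tagsToSync(local Service, catalog *Service) []string {
	if local.EnableTagOverride && catalog != nil {
		return catalog.Tags
	}
	return local.Tags
}

func main() {
	// The agent registered Redis as a secondary; an external agent such as a
	// Sentinel-driven tool later marked it as primary in the catalog.
	local := Service{ID: "redis", Tags: []string{"secondary"}, EnableTagOverride: true}
	inCatalog := &Service{ID: "redis", Tags: []string{"primary"}}

	fmt.Println(tagsToSync(local, inCatalog)) // [primary]

	local.EnableTagOverride = false
	fmt.Println(tagsToSync(local, inCatalog)) // [secondary]
}
```

With the override enabled, the externally set `primary` tag survives the sync.
With it disabled, the agent's own `secondary` tag is written back, as in the
default agent-authoritative behavior.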