     1  ---
     2  layout: api
     3  page_title: Autopilot - Operator - HTTP API
     4  description: |-
     5    The /operator/autopilot endpoints provide tools for managing Autopilot.
     6  ---
     7  
     8  # Autopilot Operator HTTP API
     9  
    10  The `/operator/autopilot` endpoints allow for automatic, operator-friendly
    11  management of Nomad servers, including cleanup of dead servers, monitoring the
    12  state of the Raft cluster, and stable server introduction.
    13  
    14  ## Read Autopilot Configuration
    15  
    16  This endpoint retrieves the latest Autopilot configuration.
    17  
    18  | Method | Path                                   | Produces           |
    19  | ------ | -------------------------------------- | ------------------ |
    20  | `GET`  | `/v1/operator/autopilot/configuration` | `application/json` |
    21  
    22  The table below shows this endpoint's support for
    23  [blocking queries](/api-docs#blocking-queries) and
    24  [required ACLs](/api-docs#acls).
    25  
    26  | Blocking Queries | ACL Required    |
    27  | ---------------- | --------------- |
    28  | `NO`             | `operator:read` |
    29  
    30  ### Sample Request
    31  
    32  ```shell-session
    33  $ curl \
    34      https://localhost:4646/v1/operator/autopilot/configuration
    35  ```
    36  
    37  ### Sample Response
    38  
    39  ```json
    40  {
    41    "CleanupDeadServers": true,
    42    "LastContactThreshold": "200ms",
    43    "MaxTrailingLogs": 250,
    44    "ServerStabilizationTime": "10s",
    45    "EnableRedundancyZones": false,
    46    "DisableUpgradeMigration": false,
    47    "EnableCustomUpgrades": false,
    48    "CreateIndex": 4,
    49    "ModifyIndex": 4
    50  }
    51  ```
    52  
    53  For more information about the Autopilot configuration options, see the
    54  [agent configuration section](/docs/configuration/autopilot).
    55  
    56  ## Update Autopilot Configuration
    57  
    58  This endpoint updates the Autopilot configuration of the cluster.
    59  
    60  | Method | Path                                   | Produces           |
    61  | ------ | -------------------------------------- | ------------------ |
    62  | `PUT`  | `/v1/operator/autopilot/configuration` | `application/json` |
    63  
    64  The table below shows this endpoint's support for
    65  [blocking queries](/api-docs#blocking-queries) and
    66  [required ACLs](/api-docs#acls).
    67  
    68  | Blocking Queries | ACL Required     |
    69  | ---------------- | ---------------- |
    70  | `NO`             | `operator:write` |
    71  
    72  ### Parameters
    73  
    74  - `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
    75    only happen if the given index matches the `ModifyIndex` of the configuration
    76    at the time of writing.
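
The Check-And-Set flow can be sketched as follows: read the configuration, then send the update with the `ModifyIndex` from that read as the `cas` value. This is a minimal sketch, not an official client. It assumes `cas` is passed as a URL query parameter, and `build_cas_update` is a hypothetical helper name. The function only builds the request so that it can be inspected without a running cluster.

```python
import json
import urllib.parse
import urllib.request


def build_cas_update(base_url, current, changes):
    """Build (but do not send) a PUT request that applies `changes` only if
    the configuration is unchanged since `current` was read.

    `current` is the decoded JSON body returned by the read endpoint.
    """
    # Merge the changes into a copy so the caller's dict is left untouched.
    payload = {**current, **changes}
    # Assumption: the index is supplied as the `cas` query parameter.
    query = urllib.parse.urlencode({"cas": current["ModifyIndex"]})
    url = f"{base_url}/v1/operator/autopilot/configuration?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. If another writer changed the configuration after the read, the index no longer matches and the update does not happen.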
    77  
    78  ### Sample Payload
    79  
    80  ```json
    81  {
    82    "CleanupDeadServers": true,
    83    "LastContactThreshold": "200ms",
    84    "MaxTrailingLogs": 250,
    85    "ServerStabilizationTime": "10s",
    86    "EnableRedundancyZones": false,
    87    "DisableUpgradeMigration": false,
    88    "EnableCustomUpgrades": false,
    89    "CreateIndex": 4,
    90    "ModifyIndex": 4
    91  }
    92  ```
    93  
    94  - `CleanupDeadServers` `(bool: true)` - Specifies automatic removal of dead
    95    server nodes periodically and whenever a new server is added to the cluster.
    96  
    97  - `LastContactThreshold` `(string: "200ms")` - Specifies the maximum amount of
    98    time a server can go without contact from the leader before being considered
    99    unhealthy. Must be a duration value such as `10s`.
   100  
   101  - `MaxTrailingLogs` `(int: 250)` - Specifies the maximum number of log entries
   102    that a server can trail the leader by before being considered unhealthy.
   103  
   104  - `ServerStabilizationTime` `(string: "10s")` - Specifies the minimum amount of
   105    time a server must be stable in the 'healthy' state before being added to the
   106    cluster. Only takes effect if all servers are running Raft protocol version 3
   107    or higher. Must be a duration value such as `30s`.
   108  
   109  - `EnableRedundancyZones` `(bool: false)` - (Enterprise-only) Specifies whether
   110    to enable redundancy zones.
   111  
   112  - `DisableUpgradeMigration` `(bool: false)` - (Enterprise-only) Disables Autopilot's
   113    upgrade migration strategy in Nomad Enterprise of waiting until enough
   114    newer-versioned servers have been added to the cluster before promoting any of
   115    them to voters.
   116  
   117  - `EnableCustomUpgrades` `(bool: false)` - (Enterprise-only) Specifies whether to
   118    enable using custom upgrade versions when performing migrations.
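
`LastContactThreshold` and `ServerStabilizationTime` must be duration strings, so a client may want to check them before sending a payload. The sketch below assumes the values follow Go-style duration syntax (a number followed by a unit, optionally repeated, as in `1m30s`). It covers only unsigned values with the units listed. `parse_duration` and `duration_fields_in_seconds` are hypothetical helper names, not part of any Nomad client.

```python
import re

# Seconds per unit for the subset of Go-style units handled here.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
_TERM = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration(text):
    """Convert a duration string such as "200ms" or "1m30s" to seconds."""
    total, pos = 0.0, 0
    for term in _TERM.finditer(text):
        if term.start() != pos:  # stray characters between terms
            break
        total += float(term.group(1)) * _UNIT_SECONDS[term.group(2)]
        pos = term.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


def duration_fields_in_seconds(payload):
    """Parse the duration-valued fields present in an Autopilot payload."""
    fields = ("LastContactThreshold", "ServerStabilizationTime")
    return {name: parse_duration(payload[name]) for name in fields if name in payload}
```

A value without a unit, such as `"10"`, raises `ValueError` here, which lets the caller reject the payload before it is sent.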
   119  
   120  ## Read Health
   121  
   122  This endpoint queries the health of the cluster as seen by Autopilot.
   123  
   124  | Method | Path                            | Produces           |
   125  | ------ | ------------------------------- | ------------------ |
   126  | `GET`  | `/v1/operator/autopilot/health` | `application/json` |
   127  
   128  The table below shows this endpoint's support for
   129  [blocking queries](/api-docs#blocking-queries) and
   130  [required ACLs](/api-docs#acls).
   131  
   132  | Blocking Queries | ACL Required    |
   133  | ---------------- | --------------- |
   134  | `NO`             | `operator:read` |
   135  
   136  ### Sample Request
   137  
   138  ```shell-session
   139  $ curl \
   140      https://localhost:4646/v1/operator/autopilot/health
   141  ```
   142  
   143  ### Sample Response
   144  
   145  ```json
   146  {
   147    "Healthy": true,
   148    "FailureTolerance": 0,
   149    "Servers": [
   150      {
   151        "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
   152        "Name": "node1",
   153        "Address": "127.0.0.1:4647",
   154        "SerfStatus": "alive",
   155        "Version": "0.8.0",
   156        "Leader": true,
   157        "LastContact": "0s",
   158        "LastTerm": 2,
   159        "LastIndex": 46,
   160        "Healthy": true,
   161        "Voter": true,
   162        "StableSince": "2017-03-06T22:07:51Z"
   163      },
   164      {
   165        "ID": "e36ee410-cc3c-0a0c-c724-63817ab30303",
   166        "Name": "node2",
   167        "Address": "127.0.0.1:4747",
   168        "SerfStatus": "alive",
   169        "Version": "0.8.0",
   170        "Leader": false,
   171        "LastContact": "27.291304ms",
   172        "LastTerm": 2,
   173        "LastIndex": 46,
   174        "Healthy": true,
   175        "Voter": false,
   176        "StableSince": "2017-03-06T22:18:26Z"
   177      }
   178    ]
   179  }
   180  ```
   181  
   182  - `Healthy` is whether all the servers are currently healthy.
   183  
   184  - `FailureTolerance` is the number of redundant healthy servers that could
   185    fail without causing an outage (this would be 2 in a healthy cluster of 5
   186    servers).
   187  
   188  - `Servers` holds detailed health information on each server:
   189  
   190    - `ID` is the Raft ID of the server.
   191  
   192    - `Name` is the node name of the server.
   193  
   194    - `Address` is the address of the server.
   195  
   196    - `SerfStatus` is the SerfHealth check status for the server.
   197  
   198    - `Version` is the Nomad version of the server.
   199  
   200    - `Leader` is whether this server is currently the leader.
   201  
   202    - `LastContact` is the time elapsed since this server's last contact with the leader.
   203  
   204    - `LastTerm` is the server's last known Raft leader term.
   205  
   206    - `LastIndex` is the index of the server's last committed Raft log entry.
   207  
   208    - `Healthy` is whether the server is healthy according to the current Autopilot configuration.
   209  
   210    - `Voter` is whether the server is a voting member of the Raft cluster.
   211  
   212    - `StableSince` is the time at which this server entered its current `Healthy` state.
   213  
   214    The HTTP status code will indicate the health of the cluster. If `Healthy` is true, then a
   215    status of 200 will be returned. If `Healthy` is false, then a status of 429 will be returned.
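
A monitoring script can combine the status code with the response body. The sketch below maps 200 and 429 to a healthy or unhealthy cluster and recomputes the failure tolerance from the voter list using the usual Raft majority rule (quorum is half the voters, rounded down, plus one). That formula is an assumption about how `FailureTolerance` is derived, and `summarize_health` is a hypothetical helper name.

```python
import json


def summarize_health(status_code, body):
    """Summarize an Autopilot health response.

    `status_code` is the HTTP status and `body` the raw JSON text.
    """
    if status_code not in (200, 429):
        raise ValueError(f"unexpected status code: {status_code}")
    report = json.loads(body)
    voters = [s for s in report["Servers"] if s["Voter"]]
    healthy_voters = [s for s in voters if s["Healthy"]]
    # Assumption: tolerance is the number of healthy voters beyond quorum.
    quorum = len(voters) // 2 + 1
    return {
        "healthy": status_code == 200,
        "unhealthy_servers": [s["Name"] for s in report["Servers"] if not s["Healthy"]],
        "expected_tolerance": max(len(healthy_voters) - quorum, 0),
        "reported_tolerance": report["FailureTolerance"],
    }
```

With five healthy voters the quorum is 3, so the expected tolerance is 2, matching the example in the `FailureTolerance` description. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 429 response, so a caller using it would catch that exception and read the status code and body from it.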