---
layout: "docs"
page_title: "Upgrading Specific Versions"
sidebar_current: "docs-upgrading-specific"
description: |-
  Specific versions of Consul may have additional information about the upgrade process beyond the standard flow.
---

# Upgrading Specific Versions

The [upgrading page](/docs/upgrading.html) covers the details of doing
a standard upgrade. However, specific versions of Consul may have more
details provided for their upgrades as a result of new features or changed
behavior. This page is used to document those details separately from the
standard upgrade flow.

## Consul 1.4.0

There are two major features in Consul 1.4.0 that may impact upgrades: a [new ACL system](#acl-upgrade) and [multi-datacenter support for Connect](#connect-multi-datacenter) in the Enterprise version.

### ACL Upgrade

Consul 1.4.0 includes a [new ACL system](/docs/guides/acl.html) that is
designed to have a smooth upgrade path but requires care to upgrade components
in the right order.

**Note:** As with most major version upgrades, you cannot downgrade once the
upgrade to 1.4.0 is complete as it adds new state to the raft store. As always,
it is _strongly_ recommended that you test the upgrade first outside of
production and ensure you take backup snapshots of all datacenters before
upgrading.

#### Primary Datacenter

The "ACL datacenter" in 1.3.x and earlier is now referred to as the "primary
datacenter". All configuration is backwards compatible and shouldn't need to
change prior to upgrade, although it's strongly recommended to migrate ACL
configuration to the new syntax soon after upgrade. This includes moving to
`primary_datacenter` rather than `acl_datacenter` and `acl_*` to the new [ACL
block](/docs/agent/options.html#acl).
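
For instance, a 1.3.x-style ACL configuration and a post-upgrade equivalent using the new block might look like the following sketch (`dc1` and the policy values are placeholders for your own settings):

```javascript
{
  "acl_datacenter": "dc1",
  "acl_default_policy": "deny",
  "acl_down_policy": "extend-cache"
}
```

```javascript
{
  "primary_datacenter": "dc1",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "down_policy": "extend-cache"
  }
}
```

Note that in the new syntax ACLs are switched on explicitly with `acl.enabled`, whereas the legacy options enabled them implicitly by setting `acl_datacenter`.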

Datacenters can be upgraded in any order, although secondaries will remain in
[Legacy ACL mode](#legacy-acl-mode) until the primary datacenter is fully
upgraded.

Each datacenter should follow the [standard rolling upgrade
procedure](/docs/upgrading.html#standard-upgrades).

#### Legacy ACL Mode

When a 1.4.0 server first starts, it runs in "legacy ACL mode". In this mode,
bootstrap requests and new ACL APIs will not be functional yet and will return
an error. The server advertises its ability to support 1.4.0 ACLs via gossip
and waits.

In the primary datacenter, the servers all wait in legacy ACL mode until they
see every server in the primary datacenter advertise 1.4.0 ACL support. Once
this happens, the leader will complete the transition out of legacy ACL mode
and write this into the state so future restarts don't need to go through the
same transition.

In a secondary datacenter, the same process happens except that servers
_additionally_ wait for all servers in the primary datacenter, making it safe to
upgrade datacenters in any order.

Note that even if you are not upgrading, starting a brand new 1.4.0 cluster
will transition through legacy ACL mode, so you may be unable to bootstrap
ACLs until all the expected servers are up and healthy.

#### Legacy Token Accessor Migration

As soon as all servers in the primary datacenter have been upgraded to 1.4.0,
the leader will begin the process of creating new accessor IDs for all existing
ACL tokens.

This process completes in the background and is rate limited to ensure it
doesn't overload the leader. It completes upgrades in batches of 128 tokens and
will not upgrade more than one batch per second, so on a cluster with 10,000
tokens, this may take several minutes.
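
As a back-of-the-envelope check on that estimate, using the batch size and rate quoted above (a sketch; this computes only a lower bound, since the one-batch-per-second rate is a maximum and real clusters may migrate more slowly under load):

```python
import math

BATCH_SIZE = 128   # tokens migrated per batch (per the docs above)
MAX_RATE = 1       # at most one batch per second

def min_migration_seconds(token_count: int) -> int:
    """Lower bound on the accessor-ID migration time for a token count."""
    batches = math.ceil(token_count / BATCH_SIZE)
    return batches // MAX_RATE

# 10,000 tokens -> 79 batches -> at least ~79 seconds of migration time.
print(min_migration_seconds(10_000))
```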

While this is happening, both old and new ACLs will work correctly, with the
caveat that the new ACL [Token APIs](/api/acl/tokens.html) may not return an
accessor ID for legacy tokens that are not yet migrated.

#### Migrating Existing ACLs

New ACL policies have slightly different syntax designed to fix some
shortcomings in the old ACL syntax. During and after the upgrade process, any
old ACL tokens will continue to work and grant exactly the same level of access.

After upgrade, it is still possible to create "legacy" tokens using the existing
API, so existing integrations that create tokens (e.g. Vault) will continue to
work. The "legacy" tokens generated this way will not be able to take advantage
of new policy features, however. It's recommended that you complete migration of
all tokens as soon as possible after upgrade, as well as updating any
integrations to work with the new ACL [Token](/api/acl/tokens.html) and
[Policy](/api/acl/policies.html) APIs.

More complete details on how to upgrade "legacy" tokens are available [here](/docs/guides/acl-migrate-tokens.html).

### Connect Multi-datacenter

This only applies to users upgrading from an older version of Consul Enterprise to Consul Enterprise 1.4.0 (all license types).

In addition, this upgrade will only affect clusters where [Connect is enabled](/docs/connect/configuration.html) on your servers before the migration.

Connect multi-datacenter uses the same primary/secondary approach as ACLs and will use the same [primary_datacenter](#primary-datacenter). When a secondary datacenter server restarts with 1.4.0 it will detect that it is not the primary and begin an automatic bootstrap of multi-datacenter CA federation.

Datacenters can be upgraded in either order; secondary datacenters will not switch into multi-datacenter mode until all servers in both the secondary and primary datacenter are detected to be running at least Consul 1.4.0.
Secondary datacenters monitor this periodically (every few minutes) and will automatically upgrade Connect to use a federated Certificate Authority when they do.

In general, migrating a Consul cluster from OSS to Enterprise will update the CA to be federated automatically and without impact on Connect traffic. When upgrading Consul Enterprise 1.3.x to Consul Enterprise 1.4.0, the CA upgrade is seamless; however, depending on the size of the cluster, _new_ connection attempts in the secondary datacenter might fail for a short window (typically seconds) while the update is propagated, because the 1.3.x beta authorization endpoint validated the originating cluster in a way that was not fully forwards compatible with migrating between cluster trust domains. That issue is fixed in 1.4.0 as part of General Availability.

Once migrated (typically a few seconds), Connect will use the primary datacenter's Certificate Authority as the root of trust for all other datacenters. CA migration or root key changes in the primary will now rotate automatically and without loss of connectivity throughout all datacenters and workloads.

For more information see [Connect Multi-datacenter](/docs/enterprise/connect-multi-datacenter/index.html).

## Consul 1.3.0

This version added support for multiple tag filters in service discovery queries; however, it introduced a subtle bug where API calls to `/catalog/service/:name?tag=<tag>` would ignore the tag filter _only during the upgrade_. It only occurs when clients are still running 1.2.3 or earlier but servers have been upgraded. The `/health/service/:name?tag=<tag>` endpoint and DNS interface were _not_ affected.

For this reason, we recommend you upgrade directly to 1.3.1, which includes only a fix for this issue.

## Consul 1.1.0

#### Removal of Deprecated Features

The following previously deprecated fields and config options have been removed:

- `CheckID` has been removed from config file check definitions (use `id` instead).
- `script` has been removed from config file check definitions (use `args` instead).
- `enableTagOverride` is no longer valid in service definitions (use `enable_tag_override` instead).
- The [deprecated set of metric names](/docs/upgrade-specific.html#metric-names-updated) (beginning with `consul.consul.`) has been removed,
  along with the `enable_deprecated_names` option from the metrics configuration.

#### New Defaults for Raft Snapshot Creation

Consul 1.0.1 (and earlier versions of Consul) checked for raft snapshots every
5 seconds, and created new snapshots for every 8192 writes. These defaults cause
constant disk IO in large busy clusters. Consul 1.1.0 increases these to larger values,
and makes them tunable via the [raft_snapshot_interval](/docs/agent/options.html#_raft_snapshot_interval) and
[raft_snapshot_threshold](/docs/agent/options.html#_raft_snapshot_threshold) parameters. We recommend
keeping the new defaults. However, operators can go back to the old defaults by changing their
config if they prefer more frequent snapshots. See the documentation for [raft_snapshot_interval](/docs/agent/options.html#_raft_snapshot_interval)
and [raft_snapshot_threshold](/docs/agent/options.html#_raft_snapshot_threshold) to understand the trade-offs
when tuning these.

## Consul 1.0.7

When requesting a specific service (`/v1/health/:service` or
`/v1/catalog/:service` endpoints), the `X-Consul-Index` returned is now the
index at which that _specific service_ was last modified. In version 1.0.6 and
earlier the `X-Consul-Index` returned was the index at which _any_ service was
last modified.
See [GH-3890](https://github.com/hashicorp/consul/issues/3890)
for more details.

During upgrades from 1.0.6 or lower to 1.0.7 or higher, watchers are likely to
see `X-Consul-Index` for these endpoints decrease between blocking calls.

Consul's watch feature and `consul-template` should gracefully handle this case.
Other tools relying on blocking service or health queries are also likely to
work; some may require a restart. It is possible external tools could break and
either stop working or continually re-request data without blocking if they
have assumed indexes can never decrease or be reset and/or persist index
values. Please test any blocking query integrations in a controlled environment
before proceeding.

## Consul 1.0.1

#### Carefully Check and Remove Stale Servers During Rolling Upgrades

Consul 1.0 (and earlier versions of Consul when running with [Raft protocol 3](/docs/agent/options.html#_raft_protocol)) had an issue where performing rolling updates of Consul servers could result in an outage from old servers remaining in the cluster. [Autopilot](/docs/guides/autopilot.html) would normally remove old servers when new ones come online, but it was also waiting to promote servers to voters in pairs to maintain an odd quorum size. The pairwise promotion feature was removed so that servers become voters as soon as they are stable, allowing Autopilot to remove old servers in a safer way.

When upgrading from Consul 1.0, you may need to manually [force-leave](/docs/commands/force-leave.html) old servers as part of a rolling update to Consul 1.0.1.

## Consul 1.0

Consul 1.0 has several important breaking changes that are documented here. Please be sure to read over all the details here before upgrading.

#### Raft Protocol Now Defaults to 3

The [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) default has been changed from 2 to 3, enabling all [Autopilot](/docs/guides/autopilot.html) features by default.

Raft protocol version 3 requires Consul running 0.8.0 or newer on all servers in order to work, so if you are upgrading with older servers in a cluster then you will need to set this back to 2 in order to upgrade. See [Raft Protocol Version Compatibility](/docs/upgrade-specific.html#raft-protocol-version-compatibility) for more details. Also, the format of `peers.json` used for outage recovery is different when running with the latest Raft protocol. See [Manual Recovery Using peers.json](/docs/guides/outage.html#manual-recovery-using-peers-json) for a description of the required format.

Please note that the Raft protocol is different from Consul's internal protocol, which is described on the [Protocol Compatibility Promise](/docs/compatibility.html) page and shown in commands like `consul members` and `consul version`. To see the version of the Raft protocol in use on each server, use the `consul operator raft list-peers` command.

The easiest way to upgrade servers is to have each server leave the cluster, upgrade its Consul version, and then add it back. Make sure the new server joins successfully and that the cluster is stable before rolling the upgrade forward to the next server. It's also possible to stand up a new set of servers, and then slowly stand down each of the older servers in a similar fashion.

When using Raft protocol version 3, servers are identified by their [`-node-id`](/docs/agent/options.html#_node_id) instead of their IP address when Consul makes changes to its internal Raft quorum configuration. This means that once a cluster has been upgraded with servers all running Raft protocol version 3, it will no longer allow servers running any older Raft protocol versions to be added.
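
When you do need to hold the protocol back for a mixed-version upgrade, it is a single setting in the server configuration (a sketch in JSON; remove it once all servers are on 0.8.0 or newer so they can move to the new default):

```javascript
{
  "raft_protocol": 2
}
```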
If running a single Consul server, restarting it in-place will result in that server not being able to elect itself as a leader. To avoid this, either set the Raft protocol back to 2, or use [Manual Recovery Using peers.json](/docs/guides/outage.html#manual-recovery-using-peers-json) to map the server to its node ID in the Raft quorum configuration.

#### Config Files Require an Extension

As part of supporting the [HCL](https://github.com/hashicorp/hcl#syntax) format for Consul's config files, an `.hcl` or `.json` extension is required for all config files loaded by Consul, even when using the [`-config-file`](/docs/agent/options.html#_config_file) argument to specify a file directly.

#### Deprecated Options Have Been Removed

All of Consul's previously deprecated command line flags and config options have been removed, so these will need to be mapped to their equivalents before upgrading. Here's the complete list of removed options and their equivalents:

| Removed Option | Equivalent |
| -------------- | ---------- |
| `-dc` | [`-datacenter`](/docs/agent/options.html#_datacenter) |
| `-retry-join-azure-tag-name` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `-retry-join-azure-tag-value` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `-retry-join-ec2-region` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-ec2-tag-key` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-ec2-tag-value` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-gce-credentials-file` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-project-name` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-tag-name` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-zone-pattern` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `addresses.rpc` | None; the RPC server for CLI commands is no longer supported. |
| `advertise_addrs` | [`ports`](/docs/agent/options.html#ports) with [`advertise_addr`](/docs/agent/options.html#advertise_addr) and/or [`advertise_addr_wan`](/docs/agent/options.html#advertise_addr_wan) |
| `dogstatsd_addr` | [`telemetry.dogstatsd_addr`](/docs/agent/options.html#telemetry-dogstatsd_addr) |
| `dogstatsd_tags` | [`telemetry.dogstatsd_tags`](/docs/agent/options.html#telemetry-dogstatsd_tags) |
| `http_api_response_headers` | [`http_config.response_headers`](/docs/agent/options.html#response_headers) |
| `ports.rpc` | None; the RPC server for CLI commands is no longer supported. |
| `recursor` | [`recursors`](/docs/agent/options.html#recursors) |
| `retry_join_azure` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `retry_join_ec2` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `retry_join_gce` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `statsd_addr` | [`telemetry.statsd_address`](/docs/agent/options.html#telemetry-statsd_address) |
| `statsite_addr` | [`telemetry.statsite_address`](/docs/agent/options.html#telemetry-statsite_address) |
| `statsite_prefix` | [`telemetry.metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix) |
| `telemetry.statsite_prefix` | [`telemetry.metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix) |
| (service definitions) `serviceid` | [`service_id`](/docs/agent/services.html) |
| (service definitions) `dockercontainerid` | [`docker_container_id`](/docs/agent/services.html) |
| (service definitions) `tlsskipverify` | [`tls_skip_verify`](/docs/agent/services.html) |
| (service definitions) `deregistercriticalserviceafter` | [`deregister_critical_service_after`](/docs/agent/services.html) |

#### `statsite_prefix` Renamed to `metrics_prefix`

Since the `statsite_prefix` configuration option applied to all telemetry providers, `statsite_prefix` was renamed to [`metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix). Configuration files will need to be updated when upgrading to this version of Consul.

#### `advertise_addrs` Removed

This configuration option was removed since it was redundant with `advertise_addr` and `advertise_addr_wan` in combination with `ports` and also wrongly stated that you could configure both host and port.

#### Escaping Behavior Changed for go-discover Configs

The format for [`-retry-join`](/docs/agent/options.html#retry-join) and [`-retry-join-wan`](/docs/agent/options.html#retry-join-wan) values that use [go-discover](https://github.com/hashicorp/go-discover) cloud auto-joining has changed. Values in `key=val` sequences must no longer be URL encoded and can be provided as literals as long as they do not contain spaces, backslashes `\` or double quotes `"`. If values contain these characters then use double quotes, as in `"some key"="some value"`. Special characters within a double quoted string can be escaped with a backslash `\`.

#### HTTP Verbs are Enforced in Many HTTP APIs

Many endpoints in the HTTP API that previously took any HTTP verb now check for specific HTTP verbs and enforce them. This may break clients relying on the old behavior.
Here's the complete list of updated endpoints and required HTTP verbs:

| Endpoint | Required HTTP Verb |
| -------- | ------------------ |
| /v1/acl/info | GET |
| /v1/acl/list | GET |
| /v1/acl/replication | GET |
| /v1/agent/check/deregister | PUT |
| /v1/agent/check/fail | PUT |
| /v1/agent/check/pass | PUT |
| /v1/agent/check/register | PUT |
| /v1/agent/check/warn | PUT |
| /v1/agent/checks | GET |
| /v1/agent/force-leave | PUT |
| /v1/agent/join | PUT |
| /v1/agent/members | GET |
| /v1/agent/metrics | GET |
| /v1/agent/self | GET |
| /v1/agent/service/register | PUT |
| /v1/agent/service/deregister | PUT |
| /v1/agent/services | GET |
| /v1/catalog/datacenters | GET |
| /v1/catalog/deregister | PUT |
| /v1/catalog/node | GET |
| /v1/catalog/nodes | GET |
| /v1/catalog/register | PUT |
| /v1/catalog/service | GET |
| /v1/catalog/services | GET |
| /v1/coordinate/datacenters | GET |
| /v1/coordinate/nodes | GET |
| /v1/health/checks | GET |
| /v1/health/node | GET |
| /v1/health/service | GET |
| /v1/health/state | GET |
| /v1/internal/ui/node | GET |
| /v1/internal/ui/nodes | GET |
| /v1/internal/ui/services | GET |
| /v1/session/info | GET |
| /v1/session/list | GET |
| /v1/session/node | GET |
| /v1/status/leader | GET |
| /v1/status/peers | GET |
| /v1/operator/area/:uuid/members | GET |
| /v1/operator/area/:uuid/join | PUT |

#### Unauthorized KV Requests Return 403

When ACLs are enabled, reading a key with an unauthorized token returns a 403. This previously returned a 404 response.

#### Config Section of Agent Self Endpoint has Changed

The /v1/agent/self endpoint's `Config` section has often been in flux as it was directly returning one of Consul's internal data structures.
This configuration structure has been moved under `DebugConfig`, which is documented as being for debugging use only and subject to change, while a small set of elements of `Config` have been maintained and documented. See the [Read Configuration](/api/agent.html#read-configuration) endpoint documentation for details.

#### Deprecated `configtest` Command Removed

The `configtest` command was deprecated and has been superseded by the `validate` command.

#### Undocumented Flags in `validate` Command Removed

The `validate` command supported the `-config-file` and `-config-dir` command line flags but did not document them. This support has been removed since the flags are not required.

#### Metric Names Updated

Metric names no longer start with `consul.consul.`. To help with transitioning dashboards and other metric consumers, the field `enable_deprecated_names` has been added to the telemetry section of the config, which will enable metrics with the old naming scheme to be sent alongside the new ones. The following prefixes were affected:

| Prefix |
| ------ |
| consul.consul.acl |
| consul.consul.autopilot |
| consul.consul.catalog |
| consul.consul.fsm |
| consul.consul.health |
| consul.consul.http |
| consul.consul.kvs |
| consul.consul.leader |
| consul.consul.prepared-query |
| consul.consul.rpc |
| consul.consul.session |
| consul.consul.session_ttl |
| consul.consul.txn |

#### Checks Validated On Agent Startup

Consul agents now validate health check definitions in their configuration and will fail at startup if any checks are invalid. In previous versions of Consul, invalid health checks would get skipped.
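
For example, a check definition that validates cleanly on a 1.0 agent might look like the following sketch (the check id, name, script path, and timings are placeholders; note that script-based checks also require `enable_script_checks`, as described in the Consul 0.9.0 section below):

```javascript
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "args": ["/usr/local/bin/check_mem.py"],
    "interval": "10s",
    "timeout": "1s"
  }
}
```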

## Consul 0.9.0

#### Script Checks Are Now Opt-In

A new [`enable_script_checks`](/docs/agent/options.html#_enable_script_checks) configuration option was added and defaults to `false`, meaning that to allow an agent to run health checks that execute scripts, you will need to set this option to `true`. This provides a safer out-of-the-box configuration for Consul where operators must opt in to allow script-based health checks.

If your cluster uses script health checks, please be sure to set this to `true` as part of upgrading agents. If this is set to `true`, you should also enable [ACLs](/docs/guides/acl.html) to provide control over which users are allowed to register health checks that could potentially execute scripts on the agent machines.

#### Web UI Is No Longer Released Separately

Consul releases will no longer include a `web_ui.zip` file with the compiled web assets. These have been built into the Consul binary since the 0.7.x series and can be enabled with the [`-ui`](/docs/agent/options.html#_ui) configuration option. These built-in web assets have always been identical to the contents of the `web_ui.zip` file for each release. The [`-ui-dir`](/docs/agent/options.html#_ui_dir) option is still available for hosting customized versions of the web assets, but the vast majority of Consul users can just use the built-in web assets.

## Consul 0.8.0

#### Upgrade Current Cluster Leader Last

We identified a potential issue with Consul 0.8 that requires the current cluster
leader to be upgraded last when updating multiple servers. Please see
[this issue](https://github.com/hashicorp/consul/issues/2889) for more details.

#### Command-Line Interface RPC Deprecation

The RPC client interface has been removed.
All CLI commands that used RPC and the
`-rpc-addr` flag to communicate with Consul have been converted to use the HTTP API
and the appropriate flags for it, and the `rpc` field has been removed from the port
and address binding configs. You will need to remove these fields from your config files
and update any scripts that passed a custom `-rpc-addr` to the following commands:

* `force-leave`
* `info`
* `join`
* `keyring`
* `leave`
* `members`
* `monitor`
* `reload`

#### Version 8 ACLs Are Now Opt-Out

The [`acl_enforce_version_8`](/docs/agent/options.html#acl_enforce_version_8) configuration now defaults to `true` to enable [full version 8 ACL support](/docs/guides/acl.html#version_8_acls) by default. If you are upgrading an existing cluster with ACLs enabled, you will need to set this to `false` during the upgrade on **both Consul agents and Consul servers**. Version 8 ACLs were also changed so that [`acl_datacenter`](/docs/agent/options.html#acl_datacenter) must be set on agents in order to enable the agent-side enforcement of ACLs. This makes for a smoother experience in clusters where ACLs aren't enabled at all, since agents otherwise would have to wait to contact a Consul server before learning that.

#### Remote Exec Is Now Opt-In

The default for [`disable_remote_exec`](/docs/agent/options.html#disable_remote_exec) was
changed to `true`, so now operators need to opt in to having agents support running
commands remotely via [`consul exec`](/docs/commands/exec.html).

#### Raft Protocol Version Compatibility

When upgrading to Consul 0.8.0 from a version lower than 0.7.0, users will need to
set the [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) option to 1 in
order to maintain backwards compatibility with the old servers during the upgrade.
After the servers have been migrated to version 0.8.0, `-raft-protocol` can be moved
up to 2 and the servers restarted to match the default.

The Raft protocol must be stepped up in this way; only adjacent version numbers are
compatible (for example, version 1 cannot talk to version 3). Here is a table of the
Raft Protocol versions supported by each Consul version:

<table class="table table-bordered table-striped">
  <tr>
    <th>Version</th>
    <th>Supported Raft Protocols</th>
  </tr>
  <tr>
    <td>0.6 and earlier</td>
    <td>0</td>
  </tr>
  <tr>
    <td>0.7</td>
    <td>1</td>
  </tr>
  <tr>
    <td>0.8</td>
    <td>1, 2, 3</td>
  </tr>
</table>

In order to enable all [Autopilot](/docs/guides/autopilot.html) features, all servers
in a Consul cluster must be running with Raft protocol version 3 or later.

## Consul 0.7.1

#### Child Process Reaping

Child process reaping support has been removed, along with the `reap` configuration option. Reaping is also done via [dumb-init](https://github.com/Yelp/dumb-init) in the [Consul Docker image](https://github.com/hashicorp/docker-consul), so removing it from Consul itself simplifies the code and eases future maintenance for Consul. If you are running Consul as PID 1 in a container you will need to arrange for a wrapper process to reap child processes.

#### DNS Resiliency Defaults

The default for [`max_stale`](/docs/agent/options.html#max_stale) has been increased from 5 seconds to a near-indefinite threshold (10 years) to allow DNS queries to continue to be served in the event of a long outage with no leader. A new telemetry counter was added at `consul.dns.stale_queries` to track when agents serve DNS queries that are stale by more than 5 seconds.

## Consul 0.7

Consul version 0.7 is a very large release with many important changes. Changes
to be aware of during an upgrade are categorized below.

#### Performance Timing Defaults and Tuning

Consul 0.7 now defaults the DNS configuration to allow for stale queries by defaulting
[`allow_stale`](/docs/agent/options.html#allow_stale) to true for better utilization
of available servers. If you want to retain the previous behavior, set the following
configuration:

```javascript
{
  "dns_config": {
    "allow_stale": false
  }
}
```

Consul 0.7 also introduced support for tuning Raft performance using a new
[performance configuration block](/docs/agent/options.html#performance). Also,
the default Raft timing is set to a lower-performance mode suitable for
[minimal Consul servers](/docs/guides/performance.html#minimum).

To continue to use the high-performance settings that were the default prior to
Consul 0.7 (recommended for production servers), add the following configuration
to all Consul servers when upgrading:

```javascript
{
  "performance": {
    "raft_multiplier": 1
  }
}
```

See the [Server Performance](/docs/guides/performance.html) guide for more details.

#### Leave-Related Configuration Defaults

The default behavior of [`leave_on_terminate`](/docs/agent/options.html#leave_on_terminate)
and [`skip_leave_on_interrupt`](/docs/agent/options.html#skip_leave_on_interrupt)
is now dependent on whether the agent is acting as a server or client:

* For servers, `leave_on_terminate` defaults to "false" and `skip_leave_on_interrupt`
  defaults to "true".

* For clients, `leave_on_terminate` defaults to "true" and `skip_leave_on_interrupt`
  defaults to "false".

These defaults are designed to be safer for servers so that you must explicitly
configure them to leave the cluster.
This also results in a better experience for
clients, especially in cloud environments where they may be created and destroyed
often and users prefer not to wait for the 72 hour reap time for cleanup.

#### Dropped Support for Protocol Version 1

Consul version 0.7 dropped support for protocol version 1, which means it
is no longer compatible with versions of Consul prior to 0.3. You will need
to upgrade all agents to a newer version of Consul before upgrading to Consul
0.7.

#### Prepared Query Changes

Consul version 0.7 adds a feature which allows prepared queries to store a
[`Near` parameter](/api/query.html#near) in the query definition
itself. This feature enables using the distance sorting features of prepared
queries without explicitly providing the node to sort near in requests, but
requires the agent servicing a request to send additional information about
itself to the Consul servers when executing the prepared query. Agents prior
to 0.7 do not send this information, which means they are unable to properly
execute prepared queries configured with a `Near` parameter. Similarly, any
server nodes prior to version 0.7 are unable to store the `Near` parameter,
making them unable to properly serve requests for prepared queries using the
feature. It is recommended that all agents be running version 0.7 prior to
using this feature.

#### WAN Address Translation in HTTP Endpoints

Consul version 0.7 added support for translating WAN addresses in certain
[HTTP endpoints](/docs/agent/options.html#translate_wan_addrs). The servers
and the agents need to be running version 0.7 or later in order to use this
feature.
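
Translation is controlled by a single agent option; a minimal sketch of enabling it in JSON config:

```javascript
{
  "translate_wan_addrs": true
}
```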
These translated addresses could break HTTP endpoint consumers that are
expecting local addresses, so a new [`X-Consul-Translate-Addresses`](/api/index.html#translate_header)
header was added to allow clients to detect if translation is enabled for HTTP
responses. A "lan" tag was added to `TaggedAddresses` for clients that need
the local address regardless of translation.

#### Outage Recovery and `peers.json` Changes

The `peers.json` file is no longer present by default and is only used when
performing recovery. Consul deletes this file after starting and ingesting
it. Consul 0.7 also uses a new, automatically-created `raft/peers.info` file
to avoid ingesting the `peers.json` file on the first start after upgrading
(the `peers.json` file is simply deleted at that point).

Please be sure to review the [Outage Recovery Guide](/docs/guides/outage.html)
before upgrading for more details.

## Consul 0.6.4

Consul 0.6.4 made some substantial changes to how ACLs work with prepared
queries. Existing queries will execute with no changes, but there are important
differences to understand about how prepared queries are managed before you
upgrade. In particular, prepared queries with no `Name` defined will no longer
require any ACL to manage them, and prepared queries with a `Name` defined are
now governed by a new `query` ACL policy that will need to be configured
after the upgrade.

See the [ACL Guide](/docs/guides/acl.html#prepared_query_acls) for more details
about the new behavior and how it compares to previous versions of Consul.

## Consul 0.6

Consul version 0.6 is a very large release with many enhancements and
optimizations. Changes to be aware of during an upgrade are categorized below.
#### Data Store Changes

Consul changed the format used to store data on the server nodes in version 0.5
(see 0.5.1 notes below for details). Previously, Consul would automatically
detect data directories using the old LMDB format, and convert them to the newer
BoltDB format. This automatic upgrade has been removed for Consul 0.6, and
instead a safeguard has been put in place which will prevent Consul from booting
if the old directory format is detected.

It is still possible to migrate from a 0.5.x version of Consul to 0.6+ using the
[consul-migrate](https://github.com/hashicorp/consul-migrate) CLI utility. This
is the same tool that was previously embedded into Consul. See the
[releases](https://github.com/hashicorp/consul-migrate/releases) page for
downloadable versions of the tool.

Also, in this release Consul switched from LMDB to a fully in-memory database for
the state store. Because LMDB is a disk-based backing store, it was able to store
more data than could fit in RAM in some cases (though this is not a recommended
configuration for Consul). If you have an extremely large data set that won't fit
into RAM, you may encounter issues upgrading to Consul 0.6.0 and later. Consul
should be provisioned with physical memory approximately 2X the data set size to
allow for bursty allocations and subsequent garbage collection.

#### ACL Enhancements

Consul 0.6 introduces enhancements to the ACL system which may require special
handling:

* Service ACLs are enforced during service discovery (REST + DNS)

  Previously, service discovery was wide open, and any client could query
  information about any service without providing a token. Consul now requires
  read-level access at a minimum when ACLs are enabled to return service
  information over the REST or DNS interfaces.
  If clients depend on an open
  service discovery system, then the following should be added to all ACL tokens
  which require it:

      # Enable discovery of all services
      service "" {
        policy = "read"
      }

  When the DNS interface is queried, the agent's
  [`acl_token`](/docs/agent/options.html#acl_token) is used, so be sure
  that token has sufficient privileges to return the DNS records you
  expect to retrieve from it.

* Event and keyring ACLs

  Similar to service discovery, the new event and keyring ACLs will block access
  to these operations if the `acl_default_policy` is set to `deny`. If clients depend
  on open access to these, then the following should be added to all ACL tokens which
  require them:

      event "" {
        policy = "write"
      }

      keyring = "write"

  Unfortunately, these are new ACLs for Consul 0.6, so they must be added after the
  upgrade is complete.

#### Prepared Queries

Prepared queries introduce a new Raft log entry type that isn't supported on older
versions of Consul. It's important to not use the prepared query features of Consul
until all servers in a cluster have been upgraded to version 0.6.0.

#### Single Private IP Enforcement

Consul will refuse to start if there are multiple private IPs available, so
if this is the case you will need to configure Consul's advertise or bind addresses
before upgrading.

#### New Web UI File Layout

The release .zip file for Consul's web UI no longer contains a `dist` sub-folder;
everything has been moved up one level. If you have any automated scripts that
expect the old layout you may need to update them.

## Consul 0.5.1

Consul version 0.5.1 uses a different backend store for persisting the Raft
log. Because of this change, a data migration is necessary to move the log
entries out of LMDB and into the newer backend, BoltDB.
Consul version 0.5.1+ makes this transition seamless and easy. As a user, there
are no special steps you need to take. When Consul starts, it checks
for the presence of the legacy LMDB data files, and migrates them automatically
if any are found. You will see a log line emitted when Raft data is migrated, like
this:

```
==> Successfully migrated raft data in 5.839642ms
```

This automatic upgrade exists only in Consul 0.5.1+ and is removed starting
with Consul 0.6.0. It will still be possible to upgrade directly
from pre-0.5.1 versions by using the consul-migrate utility, which is available on the
[Consul Tools page](/downloads_tools.html).

## Consul 0.5

Consul version 0.5 adds two features that complicate the upgrade process:

* The ACL system includes service discovery and registration
* Internal use of tombstones to fix the behavior of blocking queries
  in certain edge cases

Users of the ACL system need to be aware that deploying Consul 0.5 will
cause service registration to be enforced. This means if an agent
attempts to register a service without proper privileges it will be denied.
If the `acl_default_policy` is "allow" then clients will continue to
work without an updated policy. If the policy is "deny", then all clients
will begin to have their registrations rejected, causing issues.

To avoid this situation, all the ACL policies should be updated to
add something like this:

    # Enable all services to be registered
    service "" {
      policy = "write"
    }

This will set the service policy to `write` level for all services.
The blank service name is the catch-all value.
A more specific service
can also be specified:

    # Enable only the API service to be registered
    service "api" {
      policy = "write"
    }

The ACL policy can be updated while running 0.4, and enforcement will
begin with the upgrade to 0.5. The policy updates will ensure the
availability of the cluster.

The second major change is the new internal command used for tombstones.
The details of the change are not important; however, to function, the leader
node will replicate a new command to its followers. Consul is designed
defensively, and when a command that is not recognized is received, the
server will panic. This is a purposeful design decision to avoid the possibility
of data loss, inconsistencies, or security issues caused by future incompatibility.

In practice, this means if a Consul 0.5 node is the leader, all of its
followers must also be running 0.5. There are a number of ways to do this
to ensure cluster availability:

* Add new 0.5 nodes, then remove the old servers. This will add the new
  nodes as followers, and once the old servers are removed, one of the
  0.5 nodes will become leader.

* Upgrade the followers first, then the leader last. Using `consul info`,
  you can determine which nodes are followers. Do an in-place upgrade
  on them first, and finally upgrade the leader last.

* Upgrade them in any order, but ensure all are done within 15 minutes.
  Even if the leader is upgraded to 0.5 first, as long as all of the followers
  are running 0.5 within 15 minutes there will be no issues.

Finally, even if any of the methods above are not possible or the process
fails for some reason, it is not fatal. The older version of the server
will simply panic and stop. At that point, you can upgrade to the new version
and restart the agent. There will be no data loss and the cluster will
resume operations.