---
title: Upgrade etcd from 3.1 to 3.2
---

In the general case, upgrading from etcd 3.1 to 3.2 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v3.1 processes and replace them with etcd v3.2 processes
- after running all v3.2 processes, new features in v3.2 are available to the cluster

Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

### Upgrade checklists

**NOTE:** When [migrating from v2 with no v3 data](https://github.com/etcd-io/etcd/issues/9480), etcd server v3.2+ panics when it restores from existing snapshots but finds no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. The panic also prevents accidental v3 data loss (e.g. the `db` file might have been moved). etcd requires that post-v3 migration only happen with v3 data. Do not upgrade to newer v3 versions until the v3.0 server contains v3 data.

Highlighted breaking changes in 3.2:

#### Changed default `snapshot-count` value

A higher `--snapshot-count` holds more Raft entries in memory until the next snapshot, thus causing [recurrent higher memory usage](https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-371977156). Since the leader retains the latest Raft entries for longer, a slow follower has more time to catch up before the leader snapshots. `--snapshot-count` is therefore a tradeoff between higher memory usage and better availability of slow followers.

Since v3.2, the default value of `--snapshot-count` has [changed from 10,000 to 100,000](https://github.com/etcd-io/etcd/pull/7160).

#### Changed gRPC dependency (>=3.2.10)

3.2.10 or later now requires [grpc/grpc-go](https://github.com/grpc/grpc-go/releases) `v1.7.5` (3.2.9 and earlier require `v1.2.1`).
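
Build or CI scripts that pin dependencies can express the version boundary above as a small lookup. The following Go sketch is illustrative only. `requiredGRPCGo` is a hypothetical helper that restates this guide's rule: 3.2.10 or later uses grpc-go `v1.7.5`, and 3.2.9 and earlier use `v1.2.1`. It rejects anything that is not a plain 3.2.x version string, and it does not inspect etcd's actual vendored dependencies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// requiredGRPCGo is a hypothetical helper restating this guide's rule:
// etcd 3.2.10 or later requires grpc-go v1.7.5, while 3.2.9 and earlier
// require v1.2.1. It accepts versions with or without a leading "v".
func requiredGRPCGo(etcdVersion string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(etcdVersion, "v"), ".")
	if len(parts) != 3 || parts[0] != "3" || parts[1] != "2" {
		return "", fmt.Errorf("not an etcd 3.2.x version: %q", etcdVersion)
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil {
		return "", fmt.Errorf("invalid patch number in %q: %v", etcdVersion, err)
	}
	if patch >= 10 {
		return "v1.7.5", nil
	}
	return "v1.2.1", nil
}

func main() {
	for _, v := range []string{"3.2.9", "v3.2.10", "3.2.12", "3.1.7"} {
		grpcVersion, err := requiredGRPCGo(v)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("etcd %s -> grpc-go %s\n", v, grpcVersion)
	}
}
```

The sketch rejects pre-release strings such as `3.2.0-rc.0`, because they do not split into exactly three dot-separated fields.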

##### Deprecated `grpclog.Logger`

`grpclog.Logger` has been deprecated in favor of [`grpclog.LoggerV2`](https://github.com/grpc/grpc-go/blob/master/grpclog/loggerv2.go). `clientv3.Logger` is now `grpclog.LoggerV2`.

Before

```go
import "log"
import "os"
import "github.com/coreos/etcd/clientv3"
clientv3.SetLogger(log.New(os.Stderr, "grpc: ", 0))
```

After

```go
import "os"
import "github.com/coreos/etcd/clientv3"
import "google.golang.org/grpc/grpclog"
clientv3.SetLogger(grpclog.NewLoggerV2(os.Stderr, os.Stderr, os.Stderr))

// log.New cannot be used here (*log.Logger does not implement the grpclog.LoggerV2 interface)
```

##### Deprecated `grpc.ErrClientConnTimeout`

Previously, the `grpc.ErrClientConnTimeout` error was returned on client dial time-outs. 3.2 instead returns `context.DeadlineExceeded` (see [#8504](https://github.com/etcd-io/etcd/issues/8504)).

Before

```go
// expect dial time-out on ipv4 blackhole
_, err := clientv3.New(clientv3.Config{
	Endpoints:   []string{"http://254.0.0.1:12345"},
	DialTimeout: 2 * time.Second,
})
if err == grpc.ErrClientConnTimeout {
	// handle errors
}
```

After

```go
_, err := clientv3.New(clientv3.Config{
	Endpoints:   []string{"http://254.0.0.1:12345"},
	DialTimeout: 2 * time.Second,
})
if err == context.DeadlineExceeded {
	// handle errors
}
```

#### Changed maximum request size limits (>=3.2.10)

3.2.10 and 3.2.11 allow custom request size limits on the server side. 3.2.12 and later allow custom request size limits on both the server and **client side**. In the earlier versions (v3.2.10, v3.2.11), the client response size was limited to 4 MiB.

Server-side request limits can be configured with the `--max-request-bytes` flag:

```bash
# limits request size to 1.5 KiB
etcd --max-request-bytes 1536

# client writes exceeding 1.5 KiB will be rejected
etcdctl put foo [LARGE VALUE...]
# etcdserver: request is too large
```

Or configure the `embed.Config.MaxRequestBytes` field:

```go
import "github.com/coreos/etcd/embed"
import "github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"

// limit requests to 5 MiB
cfg := embed.NewConfig()
cfg.MaxRequestBytes = 5 * 1024 * 1024

// client writes exceeding 5 MiB will be rejected
_, err := cli.Put(ctx, "foo", [LARGE VALUE...])
err == rpctypes.ErrRequestTooLarge
```

**If not specified, the server-side limit defaults to 1.5 MiB.**

Client-side request limits must be configured based on server-side limits.

```bash
# limits request size to 1 MiB
etcd --max-request-bytes 1048576
```

```go
import "fmt"
import "strings"
import "github.com/coreos/etcd/clientv3"
import "github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"

cli, _ := clientv3.New(clientv3.Config{
	Endpoints:          []string{"127.0.0.1:2379"},
	MaxCallSendMsgSize: 2 * 1024 * 1024,
	MaxCallRecvMsgSize: 3 * 1024 * 1024,
})

// client writes exceeding "--max-request-bytes" will be rejected by the etcd server
_, err := cli.Put(ctx, "foo", strings.Repeat("a", 1*1024*1024+5))
err == rpctypes.ErrRequestTooLarge

// client writes exceeding "MaxCallSendMsgSize" will be rejected on the client side
_, err = cli.Put(ctx, "foo", strings.Repeat("a", 5*1024*1024))
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (5242890 vs. 2097152)"

// some writes under limits
for i := 0; i < 5; i++ {
	_, err = cli.Put(ctx, fmt.Sprintf("foo%d", i), strings.Repeat("a", 1*1024*1024-500))
	if err != nil {
		panic(err)
	}
}
// client reads exceeding "MaxCallRecvMsgSize" will be rejected on the client side
_, err = cli.Get(ctx, "foo", clientv3.WithPrefix())
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5240509 vs. 3145728)"
```

**If not specified, the client-side send limit defaults to 2 MiB (1.5 MiB + gRPC overhead bytes) and the receive limit to `math.MaxInt32`.** Please see [clientv3 godoc](https://godoc.org/github.com/coreos/etcd/clientv3#Config) for more detail.

#### Changed raw gRPC client wrappers

3.2.12 or later changes the function signatures of the `clientv3` gRPC client wrappers. This change was needed to support [custom `grpc.CallOption` on message size limits](https://github.com/etcd-io/etcd/pull/9047).

Before and after

```diff
-func NewKVFromKVClient(remote pb.KVClient) KV {
+func NewKVFromKVClient(remote pb.KVClient, c *Client) KV {

-func NewClusterFromClusterClient(remote pb.ClusterClient) Cluster {
+func NewClusterFromClusterClient(remote pb.ClusterClient, c *Client) Cluster {

-func NewLeaseFromLeaseClient(remote pb.LeaseClient, keepAliveTimeout time.Duration) Lease {
+func NewLeaseFromLeaseClient(remote pb.LeaseClient, c *Client, keepAliveTimeout time.Duration) Lease {

-func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient) Maintenance {
+func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient, c *Client) Maintenance {

-func NewWatchFromWatchClient(wc pb.WatchClient) Watcher {
+func NewWatchFromWatchClient(wc pb.WatchClient, c *Client) Watcher {
```

#### Changed `clientv3.Lease.TimeToLive` API

Previously, the `clientv3.Lease.TimeToLive` API returned `lease.ErrLeaseNotFound` for a non-existent lease ID. 3.2 instead returns TTL=-1 in its response and no error (see [#7305](https://github.com/etcd-io/etcd/pull/7305)).

Before

```go
// when leaseID does not exist
resp, err := cli.TimeToLive(ctx, leaseID)
resp == nil
err == lease.ErrLeaseNotFound
```

After

```go
// when leaseID does not exist
resp, err := cli.TimeToLive(ctx, leaseID)
resp.TTL == -1
err == nil
```

#### Moved `clientv3.NewFromConfigFile` to `clientv3.yaml.NewConfig`

`clientv3.NewFromConfigFile` has moved to `NewConfig` in the `clientv3/yaml` package.

Before

```go
import "github.com/coreos/etcd/clientv3"

// "client.yaml" is an example path; returns a *clientv3.Client
cli, err := clientv3.NewFromConfigFile("client.yaml")
```

After

```go
import clientv3yaml "github.com/coreos/etcd/clientv3/yaml"

// returns a *clientv3.Config, to be passed to clientv3.New
cfg, err := clientv3yaml.NewConfig("client.yaml")
```

#### Change in `--listen-peer-urls` and `--listen-client-urls`

3.2 now rejects domain names for `--listen-peer-urls` and `--listen-client-urls` (3.1 only prints warnings), because a domain name is not valid for network interface binding. Make sure that those URLs are properly formatted as `scheme://IP:port`.

See [issue #6336](https://github.com/etcd-io/etcd/issues/6336) for more context.

### Server upgrade checklists

#### Upgrade requirements

To upgrade an existing etcd deployment to 3.2, the running cluster must be 3.1 or greater. If it is older than 3.1, please [upgrade to 3.1](upgrade_3_1.md) before upgrading to 3.2.

Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.

#### Preparation

Always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

Before beginning, [back up the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to the existing etcd version.
Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).

#### Mixed versions

While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.2. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.

#### Limitations

Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.

If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster, so it is safest to wait 2 minutes between upgrading each member. Check the size of a recent snapshot to estimate the total data size.

For a much larger total data size (100MB or more), this one-time process might take even longer. Administrators of very large etcd clusters of this magnitude are welcome to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure.

#### Downgrade

If all members have been upgraded to v3.2, the cluster will be upgraded to v3.2, and downgrade from this completed state is **not possible**. If any single member is still v3.1, however, the cluster and its operations remain "v3.1", and it is possible from this mixed cluster state to return to using a v3.1 etcd binary on all members.

Please [back up the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.

### Upgrade procedure

This example shows how to upgrade a 3-member v3.1 etcd cluster running on a local machine.
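
During this procedure, the cluster temporarily runs mixed versions. As described under [Mixed versions](#mixed-versions), it operates at the lowest common version until every member runs 3.2. This is why the cluster keeps reporting 3.1 mid-upgrade. The following Go sketch models only that rule and assumes each member's server version string is already known. The helper names `majorMinor` and `clusterVersion` are hypothetical, and etcd's own version negotiation code is more involved.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor is a hypothetical helper that parses "3.1.7" (or "v3.1.7")
// into its major and minor numbers.
func majorMinor(v string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed major in %q: %v", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed minor in %q: %v", v, err)
	}
	return major, minor, nil
}

// clusterVersion is a hypothetical model of the "lowest common version"
// rule: the cluster runs at the smallest major.minor among its members.
func clusterVersion(memberVersions []string) (string, error) {
	if len(memberVersions) == 0 {
		return "", fmt.Errorf("no members")
	}
	minMajor, minMinor := -1, -1
	for _, v := range memberVersions {
		major, minor, err := majorMinor(v)
		if err != nil {
			return "", err
		}
		if minMajor < 0 || major < minMajor || (major == minMajor && minor < minMinor) {
			minMajor, minMinor = major, minor
		}
	}
	return fmt.Sprintf("%d.%d", minMajor, minMinor), nil
}

func main() {
	for _, members := range [][]string{
		{"3.2.0", "3.1.7", "3.2.0"}, // mid-upgrade: one member still on 3.1.7
		{"3.2.0", "3.2.0", "3.2.0"}, // all members upgraded
	} {
		v, err := clusterVersion(members)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> cluster version %s\n", members, v)
	}
}
```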

#### 1. Check upgrade requirements

Is the cluster healthy and running v3.1.x?

```
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms

$ curl http://localhost:2379/version
{"etcdserver":"3.1.7","etcdcluster":"3.1.0"}
```

#### 2. Stop the existing etcd process

When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:

```
2017-04-27 14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
2017-04-27 14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
2017-04-27 14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
2017-04-27 14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
2017-04-27 14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
2017-04-27 14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
2017-04-27 14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
2017-04-27 14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
2017-04-27 14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
2017-04-27 14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
2017-04-27 14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
2017-04-27 14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
```

It's a good idea at this point to [back up the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:

```
$ ETCDCTL_API=3 etcdctl snapshot save backup.db
```

#### 3. Drop-in etcd v3.2 binary and start the new etcd process

The new v3.2 etcd will publish its information to the cluster:

```
2017-04-27 14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
```

Verify that each member, and then the entire cluster, becomes healthy with the new v3.2 etcd binary:

```
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
```

Until the entire cluster is upgraded, members still running v3.1 will log warnings like the following, reporting that other members have a higher version.
This is expected and will cease after all etcd cluster members are upgraded to v3.2:

```
2017-04-27 14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
2017-04-27 14:15:21.073110 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
2017-04-27 14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.2.0
2017-04-27 14:15:21.073157 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
2017-04-27 14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
```

#### 4. Repeat steps 2 and 3 for all other members

#### 5. Finish

When all members are upgraded, the cluster will report a successful upgrade to 3.2:

```
2017-04-27 14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.1 to 3.2
2017-04-27 14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.2
```

```
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
```

[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
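
Besides the logs and the health check, you can script a final confirmation against the `/version` endpoint queried with curl in step 1. The following is a minimal Go sketch. It assumes the three local client endpoints from this example and plain HTTP without TLS. The helper names `versionInfo`, `parseVersion`, `upgradedTo32` and `fetchVersion` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// versionInfo mirrors the JSON served by an etcd member's /version endpoint,
// e.g. {"etcdserver":"3.1.7","etcdcluster":"3.1.0"}.
type versionInfo struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

// parseVersion decodes a /version response body.
func parseVersion(body []byte) (versionInfo, error) {
	var v versionInfo
	err := json.Unmarshal(body, &v)
	return v, err
}

// upgradedTo32 reports whether both the member's server version and the
// cluster version it reports are 3.2.x.
func upgradedTo32(v versionInfo) bool {
	return strings.HasPrefix(v.Server, "3.2.") && strings.HasPrefix(v.Cluster, "3.2.")
}

// fetchVersion queries http://<endpoint>/version (plain HTTP, no TLS assumed).
func fetchVersion(client *http.Client, endpoint string) (versionInfo, error) {
	resp, err := client.Get("http://" + endpoint + "/version")
	if err != nil {
		return versionInfo{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return versionInfo{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return versionInfo{}, err
	}
	return parseVersion(body)
}

func main() {
	// Client endpoints of the local 3-member example cluster in this guide.
	endpoints := []string{"localhost:2379", "localhost:22379", "localhost:32379"}
	client := &http.Client{Timeout: 2 * time.Second}
	for _, ep := range endpoints {
		v, err := fetchVersion(client, ep)
		if err != nil {
			fmt.Printf("%s: %v\n", ep, err)
			continue
		}
		fmt.Printf("%s: etcdserver=%s etcdcluster=%s upgraded=%t\n", ep, v.Server, v.Cluster, upgradedTo32(v))
	}
}
```

A member counts as upgraded only when both fields report 3.2.x. In a mixed cluster, an upgraded member already reports an `etcdserver` version of 3.2.x, while `etcdcluster` still reports the 3.1 cluster version.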