github.com/muhammadn/cortex@v1.9.1-0.20220510110439-46bb7000d03d/docs/guides/ingesters-rolling-updates.md

---
title: "Ingesters rolling updates"
linkTitle: "Ingesters rolling updates"
weight: 10
slug: ingesters-rolling-updates
---

Cortex [ingesters](architecture.md#ingester) are semi-stateful.
A running ingester holds several hours of time series data in memory before flushing it to the long-term storage.
When an ingester shuts down because of a rolling update or maintenance, its in-memory data must not be discarded, to avoid any data loss.

This document describes the techniques used to safely handle rolling updates, depending on the setup:

- [Blocks storage](#blocks-storage)
- [Chunks storage with WAL enabled](#chunks-storage-with-wal-enabled)
- [Chunks storage with WAL disabled](#chunks-storage-with-wal-disabled-hand-over)

_If you're looking for how to scale ingesters up or down, please refer to the [dedicated guide](./ingesters-scaling-up-and-down.md)._

## Blocks storage

The Cortex [blocks storage](../blocks-storage/_index.md) requires ingesters to run with a persistent disk where the TSDB WAL and blocks are stored (e.g. a StatefulSet when deployed on Kubernetes).

During a rolling update, the leaving ingester closes the open TSDBs, synchronizes the data to disk (`fsync`) and releases the disk resources.
The new ingester, which is expected to reuse the leaving ingester's disk, replays the TSDB WAL on startup to load back into memory the time series that have not been compacted into a block yet.
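
The close-then-replay cycle described above can be illustrated with a toy append-only log. This is only a sketch of the general write-ahead-log idea, not the TSDB WAL format and not Cortex code: the `ToyWAL` class, the JSON-lines record layout and the `wal.jsonl` file name are all made up for illustration, and a temporary directory stands in for the persistent disk.

```python
import json
import os
import tempfile


class ToyWAL:
    """Hypothetical append-only log; NOT the real TSDB WAL format."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, series, timestamp, value):
        # One JSON record per line.
        record = {"series": series, "ts": timestamp, "value": value}
        self.file.write(json.dumps(record) + "\n")

    def close(self):
        # Flush Python's buffer, then ask the OS to persist the data (fsync),
        # and finally release the file handle.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()


def replay(path):
    """Rebuild the in-memory series from the log left on the disk."""
    series = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            samples = series.setdefault(record["series"], [])
            samples.append((record["ts"], record["value"]))
    return series


def demo():
    # The temporary directory plays the role of the persistent disk.
    with tempfile.TemporaryDirectory() as disk:
        path = os.path.join(disk, "wal.jsonl")
        leaving = ToyWAL(path)          # the leaving ingester
        leaving.append("up", 1, 1.0)
        leaving.append("up", 2, 0.0)
        leaving.close()                 # rolling update: fsync and release
        return replay(path)             # the new ingester starts up
```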

_The blocks storage doesn't support the series [hand-over](#chunks-storage-with-wal-disabled-hand-over)._

## Chunks storage (deprecated)

The Cortex chunks storage optionally supports a write-ahead log (WAL).
The rolling update procedure for a Cortex cluster running the chunks storage depends on whether the WAL is enabled.

### Chunks storage with WAL enabled

As with the blocks storage, when Cortex is running the [chunks storage](../chunks-storage/_index.md) with WAL enabled, ingesters must run with a persistent disk where the WAL is stored (e.g. a StatefulSet when deployed on Kubernetes).

During a rolling update, the leaving ingester closes the WAL, synchronizes the data to disk (`fsync`) and releases the disk resources.
The new ingester, which is expected to reuse the leaving ingester's disk, replays the WAL on startup to load the time series data back into memory.

_For more information about the WAL, please refer to [Ingesters with WAL](../chunks-storage/ingesters-with-wal.md)._

### Chunks storage with WAL disabled (hand-over)

When Cortex is running the [chunks storage](../chunks-storage/_index.md) with WAL disabled, it supports on-the-fly series hand-over between a leaving ingester and a joining one.

The hand-over is based on the ingester states stored in the ring. Each ingester can be in one of the following **states**:

- `PENDING`
- `JOINING`
- `ACTIVE`
- `LEAVING`

On startup, an ingester goes into the **`PENDING`** state.
In this state, the ingester waits for a hand-over from another ingester that is `LEAVING`.
If no hand-over occurs within the configured timeout period (the "auto-join timeout", configurable via the `-ingester.join-after` option), the ingester joins the ring with a new set of random tokens (e.g. during a scale up) and switches its state to `ACTIVE`.

When a running ingester in the **`ACTIVE`** state is notified to shut down via a `SIGINT` or `SIGTERM` Unix signal, it switches to the `LEAVING` state. In this state it can no longer receive write requests, but it can still receive read requests for the series it has in memory.

A **`LEAVING`** ingester looks for a `PENDING` ingester to start a hand-over process with.
If it finds one, that ingester goes into the `JOINING` state and the leaver transfers all its in-memory data over to the joiner.
On successful transfer, the leaver removes itself from the ring and exits, while the joiner changes its state to `ACTIVE`, taking over ownership of the leaver's [ring tokens](../architecture.md#hashing). As soon as the joiner switches its state to `ACTIVE`, it starts receiving both write requests from distributors and queries from queriers.

If the `LEAVING` ingester does not find a `PENDING` ingester after `-ingester.max-transfer-retries` retries, it flushes all of its chunks to the long-term storage, then removes itself from the ring and exits. Flushing the chunks to the storage may take several minutes to complete.
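
The state transitions above can be summarized as a small model. This is a sketch for reasoning about the lifecycle, not Cortex code: the `Ingester` class, the `hand_over` function and the list used as a "ring" are hypothetical, retries are collapsed into a single lookup, and flushing to long-term storage is represented by simply clearing the in-memory series.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    JOINING = "JOINING"
    ACTIVE = "ACTIVE"
    LEAVING = "LEAVING"


class Ingester:
    """Toy model of an ingester; names and fields are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.state = State.PENDING  # every ingester starts as PENDING
        self.tokens = []
        self.series = {}

    def auto_join(self, new_tokens):
        # No hand-over arrived before the auto-join timeout:
        # join the ring with a fresh set of tokens.
        assert self.state is State.PENDING
        self.tokens = list(new_tokens)
        self.state = State.ACTIVE

    def shutdown(self):
        # SIGINT / SIGTERM received: stop accepting writes.
        assert self.state is State.ACTIVE
        self.state = State.LEAVING


def hand_over(leaver, ring):
    """Transfer data to a PENDING ingester, or flush if none is found.

    Returns the joiner, or None if the leaver had to flush instead.
    """
    assert leaver.state is State.LEAVING
    joiner = next((i for i in ring if i.state is State.PENDING), None)
    if joiner is None:
        # Stand-in for flushing all chunks to the long-term storage.
        leaver.series.clear()
    else:
        joiner.state = State.JOINING
        joiner.series, leaver.series = leaver.series, {}
        joiner.tokens, leaver.tokens = leaver.tokens, []
        joiner.state = State.ACTIVE  # joiner now owns the leaver's tokens
    ring.remove(leaver)  # in both cases the leaver leaves the ring and exits
    return joiner
```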

#### Higher number of series / chunks during rolling updates

During hand-over, neither the leaving nor the joining ingester accepts new samples. Distributors are aware of this, and "spill" the samples to the next ingester in the ring. This creates a set of extra "spilled" series and chunks which will idle out and flush after hand-over is complete.
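
The spilling behaviour can be pictured with a simplified ring walk. This is a sketch under stated assumptions, not the distributor's actual algorithm: the `pick_ingesters` function, the `(token, name)` ring layout, the state strings and the default replication factor are all chosen here for illustration.

```python
import bisect


def pick_ingesters(ring, states, series_hash, replication_factor=3):
    """Walk the ring clockwise and pick ingesters that accept writes.

    ring:   sorted list of (token, ingester_name) tuples (hypothetical layout)
    states: dict mapping ingester_name to its state string
    """
    tokens = [token for token, _ in ring]
    # First token at or after the hash, wrapping around the end of the ring.
    start = bisect.bisect_left(tokens, series_hash) % len(ring)
    picked = []
    for offset in range(len(ring)):
        _, name = ring[(start + offset) % len(ring)]
        if name in picked:
            continue  # an ingester may own several tokens
        if states[name] != "ACTIVE":
            continue  # spill: skip ingesters busy with a hand-over
        picked.append(name)
        if len(picked) == replication_factor:
            break
    return picked
```

In this model, when an ingester in the middle of a hand-over is skipped, the walk reaches one ingester further along the ring, which is where the extra "spilled" series end up.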

#### Observability

The following metrics can be used to observe this process:

- **`cortex_member_ring_tokens_owned`**: How many tokens each ingester thinks it owns.
- **`cortex_ring_tokens_owned`**: How many tokens each ingester is seen to own by other components.
- **`cortex_ring_member_ownership_percent`**: Same as `cortex_ring_tokens_owned` but expressed as a percentage.
- **`cortex_ring_members`**: How many ingesters can be seen in each state, by other components.
- **`cortex_ingester_sent_chunks`**: Number of chunks sent by leaving ingester.
- **`cortex_ingester_received_chunks`**: Number of chunks received by joining ingester.
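
As an illustration of how one of the metrics above might be consumed, the sketch below counts ring members per state from Prometheus text-format output. The `members_by_state` helper is hypothetical, the `name` and `state` label names are an assumption about how `cortex_ring_members` is exposed, and the `SAMPLE` text is made up for the example rather than captured from a real cluster.

```python
import re

# Matches lines such as: metric_name{label="value",...} 3
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def members_by_state(exposition_text, metric="cortex_ring_members"):
    """Return {state: count} for the metric, summing over all other labels."""
    counts = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP / TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        state = labels.get("state", "unknown")  # assumed label name
        counts[state] = counts.get(state, 0) + float(match.group("value"))
    return counts


# Made-up sample, shaped like Prometheus text exposition output.
SAMPLE = """\
# TYPE cortex_ring_members gauge
cortex_ring_members{name="ingester",state="ACTIVE"} 2
cortex_ring_members{name="ingester",state="LEAVING"} 1
cortex_ring_members{name="ingester",state="PENDING"} 1
some_other_metric 5
"""
```

During a hand-over in this model, a non-zero `LEAVING` or `PENDING` count indicates that the transfer has not completed yet.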

You can see the current state of the ring by opening the `/ring` page of a distributor in a web browser.