---
title: "Ingesters scaling up and down"
linkTitle: "Ingesters scaling up and down"
weight: 10
slug: ingesters-scaling-up-and-down
---

This guide explains how to scale ingesters up and down.

_If you're looking for how to run rolling updates of ingesters, please refer to the [dedicated guide](./ingesters-rolling-updates.md)._

## Scaling up

Adding more ingesters to a Cortex cluster is considered a safe operation. When a new ingester starts, it registers itself in the [hash ring](../architecture.md#the-hash-ring) and the distributors reshard received series accordingly.
Ingesters that were previously receiving those series will see data stop arriving and will consider those series "idle".

If you run with `-distributor.shard-by-all-labels=false` (the default), you have to wait until the data of idle series has migrated to the back-end store before adding a second ingester, otherwise you will see gaps in queries.
For chunks storage, this migration starts after the `-ingester.max-chunk-idle` period (default 5 minutes) and finishes when the flush queue is clear; how long that takes depends on how fast your back-end store can accept writes.
For blocks storage, this happens after the next "head compaction" (typically every 2 hours).
If you have set `-querier.query-store-after`, then that is also a minimum time you have to wait before adding a second ingester.
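
As a minimal sketch, assuming a microservices deployment where each component is started with `-target` (other required flags, such as `-config.file`, are omitted), these are the settings that determine how long you have to wait; the `12h` value is purely illustrative:

```bash
# Default sharding: series are sharded by user and metric name, so each metric
# maps to a fixed subset of ingesters.
cortex -target=distributor -distributor.shard-by-all-labels=false

# Chunks storage: idle series start flushing after this period (default 5m).
cortex -target=ingester -ingester.max-chunk-idle=5m

# If set, this is also a minimum time to wait before adding a second ingester.
cortex -target=querier -querier.query-store-after=12h
```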

If you run with `-distributor.shard-by-all-labels=true`, no special care is required when scaling up ingesters.

## Scaling down

A running ingester holds several hours of time series data in memory before it's flushed to the long-term storage. When an ingester shuts down because of a scale-down operation, this in-memory data must not be discarded, to avoid any data loss.

The procedure to adopt when scaling down ingesters depends on your Cortex setup:

- [Blocks storage](#blocks-storage)
- [Chunks storage with WAL enabled](#chunks-storage-with-wal-enabled)
- [Chunks storage with WAL disabled](#chunks-storage-with-wal-disabled-hand-over)

### Blocks storage

When Cortex is running the [blocks storage](../blocks-storage/_index.md), ingesters don't flush series to blocks at shutdown by default. However, Cortex ingesters expose an API endpoint [`/shutdown`](../api/_index.md#shutdown) that can be called to flush series to blocks and upload blocks to the long-term storage before the ingester terminates.
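
For example, assuming the ingester's HTTP server is reachable at `ingester-1:80` (hostname and port are deployment-specific), the endpoint can be called with:

```bash
# Flush in-memory series to a TSDB block and ship it to the long-term storage.
# The call returns once flushing and shipping are done; the process keeps running.
curl -X POST http://ingester-1:80/shutdown
```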

Even if ingester blocks are compacted and shipped to the storage at shutdown, it takes some time for queriers and store-gateways to discover the newly uploaded blocks, because the blocks storage runs a periodic scan of the storage bucket to discover blocks. If two or more ingesters are scaled down in a short period of time, queriers may miss some data at query time, due to series that were stored in the terminated ingesters but whose blocks haven't been discovered yet.

Scaling down ingesters is deemed an infrequent operation and no automation is currently provided. However, if you need to scale down ingesters, please be aware of the following (a combined sketch of these settings is shown after the list):

- Configure queriers and rulers to always query the storage
  - `-querier.query-store-after=0s`
- Frequently scan the storage bucket
  - `-blocks-storage.bucket-store.sync-interval=5m`
  - `-compactor.cleanup-interval=5m`
- Lower bucket scanning cache TTLs
  - `-blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl=1m`
  - `-blocks-storage.bucket-store.metadata-cache.tenant-blocks-list-ttl=1m`
  - `-blocks-storage.bucket-store.metadata-cache.metafile-doesnt-exist-ttl=1m`
- Ingesters should be scaled down one by one:
  1. Call the `/shutdown` endpoint on the ingester to shut down
  2. Wait until the HTTP call returns successfully or "finished flushing and shipping TSDB blocks" is logged
  3. Terminate the ingester process (the `/shutdown` endpoint will not do it)
  4. Before proceeding to the next ingester, wait twice the maximum of `-blocks-storage.bucket-store.sync-interval` and `-compactor.cleanup-interval`
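
Putting these together, a minimal sketch of the settings above, again assuming a microservices deployment where each component is started with `-target` (other required flags omitted; note that queriers also take the `-blocks-storage.*` flags):

```bash
# Queriers and rulers always query the storage.
cortex -target=querier -querier.query-store-after=0s
cortex -target=ruler   -querier.query-store-after=0s

# Scan the storage bucket (and clean it up) frequently.
cortex -target=store-gateway -blocks-storage.bucket-store.sync-interval=5m
cortex -target=compactor     -compactor.cleanup-interval=5m

# Lower the bucket scanning cache TTLs.
cortex -target=store-gateway \
  -blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl=1m \
  -blocks-storage.bucket-store.metadata-cache.tenant-blocks-list-ttl=1m \
  -blocks-storage.bucket-store.metadata-cache.metafile-doesnt-exist-ttl=1m
```

With the `5m` values above, step 4 translates to waiting at least 10 minutes (twice 5 minutes) between one ingester and the next.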

### Chunks storage with WAL enabled

When Cortex is running the [chunks storage](../chunks-storage/_index.md) with WAL enabled, ingesters don't flush series chunks to storage at shutdown by default. However, Cortex ingesters expose an API endpoint [`/shutdown`](../api/_index.md#shutdown) that can be called to flush chunks to the long-term storage before the ingester terminates.

The procedure to scale down ingesters, one by one, should be as follows (a sketch follows the list):

1. Call the `/shutdown` endpoint on the ingester to shut down
2. Wait until the HTTP call returns successfully or "flushing of chunks complete" is logged
3. Terminate the ingester process (the `/shutdown` endpoint will not do it)
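
A minimal sketch of this procedure, assuming the ingester runs as a Kubernetes pod named `ingester-1` with its HTTP server on port 80 (both are deployment-specific assumptions):

```bash
# 1. Trigger the flush of in-memory chunks; the call blocks until flushing completes.
curl -X POST http://ingester-1:80/shutdown

# 2. Optionally confirm completion from the logs.
kubectl logs ingester-1 | grep "flushing of chunks complete"

# 3. Terminate the ingester process, e.g. by scaling down the StatefulSet.
```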

_For more information about the chunks storage WAL, please refer to [Ingesters with WAL](../chunks-storage/ingesters-with-wal.md)._

### Chunks storage with WAL disabled (hand-over)

When Cortex is running the chunks storage with WAL disabled, ingesters flush series chunks to the storage at shutdown if no `PENDING` ingester (to transfer series to) is found. Because of this, it's safe to scale down ingesters with no special care in this setup.
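
The hand-over behaviour is controlled by the `-ingester.max-transfer-retries` flag; as a sketch (the value shown is the default):

```bash
# Try up to 10 times (the default) to find a PENDING ingester to hand series
# over to; if none is found, fall back to flushing chunks to the storage.
# A zero or negative value disables hand-over entirely.
cortex -target=ingester -ingester.max-transfer-retries=10
```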