---
title: "Deletion of Tenant Data from Blocks Storage"
linkTitle: "Deletion of Tenant Data from Blocks Storage"
weight: 1
slug: tenant-deletion
---

- Author: [Peter Stibrany](https://github.com/pstibrany)
- Date: November 2020
- Status: Accepted

## Deletion of tenant data

## Problem

When a tenant is deleted from the external system that controls access to Cortex, we want to clean up the tenant’s data in Cortex as well.

## Background

When using blocks storage, Cortex stores a tenant’s data in several places:
- object store for long-term storage of blocks,
- ingesters’ disks for short-term storage (ingesters eventually upload this data to long-term storage),
- various caches: query-frontend, chunks, index and metadata,
- object store for rules (separate from blocks),
- object store for alertmanager configuration,
- state for alertmanager instances (notifications and silences).

This document assumes that there is an external authorization system in place.
Disabling or deleting the tenant in this system will stop the tenant’s data, queries and other API calls from reaching Cortex.
Note that there may be a delay between disabling or deleting the tenant and data / queries fully stopping, due to e.g. caching in authorization proxies.
The Cortex endpoint for deleting tenant data should only be called after this blocking is in place.

## Proposal

### API Endpoints

#### POST /purger/delete_tenant

We propose introducing a `/purger/delete_tenant` API endpoint to trigger data deletion for a tenant.

While this endpoint works as an “admin” endpoint and should not be exposed directly to tenants, it still needs to know which tenant it should operate on.
For consistency with other endpoints, it will therefore require an `X-Scope-OrgID` header and use the tenant from it.

It is safe to call the “delete_tenant” endpoint multiple times.

#### GET /purger/delete_tenant_status

To monitor the state of data deletion, another endpoint will be available: `/purger/delete_tenant_status`.
This will return OK once the tenant’s data has been fully deleted – no more blocks exist in the long-term storage, the alertmanager and rulers are fully stopped for the tenant, and its configuration has been removed.
Similarly to the “delete_tenant” endpoint, “delete_tenant_status” will require the `X-Scope-OrgID` header.
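
For illustration only, here is a minimal sketch of how an administrative script might drive these two endpoints. The purger address and tenant ID are made up, and since this proposal does not define the response format of “delete_tenant_status”, the sketch simply treats anything other than an OK status as “not finished yet”:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// callPurger sends a request to one of the purger endpoints on behalf of a
// tenant. The tenant is passed via the X-Scope-OrgID header, as described above.
func callPurger(client *http.Client, method, url, tenant string) (int, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("X-Scope-OrgID", tenant)

	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	base, tenant := "http://purger:8080", "tenant-1" // hypothetical address and tenant

	// Trigger the deletion. Calling this endpoint multiple times is safe.
	if _, err := callPurger(client, http.MethodPost, base+"/purger/delete_tenant", tenant); err != nil {
		fmt.Println("failed to trigger deletion:", err)
		return
	}

	// Poll the status endpoint until it reports OK, i.e. all data is deleted.
	for {
		code, err := callPurger(client, http.MethodGet, base+"/purger/delete_tenant_status", tenant)
		if err == nil && code == http.StatusOK {
			fmt.Println("tenant data fully deleted")
			return
		}
		time.Sleep(time.Minute)
	}
}
```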

### Implementation (asynchronous)

The purger will implement both API endpoints.
Upon receiving a call to the `/purger/delete_tenant` endpoint, the purger will initiate the deletion by writing “deletion marker” objects for the tenant to the following buckets:

- Blocks bucket
- Ruler configuration bucket
- Alertmanager configuration bucket

The deletion marker for the tenant will be an object stored under the tenant prefix, e.g. “/<tenant>/deleted”.
This object will contain the timestamp at which it was created, so that it can be deleted later, after a configured time period.
We could reuse a subset of the proposed Thanos tombstones format, or use a custom format.
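
As an illustration of how small such a marker could be, here is a sketch in Go; the JSON field name and helper names are assumptions, not a decided format:

```go
package purger

import (
	"encoding/json"
	"path"
	"time"
)

// tenantDeletionMark sketches the “deletion marker” object. The field name is
// illustrative; the final format could instead be a subset of the proposed
// Thanos tombstones format.
type tenantDeletionMark struct {
	// Unix timestamp (in seconds) of when the deletion was requested.
	DeletionTime int64 `json:"deletion_time"`
}

// deletionMarkerPath returns the object path under the tenant prefix,
// e.g. "<tenant>/deleted".
func deletionMarkerPath(tenant string) string {
	return path.Join(tenant, "deleted")
}

// newDeletionMarker serializes a marker created now.
func newDeletionMarker() ([]byte, error) {
	return json.Marshal(tenantDeletionMark{DeletionTime: time.Now().Unix()})
}
```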

The “delete_tenant_status” endpoint will report success if all of the following are true:
- There are no blocks for the tenant in the blocks bucket.
- There is a “deletion finished” object for the tenant in the ruler configuration bucket.
- There is a “deletion finished” object for the tenant in the alertmanager configuration bucket.

See the later sections on the ruler and alertmanager for an explanation of the “deletion finished” objects.
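
The following sketch shows how the purger could evaluate these conditions, assuming a minimal, illustrative object-store interface; the marker path and the name of the “deletion finished” object are placeholders:

```go
package purger

import "context"

// bucket is an illustrative, minimal object-store client.
type bucket interface {
	Exists(ctx context.Context, name string) (bool, error)
	Iter(ctx context.Context, prefix string, f func(name string) error) error
}

// tenantDeleted returns true when all three conditions listed above hold.
func tenantDeleted(ctx context.Context, tenant string, blocks, ruler, alertmanager bucket) (bool, error) {
	// 1. No blocks remain for the tenant in the blocks bucket. The tenant’s
	//    own deletion marker is ignored, since it is kept around on purpose.
	blocksLeft := false
	err := blocks.Iter(ctx, tenant+"/", func(name string) error {
		if name != tenant+"/deleted" {
			blocksLeft = true
		}
		return nil
	})
	if err != nil || blocksLeft {
		return false, err
	}

	// 2. + 3. The ruler and alertmanager configuration buckets contain the
	//    per-tenant “deletion finished” objects.
	for _, bkt := range []bucket{ruler, alertmanager} {
		ok, err := bkt.Exists(ctx, tenant+"/deletion-finished")
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}
```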

### Blocks deletion marker

The blocks deletion marker will be used by the compactor, querier and store-gateway.

#### Compactor

Upon discovering the blocks deletion marker, the compactor will start deleting all blocks that belong to the tenant.
This can take hours to finish, depending on the number of blocks.
Even after all blocks have been deleted from the storage, ingesters may upload additional blocks for the tenant.
To make sure that the compactor deletes these blocks as well, it will keep the deletion marker in the bucket.
After a configurable period of time it can delete the deletion marker too.

To implement the deletion, the compactor should use a new TenantsCleaner component, similar to the existing BlocksCleaner (which deletes blocks marked for deletion), and modify the UserScanner to either ignore deleted tenants (for BlocksCleaner and the compactor itself) or only return deleted tenants (for TenantsCleaner).
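
As a rough sketch of one TenantsCleaner iteration (the bucket interface and helper names are illustrative; only the behaviour described above comes from the proposal):

```go
package compactor

import (
	"context"
	"time"
)

// tenantBucket is an illustrative client scoped to a single tenant's prefix
// in the blocks bucket.
type tenantBucket interface {
	ListBlocks(ctx context.Context) ([]string, error)
	DeleteBlock(ctx context.Context, blockID string) error
	DeletionMarkerTime(ctx context.Context) (time.Time, error)
	DeleteDeletionMarker(ctx context.Context) error
}

// cleanDeletedTenant sketches one TenantsCleaner iteration: delete all blocks
// of the tenant, but keep the deletion marker around so that blocks uploaded
// later by ingesters are deleted too. Only once no blocks are left and a
// configurable retention period has passed is the marker itself removed.
func cleanDeletedTenant(ctx context.Context, bkt tenantBucket, markerRetention time.Duration) error {
	blocks, err := bkt.ListBlocks(ctx)
	if err != nil {
		return err
	}
	for _, b := range blocks {
		if err := bkt.DeleteBlock(ctx, b); err != nil {
			return err
		}
	}

	markedAt, err := bkt.DeletionMarkerTime(ctx)
	if err != nil {
		return err
	}
	if len(blocks) == 0 && time.Since(markedAt) > markerRetention {
		return bkt.DeleteDeletionMarker(ctx)
	}
	return nil
}
```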

#### Querier, Store-gateway

These two components periodically scan for all tenants, and can use the tenant deletion marker to skip deleted tenants, avoiding loading their blocks into memory and caching them to disk (store-gateways).
By implementing this, store-gateways will also unload and remove cached blocks.
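
A possible shape of this filtering during the periodic scan, with the marker lookup abstracted behind a callback (all names are illustrative):

```go
package storegateway

import "context"

// filterDeletedTenants drops tenants that have a deletion marker from the
// result of the periodic bucket scan, so their blocks are neither loaded into
// memory nor cached to disk.
func filterDeletedTenants(ctx context.Context, tenants []string, hasDeletionMarker func(ctx context.Context, tenant string) (bool, error)) ([]string, error) {
	kept := tenants[:0]
	for _, t := range tenants {
		deleted, err := hasDeletionMarker(ctx, t)
		if err != nil {
			return nil, err
		}
		if !deleted {
			kept = append(kept, t)
		}
	}
	return kept, nil
}
```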

Queriers and store-gateways use the metadata cache and chunks cache when accessing blocks, to reduce the number of API calls to the object storage.
In this proposal we don’t suggest removing obsolete entries from these caches – we will instead rely on the configured expiration time for cache items.

Note: assuming no queries are sent to the system once the tenant is deleted, implementing support for the tenant deletion marker in queriers and store-gateways is just an optimization.
Even without this optimization, these components will unload blocks once they are deleted from the object store.

#### Query Frontend

While the query-frontend could use the tenant blocks deletion marker to clean up its cache, we don’t suggest doing that due to the additional complexity.
Instead we will rely only on the eventual eviction of cached query results.
It is possible to configure Cortex to set a TTL for items in the frontend cache by using the `-frontend.default-validity` option.

#### Ingester

Ingesters don’t scan the object store for tenants.

To clean up local state on ingesters, we will implement closing and deletion of local per-tenant data for idle TSDBs (see Cortex PR #3491).
This requires additional configuration for ingesters, specifically how long to wait before closing and deleting an idle TSDB.
This feature needs to work properly in two different scenarios:
- The ingester is no longer receiving data for the tenant due to ring changes (e.g. a scale-up of ingesters).
- The ingester is no longer receiving data because the tenant has been deleted.

The ingester currently doesn’t distinguish between the two.
To make sure we don’t break queries by accidentally deleting a TSDB too early, the ingester needs to wait at least the `-querier.query-ingesters-within` duration.

Alternatively, when it detects an idle TSDB, the ingester could check whether the deletion marker exists in the blocks storage.
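
A sketch of this decision, combining the idle timeout, the `-querier.query-ingesters-within` lower bound and the optional deletion-marker check; parameter names are illustrative and the real implementation is discussed in Cortex PR #3491:

```go
package ingester

import "time"

// shouldCloseIdleTSDB sketches when an ingester may close and delete the
// local TSDB of a tenant. tenantMarkedDeleted reflects the optional check for
// the deletion marker in the blocks storage.
func shouldCloseIdleTSDB(lastAppend, now time.Time, idleTimeout, queryIngestersWithin time.Duration, tenantMarkedDeleted bool) bool {
	idleFor := now.Sub(lastAppend)

	// Never delete data queriers may still need: wait at least the
	// -querier.query-ingesters-within duration after the last append.
	if idleFor < queryIngestersWithin {
		return false
	}

	// Close either after the configured idle timeout, or as soon as the
	// tenant is known to be deleted (the alternative mentioned above).
	return idleFor >= idleTimeout || tenantMarkedDeleted
}
```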

**If more data is pushed to the ingester for a given tenant, the ingester will open a new TSDB, build new blocks and upload them. It is therefore essential that no more data is pushed to Cortex for the tenant after calling the “delete_tenant” endpoint.**

### Ruler deletion marker

This deletion marker is stored in the ruler configuration bucket.
When rulers discover this marker during the periodic sync of the rule groups, they will:
- stop the evaluation of the rule groups for the tenant,
- delete the tenant’s rule groups (when multiple rulers do this at the same time, they will race for the deletion, and need to be prepared to handle possible “object not found” errors),
- delete local state.

When a ruler has finished this cleanup, it will ask all other rulers whether they still have any data for the tenant.
If all other rulers reply with “no”, the ruler can write the “deletion finished” marker back to the bucket.
This allows rulers to ignore the tenant completely, and it also communicates the status of the deletion back to the purger.
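
A sketch of this per-ruler flow; the interfaces below are illustrative stand-ins for the rules manager, the ruler configuration storage and the inter-ruler API, and only the order of the steps comes from this proposal:

```go
package ruler

import "context"

// Illustrative stand-ins for the rules manager, the ruler configuration
// storage and the inter-ruler API.
type rulesManager interface {
	StopTenant(tenant string)
}

type rulesStore interface {
	// DeleteAllRuleGroups is assumed to treat “object not found” as success,
	// since multiple rulers may race for the deletion.
	DeleteAllRuleGroups(ctx context.Context, tenant string) error
	WriteDeletionFinishedMarker(ctx context.Context, tenant string) error
}

type rulerPeer interface {
	HasTenantData(ctx context.Context, tenant string) (bool, error)
}

// cleanUpDeletedTenant sketches what a single ruler does after discovering the
// tenant deletion marker during its periodic sync of rule groups.
func cleanUpDeletedTenant(ctx context.Context, tenant string, mgr rulesManager, store rulesStore, peers []rulerPeer, deleteLocalState func(tenant string) error) error {
	// 1. Stop evaluating the tenant's rule groups.
	mgr.StopTenant(tenant)

	// 2. Delete the tenant's rule groups from the configuration bucket.
	if err := store.DeleteAllRuleGroups(ctx, tenant); err != nil {
		return err
	}

	// 3. Delete local state for the tenant.
	if err := deleteLocalState(tenant); err != nil {
		return err
	}

	// 4. Write the “deletion finished” marker only if no other ruler still
	//    has data for the tenant; otherwise retry on a later sync.
	for _, peer := range peers {
		hasData, err := peer.HasTenantData(ctx, tenant)
		if err != nil {
			return err
		}
		if hasData {
			return nil
		}
	}
	return store.WriteDeletionFinishedMarker(ctx, tenant)
}
```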

Note that the ruler currently relies on cached local files when using the Prometheus Ruler Manager component.
[This can be avoided now](https://github.com/cortexproject/cortex/issues/3134), and since it makes the cleanup simpler, we suggest modifying the Cortex ruler implementation to avoid this local copy.

**Similarly to ingesters, it is necessary to disable access to the ruler API for the deleted tenant.
This must be done in the external authorization proxy.**

### Alertmanager deletion marker

The deletion marker for the alertmanager is stored in the alertmanager configuration bucket.
The cleanup procedure for alertmanager data is similar to the rulers’ – when individual alertmanager instances discover the marker, they will:
- Delete the tenant configuration
- Delete the local notifications and silences state
- Ask other alertmanagers whether they still have any state for the tenant, and if not, write the “deletion finished” marker back to the bucket.

To perform the last step, alertmanagers need to find the other alertmanager instances.
This will be implemented by using the ring, which will (likely) be added as per the [Alertmanager scaling proposal](https://github.com/cortexproject/cortex/pull/3574).

Access to the Alertmanager API must be disabled for a tenant that is going to be deleted.

## Alternatives Considered

Another possibility for dealing with tenant data deletion is to make the purger component actively communicate with ingesters, compactors, rulers and alertmanagers to make the data deletion faster.
In that case the purger would need to understand how to reach all of those components (with multiple ring configurations, one for each component type), and the internal API calls would need to have strict semantics around when the data deletion is complete.
This alternative has been rejected due to the additional complexity and only a small benefit in terms of how quickly data would be deleted.