---
layout: post
title: Lifecycle
permalink: /docs/lifecycle
redirect_from:
 - /lifecycle.md/
 - /docs/lifecycle.md/
---

There's a set of topics in system management that can often be found under alternative subtitles: "graceful termination and cleanup", "shutting down and restarting", "adding/removing members", "joining and leaving cluster", and similar.

Any discussion along those lines typically involves state transitions, so let's go ahead and name the states (and transitions).

![Node lifecycle: states and transitions](images/lifecycle-graceful-term.png)

To put things in perspective, this picture is about a node (not shown) in an aistore cluster (not shown). Tracking it from the top downwards, first notice a state called "maintenance mode". "Maintenance mode" constitutes maybe the most gentle, if you will, way of removing a node from the operating cluster.

When in maintenance, the node stops sending keep-alive heartbeats but remains in the cluster map and remains connected. That is, unless you disconnect or shut it down manually (which would be perfectly fine and expected).

Next is "shutdown". Graceful shutdown can also be achieved in a single shot, as indicated by one of the curly arrows on the picture's left: "online" => "shutdown".

When in "shutdown", a node can easily get back and rejoin the cluster at any _later_ time. It'll take two steps, not one - see the blue arrows on the picture's right, where RESTART must be understood as a deployment-specific operation (e.g., `kubectl run`).

Both "maintenance" and "shutdown" involve a certain intra-cluster operation called "global rebalance" (aka "rebalance").

But before we talk about it in any greater detail, let's finish with the node's lifecycle. The third and final special state is "decommission". Loosely synonymous with *cleanup* (a very thorough cleanup, as it were), "decommission" entails:

* migrating all user data the node is storing to other nodes that are currently "online" - the step that's followed by:
* partial or complete cleanup of the node in question, whereby the complete cleanup further entails:
* removing all AIS metadata, all configuration files, and - last but not least - user data in its entirety.

Needless to say, there's no way back out of "decommission" - the proverbial point of no return. To rejoin the cluster, a decommissioned node will have to be redeployed from scratch, but then it would be a totally different node, of course...

### Table of Contents
- [Cluster](#cluster)
- [Privileges](#privileges)
- [Rebalance](#rebalance)
- [Summary](#summary)
- [References](#references)

## Cluster

There's one question that absolutely cannot wait: how to "terminate" or "cleanup" a cluster? Here's how:

```console
$ ais cluster decommission --rm-user-data --yes
```

The above command will destroy an existing cluster - completely and utterly, no questions asked. It can be conveniently used in testing/benchmarking situations or in any sort of non-production environment - see `--help` for details. It also executes very fast, so Ctrl-C is unlikely to help in case of a change of mind...
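The command above takes down the entire cluster. Per-node lifecycle operations, on the other hand, are grouped under `ais cluster add-remove-nodes` - all of them are listed in the [Summary](#summary) table below. For instance, here's a minimal sketch of placing a single node into maintenance mode and later taking it back out; the node name `t[ikht8083]` is simply borrowed from the quick example further below:

```console
# start migrating data off the node; the node stays in the cluster map
$ ais cluster add-remove-nodes start-maintenance t[ikht8083]

# once the triggered rebalance finishes and the node has been serviced, reactivate it
$ ais cluster add-remove-nodes stop-maintenance t[ikht8083]
```

Both commands trigger global rebalance - see the [Rebalance](#rebalance) section below.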
## Privileges

Full Disclosure: all lifecycle management commands and all associated APIs require administrative privileges. There are, essentially, three ways:

* deploy the cluster with authentication disabled:

```console
$ ais config cluster auth --json

    "auth": {
        "secret": "xxxxxxxx",
        "enabled": false
    }
```

* use the integrated `AuthN` server that provides OAuth 2.0 compliant JWT and a set of [easy commands](/docs/cli/auth.md) to manage users and roles (with certain permissions to access certain clusters, etc.);
* outsource authorization to a separate, centralized (usually, LDAP-integrated) management system to manage existing users, groups, and mappings.

## Rebalance

Conceptually, aistore rebalance is similar to what's often called "RAID rebuild". The underlying mechanics are very different, but the general idea is the same: user data massively migrating from some nodes in a cluster (or disks in an array) to some other nodes (disks), and vice versa.

In aistore, all the migration (aka "rebalancing") that's taking place is the system's response to a lifecycle event that has already happened or is about to happen. In fact, it is a response that serves a singular purpose and a single location-governing rule, which simply states: **user data must be _properly_ located**.

### Proper location

For any object in a cluster, its _proper_ location is defined by the current cluster map and, locally on each target node, by the target's configured [mountpaths](overview.md#terminology).

In that sense, the "maintenance" state, for instance, has its _beginning_ - when the cluster starts rebalancing - and its post-rebalancing _end_, whereby the corresponding sub-state gets recorded in a new version of the cluster map, which then gets safely distributed across all nodes, etc., etc.

The next section gives an example and clarifies the "maintenance" sub-states - in color.

### Quick example

Given a cluster with three storage nodes (targets) and a single gateway, we go ahead and shut down one of the nodes:

```console
$ ais cluster add-remove-nodes shutdown <TAB-TAB>
p[MWIp8080]   t[ikht8083]   t[noXt8082]   t[VmQt8081]

$ ais cluster add-remove-nodes shutdown t[ikht8083] -y

Started rebalance "g47" (to monitor, run 'ais show rebalance').
t[ikht8083] is shutting down, please wait for cluster rebalancing to finish

Note: the node t[ikht8083] is _not_ decommissioned - it remains in the cluster map and can be manually
restarted at any later time (and subsequently activated via 'stop-maintenance' operation).
```

Once the command is executed, notice the following:

```console
$ ais show cluster
...
t[ikht8083][x]   -   -   -   -   maintenance
```

At first, `maintenance` will show up in red, indicating the simple fact that data is expeditiously migrating from the node (which is about to leave the cluster).

> A visual cue, which is supposed to imply something like: "please don't disconnect, do not power off".

But eventually, if you run the command periodically:

```console
$ ais show cluster --refresh 3
```

or a few times manually, `show cluster` will report that the rebalance ("g47" in the example) has finished and that the node `t[ikht8083]` has been gracefully terminated. Simultaneously, `maintenance` in the `show` output will become non-red:

| when rebalancing | after |
| --- | --- |
| $${\color{red}maintenance}$$ | $${\color{cyan}maintenance}$$ |

The takeaway: global rebalance runs its full way _before_ the node in question is permitted to leave. If interrupted for any reason whatsoever (power-cycle, network disconnect, new node joining, cluster shutdown, etc.), rebalance will resume and keep going until the [governing condition](#proper-location) is fully and globally satisfied.
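To keep an eye on the rebalance itself (rather than the node's state), use `ais show rebalance`, as suggested by the CLI output above ("g47" being the job ID in that example). A quick sketch follows; the output is omitted here since it varies, and the `--refresh` flag is assumed to be supported the same way it is for `ais show cluster`:

```console
# rebalance progress across all target nodes
$ ais show rebalance

# the same, auto-refreshing every 3 seconds (assuming `--refresh` works here as it does for `show cluster`)
$ ais show rebalance --refresh 3
```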
## Summary

| lifecycle operation | CLI | brief description |
| --- | --- | --- |
| maintenance mode | `start-maintenance` | The most lightweight way to remove a node. Stop keep-alive heartbeats, do not insist on metadata updates - ignore the failures. For advanced usage options, see `--help`. |
| shutdown | `shutdown` | Same as above, plus node shutdown (`aisnode` exit). |
| decommission | `decommission` | Same as above, plus partial (metadata only) or complete (both data and AIS metadata) cleanup. A decommissioned node is forever "forgotten" - removed from the cluster map. |
| remove node from cluster map | `ais advanced remove-from-smap` | Strictly intended for testing purposes and special use-at-your-own-risk scenarios. Immediately remove the node from the cluster and distribute the updated cluster map with no rebalancing. |
| take node out of maintenance | `stop-maintenance` | Update the node with the current cluster-level metadata, re-enable keep-alive, run global rebalance. Finally, when all of the above succeeds, distribute the updated cluster map (where the node shows up "online"). |
| join new node (i.e., grow the cluster) | `join` | Essentially, same as above: update the node, run global rebalance, etc. |

### Assorted notes

Normally, a starting-up AIS node (`aisnode`) will use its local [configuration](/docs/configuration.md) to communicate with any other node in the cluster and perform what's called [self-join](https://github.com/NVIDIA/aistore/blob/main/api/apc/actmsg.go). The latter does not require a `join` command or any other explicit administration.

Still, the `join` command can solve the case when the node is misconfigured. Secondly and separately, it can be used to join a standby node - a node that started in `standby` mode, as per:

* [`aisnode` command line](/docs/command_line.md)

When rebalancing, the cluster remains fully operational and can be used to read and write data, list, create, and destroy buckets, run jobs, and more. In other words, none of the listed lifecycle operations requires downtime. The idea is that users never notice (and if the cluster has enough spare capacity, they won't).

## References

* [CLI: cluster management commands](/docs/cli/cluster.md)
  - [Joining](/docs/join_cluster.md)
  - [Leaving](/docs/leave_cluster.md)
* [Global Rebalance](/docs/rebalance.md)
* [AuthN](/docs/authn.md)
* [AIS on Kubernetes deployment: playbooks](https://github.com/NVIDIA/ais-k8s/tree/master/playbooks)