---
layout: post
title: REBALANCE
permalink: /docs/rebalance
redirect_from:
 - /rebalance.md/
 - /docs/rebalance.md/
---

## Table of Contents

- [Global Rebalance](#global-rebalance)
- [CLI: usage examples](#cli-usage-examples)
- [Automated Resilvering](#automated-resilvering)
- [IO Performance](#io-performance)

## Global Rebalance

To maintain [consistent distribution of user data at all times](https://en.wikipedia.org/wiki/Consistent_hashing#Examples_of_use), AIStore rebalances itself based on *new* versions of its [cluster map](/cluster/map.go).

More exactly:

* When storage targets join or leave the cluster, the current *primary* (leader) proxy transactionally creates the *next* updated version of the cluster map.
* The primary then [synchronizes](/ais/metasync.go) the new map across the entire cluster, so that every node receives the new version.
* Upon receiving it, each AIS target traverses its locally stored content and recomputes object locations.
* Each target then sends at least some of its objects to their respective *new* locations.
* Object migration is carried out via an optimized intra-cluster [communication mechanism](/transport/README.md) and over a separate [physical or logical network](/cmn/network.go), if provisioned.

Thus, cluster-wide rebalancing is completely decentralized. When a single server joins (or leaves) a cluster of N servers, approximately 1/N of the entire namespace gets rebalanced via direct target-to-target transfers.

Further, cluster-wide rebalancing does not require any downtime.
Incoming GET requests for objects that have not yet migrated (or are being moved) are handled internally via a mechanism we call "get-from-neighbor".
The (rebalancing) target that, according to the new cluster map, must have the object but does not will locate its "neighbor", get the object, and satisfy the original GET request transparently to the user.

Similar to all other AIS modules and subsystems, global rebalance is controlled and monitored via the documented [RESTful API](http_api.md).
It might be easier and faster, though, to use the [AIS CLI](/docs/cli.md); see the next section.

## CLI: usage examples

1. Disable automated global rebalance (for instance, to perform maintenance or upgrade operations) and show the resulting config in JSON on a randomly selected target:

```console
$ ais config cluster rebalance.enabled=false
config successfully updated

$ ais show config 361179t8088 --json | grep -A 6  rebalance

    "rebalance": {
        "dest_retry_time": "2m",
        "quiescent": "20s",
        "compression": "never",
        "multiplier": 4,
        "enabled": false
    },

```

2. Re-enable automated global rebalance and show the resulting config section as a simple `name/value` list:

```console
$ ais config cluster rebalance.enabled=true
config successfully updated

$ ais show config <TAB-TAB>
125210p8082   181883t8089   249630t8087   361179t8088   477343p8081   675515t8084   70681p8080    782227p8083   840083t8086   911875t8085

$ ais show config 840083t8086 rebalance
PROPERTY                         VALUE   DEFAULT
rebalance.compression            never   -
rebalance.dest_retry_time        2m      -
rebalance.enabled                true    -
rebalance.multiplier             2       -
rebalance.quiescent              10s     -
```

3. Monitoring: note the per-target statistics and the `EndTime` column:

```console
$ ais show rebalance
DaemonID     RebID   ObjRcv  SizeRcv  ObjSent  SizeSent  StartTime       EndTime          Aborted
======       ======  ======  ======   ======   ======    ======          ======           ======
181883t8089  1       0       0B       1058     1.27MiB   04-28 16:05:35  <not completed>  false
249630t8087  1       0       0B       988      1.18MiB   04-28 16:05:35  <not completed>  false
361179t8088  1       5029    6.02MiB  0        0B        04-28 16:05:35  <not completed>  false
675515t8084  1       0       0B       989      1.18MiB   04-28 16:05:35  <not completed>  false
840083t8086  1       0       0B       974      1.17MiB   04-28 16:05:35  <not completed>  false
911875t8085  1       0       0B       1020     1.22MiB   04-28 16:05:35  <not completed>  false

$ ais show rebalance
DaemonID     RebID   ObjRcv  SizeRcv  ObjSent  SizeSent  StartTime       EndTime         Aborted
======       ======  ======  ======   ======   ======    ======          ======          ======
181883t8089  1       0       0B       1058     1.27MiB   04-28 16:05:35  04-28 16:05:53  false
249630t8087  1       0       0B       988      1.18MiB   04-28 16:05:35  04-28 16:05:53  false
361179t8088  1       5029    6.02MiB  0        0B        04-28 16:05:35  04-28 16:05:53  false
675515t8084  1       0       0B       989      1.18MiB   04-28 16:05:35  04-28 16:05:53  false
840083t8086  1       0       0B       974      1.17MiB   04-28 16:05:35  04-28 16:05:53  false
911875t8085  1       0       0B       1020     1.22MiB   04-28 16:05:35  04-28 16:05:53  false
```

4. Since global rebalance is an [extended action (xaction)](/xact/README.md), it can also be monitored via the generic `ais show job xaction` command:

```console
$ ais show job xaction rebalance
NODE             ID      KIND            BUCKET  OBJECTS         BYTES           START           END     STATE
181883t8089      g2      rebalance       -       1058            1.27MiB         04-28 16:10:14  -       Running
...
```

5. Finally, you can always start and stop global rebalance administratively; for instance, to start it:

```console
$ ais start rebalance
```
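In the `ais show rebalance` output from the third example above, the objects sent by five targets (1058 + 988 + 989 + 974 + 1020 = 5029) add up to the 5029 objects received by `361179t8088`, and the run is complete once every row has an `EndTime`. The Go sketch below automates both checks by parsing the text table; it assumes exactly the column layout shown above (which may differ between CLI versions), and the `summarize` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Rows copied from the two `ais show rebalance` outputs above.
var inProgress = []string{
	"181883t8089  1       0       0B       1058     1.27MiB   04-28 16:05:35  <not completed>  false",
	"249630t8087  1       0       0B       988      1.18MiB   04-28 16:05:35  <not completed>  false",
	"361179t8088  1       5029    6.02MiB  0        0B        04-28 16:05:35  <not completed>  false",
	"675515t8084  1       0       0B       989      1.18MiB   04-28 16:05:35  <not completed>  false",
	"840083t8086  1       0       0B       974      1.17MiB   04-28 16:05:35  <not completed>  false",
	"911875t8085  1       0       0B       1020     1.22MiB   04-28 16:05:35  <not completed>  false",
}

var completed = []string{
	"181883t8089  1       0       0B       1058     1.27MiB   04-28 16:05:35  04-28 16:05:53  false",
	"249630t8087  1       0       0B       988      1.18MiB   04-28 16:05:35  04-28 16:05:53  false",
	"361179t8088  1       5029    6.02MiB  0        0B        04-28 16:05:35  04-28 16:05:53  false",
	"675515t8084  1       0       0B       989      1.18MiB   04-28 16:05:35  04-28 16:05:53  false",
	"840083t8086  1       0       0B       974      1.17MiB   04-28 16:05:35  04-28 16:05:53  false",
	"911875t8085  1       0       0B       1020     1.22MiB   04-28 16:05:35  04-28 16:05:53  false",
}

// summarize totals the ObjRcv and ObjSent columns and reports whether every
// target has an EndTime. After splitting on whitespace, a row has 11 fields:
// DaemonID RebID ObjRcv SizeRcv ObjSent SizeSent StartTime(2) EndTime(2) Aborted.
func summarize(rows []string) (int, int, bool, error) {
	rcvd, sent, done := 0, 0, true
	for _, row := range rows {
		f := strings.Fields(row)
		if len(f) != 11 {
			return 0, 0, false, fmt.Errorf("unexpected row: %q", row)
		}
		r, err := strconv.Atoi(f[2])
		if err != nil {
			return 0, 0, false, err
		}
		s, err := strconv.Atoi(f[4])
		if err != nil {
			return 0, 0, false, err
		}
		rcvd, sent = rcvd+r, sent+s
		if f[8] == "<not" { // "<not completed>" splits into two fields
			done = false
		}
	}
	return rcvd, sent, done, nil
}

func main() {
	for _, rows := range [][]string{inProgress, completed} {
		rcvd, sent, done, err := summarize(rows)
		if err != nil {
			panic(err)
		}
		fmt.Printf("received=%d sent=%d balanced=%v completed=%v\n", rcvd, sent, rcvd == sent, done)
	}
}
```

The sent-equals-received check holds for this example run; treat a mismatch as a prompt to look closer rather than as proof of an error.
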
## Automated Resilvering

While rebalance (previous section) takes care of the cluster *grow* and *shrink* events, resilver, as the name implies, is responsible for the [mountpath](overview.md#terminology) *added* and *removed* events, handled locally within (and by) each storage target.

In other words, global rebalance handles scaling (up and down) of the entire AIS cluster, while automated *resilvering* takes care of disk attachments and disk faults within a given storage node.

* A [mountpath](overview.md#terminology) is a single disk **or** a volume (a RAID) formatted with a local filesystem of choice, **and** a local directory that AIS utilizes to store user data and AIS metadata. A mountpath can be disabled and (re)enabled, automatically or administratively, at any point during runtime. In a given cluster, the total number of mountpaths is normally the product `(number of storage targets) x (number of disks in each target)`.

As stated, mountpath removal can be done administratively (via API) or be triggered by a disk fault (see [filesystem health checking](/health/fshc.md)).
Irrespective of the original cause, mountpath-level events activate resilver, which in many ways performs the same set of steps as rebalance.
The one salient difference is that all object migrations are local (and, therefore, relatively faster).

### CLI Usage

Resilvering can be run on a specific target node or on the entire cluster (in which case all targets resilver in parallel).

Similar to global rebalancing, resilvering is a managed *eXtended operation*, or [xaction](ic.md).
All xactions execute asynchronously and support a common set of documented APIs to start and terminate an xaction, inquire about its progress, etc. The progress of resilvering can be monitored via the `ais show job xaction` CLI command.
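
To make that common lifecycle more concrete, here is a hypothetical, minimal Go model of an xaction: it runs asynchronously in a goroutine, can be aborted via a `context.Context`, and exposes its progress. The type and method names are illustrative only and are not taken from AIStore's `xact` package; `atomic.Int64` and `atomic.Bool` require Go 1.19 or later.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// xaction is a hypothetical eXtended action: an asynchronous job that can be
// started, aborted, waited on, and queried for progress.
type xaction struct {
	kind     string
	cancel   context.CancelFunc
	finished chan struct{}
	done     atomic.Int64 // items processed so far
	aborted  atomic.Bool
}

// startXaction runs work over items in a new goroutine and returns immediately.
func startXaction(kind string, items []string, work func(string)) *xaction {
	ctx, cancel := context.WithCancel(context.Background())
	x := &xaction{kind: kind, cancel: cancel, finished: make(chan struct{})}
	go func() {
		defer close(x.finished)
		defer cancel() // release context resources when the job ends
		for _, it := range items {
			if ctx.Err() != nil { // abort is honored between items
				x.aborted.Store(true)
				return
			}
			work(it)
			x.done.Add(1)
		}
	}()
	return x
}

func (x *xaction) Abort()          { x.cancel() }
func (x *xaction) Wait()           { <-x.finished }
func (x *xaction) Progress() int64 { return x.done.Load() }
func (x *xaction) Aborted() bool   { return x.aborted.Load() }

func main() {
	x := startXaction("resilver", []string{"obj-1", "obj-2", "obj-3"}, func(string) {})
	x.Wait()
	fmt.Printf("%s: processed %d, aborted %v\n", x.kind, x.Progress(), x.Aborted())
	// resilver: processed 3, aborted false
}
```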

Examples:

```console
$ ais advanced resilver # all targets will be resilvered
Started resilver "NGxmOthtE", use 'ais show job xaction NGxmOthtE' to monitor the progress

$ ais advanced resilver BUQOt8086  # resilver a single node
Started resilver "NGxmOthtE", use 'ais show job xaction NGxmOthtE' to monitor the progress
```

Automated resilvering can also be disabled.

NOTE: When automated resilvering is disabled, removing a mountpath may result in data loss.

Just like with `rebalance`, the resulting config can be viewed through the CLI:

```console
$ ais config cluster resilver.enabled=false
config successfully updated

$ ais show config 361179t8088 resilver --json | grep -A 2 resilver
    "resilver": {
        "enabled": false
    },

$ ais config cluster resilver.enabled=true
config successfully updated

$ ais show config <TAB-TAB>
125210p8082   181883t8089   249630t8087   361179t8088   477343p8081   675515t8084   70681p8080    782227p8083   840083t8086   911875t8085

$ ais show config 361179t8088 resilver
PROPERTY                 VALUE
resilver.enabled         true
```

## IO Performance

During rebalancing, response latency and overall cluster throughput may substantially degrade.