github.com/NVIDIA/aistore@v1.3.23-0.20240517131212-7df6609be51d/docs/storage_svcs.md

---
layout: post
title: STORAGE SVCS
permalink: /docs/storage-svcs
redirect_from:
 - /storage_svcs.md/
 - /docs/storage_svcs.md/
---

## Table of Contents

- [Storage Services](#storage-services)
  - [Notation](#notation)
- [Checksumming](#checksumming)
- [LRU](#lru)
- [Erasure coding](#erasure-coding)
  - [Limitations](#limitations)
- [N-way mirror](#n-way-mirror)
  - [Read load balancing](#read-load-balancing)
  - [More examples](#more-examples)
- [Data redundancy: summary of the available options (and considerations)](#data-redundancy-summary-of-the-available-options-and-considerations)

## Storage Services

By default, buckets inherit [global configuration](/deploy/dev/local/aisnode_config.sh). However, several distinct sections of this global configuration can be overridden at startup or at runtime on a per-bucket basis. The list includes checksumming, LRU, erasure coding, and local mirroring; please see the following sections for details.

### Notation

In this document, `G` denotes a (hostname:port) pair of any gateway in the AIS cluster.

## Checksumming

All cluster and object-level metadata is protected by checksums. In addition, unless the user explicitly disables checksumming for a given bucket, all user data stored in this bucket is also protected.

For a detailed overview, theory of operations, and supported checksums, please see this [document](checksum.md).

Example: configuring checksum properties for a bucket:

```console
$ ais bucket props <bucket-name> checksum.validate_cold_get=true checksum.validate_warm_get=false checksum.type=xxhash checksum.enable_read_range=false
```

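To make the validation flags more concrete, here is a minimal Python sketch of checksum validation on read. It is not AIStore code: the function names are hypothetical, and SHA-256 from the standard library stands in for the bucket's configured checksum type (`xxhash` in the example above), which would need a third-party package.

```python
import hashlib


def compute_checksum(data: bytes) -> str:
    """Return a hex digest. SHA-256 is a stand-in for the configured checksum type."""
    return hashlib.sha256(data).hexdigest()


def validated_read(data: bytes, stored_checksum: str, validate: bool) -> bytes:
    """Return the payload, verifying it first when validation is enabled.

    `validate` plays the role of a flag such as `checksum.validate_warm_get`
    (an illustrative mapping, not the actual implementation).
    """
    if validate and compute_checksum(data) != stored_checksum:
        raise ValueError("checksum mismatch: object content does not match metadata")
    return data


payload = b"hello, ais"
stored = compute_checksum(payload)  # recorded when the object is written

# An intact object passes validation.
validated_read(payload, stored, validate=True)

# A corrupted object is detected only when validation is enabled.
corrupted = b"hello, aiz"
validated_read(corrupted, stored, validate=False)  # returned as-is, unchecked
```
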
For more examples, please refer to [supported checksums and brief theory of operations](checksum.md).

## LRU

The LRU (Least Recently Used) configuration manages storage space across the entire cluster, not just individual buckets. It keeps the cluster from running out of storage. The settings are split into two parts: `space` and `lru`.

> Note: The LRU watermarks (`space.lowwm`, `space.highwm`, and `space.out_of_space`) apply to the entire cluster, not to individual buckets. Therefore, all settings starting with `space.*` must be configured at the cluster level.

* `space.lowwm`: integer in the range [0, 100]; the low watermark (%). When filesystem usage exceeds `highwm`, LRU tries to evict objects until usage drops to `lowwm`
* `space.highwm`: integer in the range [0, 100]; the high watermark (%). LRU starts immediately when filesystem usage exceeds this value
* `space.out_of_space`: integer in the range [0, 100]; when usage exceeds this percentage, the target starts failing new PUTs and keeps failing them until its local used capacity gets back below `highwm`
* `lru.dont_evict_time`: string that defines the eviction-free period `[atime, atime + dont_evict_time]`, that is, an object is not evicted until this much time has passed since its last access
* `lru.capacity_upd_time`: string indicating the minimum interval between capacity updates
* `lru.enabled`: bool that determines whether LRU runs; LRU runs only when set to true

Example of setting `lru` and `space` properties:

```console
$ ais config cluster space.cleanupwm=40 lru.enabled=true space.lowwm=45 space.highwm=47 lru.dont_evict_time=1s
```

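The watermark semantics described in this section can be sketched as a small state machine. This is an illustrative Python model, not AIStore code: the class and method names are hypothetical, the 75/90/95 values are arbitrary, and the ordering check `lowwm < highwm < out_of_space` is an assumption rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class SpaceWatermarks:
    lowwm: int          # eviction target (%)
    highwm: int         # LRU trigger (%)
    out_of_space: int   # PUT-rejection trigger (%)
    rejecting_puts: bool = False

    def __post_init__(self) -> None:
        # Assumed ordering; the document only states that each value is in [0, 100].
        if not 0 <= self.lowwm < self.highwm < self.out_of_space <= 100:
            raise ValueError("expected 0 <= lowwm < highwm < out_of_space <= 100")

    def update(self, used_pct: float) -> dict:
        """Return the decisions for the current filesystem usage (%)."""
        if used_pct > self.out_of_space:
            self.rejecting_puts = True       # start failing new PUTs
        elif used_pct < self.highwm:
            self.rejecting_puts = False      # back below highwm: accept PUTs again
        run_lru = used_pct > self.highwm
        return {
            "run_lru": run_lru,
            "evict_down_to": self.lowwm if run_lru else None,
            "accept_puts": not self.rejecting_puts,
        }


wm = SpaceWatermarks(lowwm=75, highwm=90, out_of_space=95)
for used in (80, 96, 92, 85):
    print(used, wm.update(used))
```

Note the hysteresis in the third step: at 92% usage the target is below `out_of_space` but still rejects PUTs, because usage has not yet dropped below `highwm`.
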
## Erasure coding

AIStore provides data protection that comes in several flavors: [end-to-end checksumming](#checksumming), [n-way mirroring](#n-way-mirror), replication (for *small* objects), and erasure coding.

Erasure coding, or EC, is a well-known storage technique that protects user data by dividing it into N fragments or slices, computing K redundant (parity) slices, and then storing the resulting (N+K) slices on (N+K) storage servers, one slice per target server.

EC schemas are flexible and user-configurable: users can select the N and the K (above), thus ensuring that user data remains available even if the cluster loses **any** (emphasis on the **any**) K of its servers.

A bucket inherits EC settings from the global configuration, but they can be overridden on a per-bucket basis.

* `ec.enabled`: bool that enables or disables EC data protection for the bucket
* `ec.data_slices`: integer in the range [2, 100], representing the number of fragments the object is broken into
* `ec.parity_slices`: integer in the range [2, 32], representing the number of redundant fragments that protect against failures. The value defines the maximum number of storage targets the cluster can lose while still being able to restore the original object
* `ec.objsize_limit`: integer indicating the minimum size of an object that is erasure encoded. Smaller objects are simply replicated
* `ec.compression`: string that contains rules for the LZ4 compression that EC uses when it sends its fragments and replicas over the network. The value "never" disables compression. Other values enable it: "always" compresses all transfers, while a list of compression options such as "ratio=1.5" means "disable compression automatically when the compression ratio drops below 1.5"

Choose the number of data and parity slices depending on the required level of protection and the cluster configuration. The number of storage targets must be greater than the sum of the number of data and parity slices. If the cluster uses only replication (by setting `objsize_limit` to a very high value), the number of storage targets must exceed the number of parity slices.

Rebalance supports erasure-coded buckets. Besides moving existing objects between targets, it repairs damaged objects and their slices if possible.

Notes:

- Every data and parity slice is stored on a separate storage target. To reconstruct a damaged object, AIStore requires at least `ec.data_slices` slices in total out of the data and parity sets
- Small objects are replicated `ec.parity_slices` times to have the same level of data protection that big objects do
- Increasing the number of parity slices improves the level of data protection, but it may hurt performance: doubling the number of slices approximately doubles the time it takes to encode the object

Example of setting bucket properties:

```console
$ ais bucket props ais://<bucket-name> ec.enabled=true ec.data_slices=4 ec.parity_slices=2
```

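As a quick check of the sizing rule stated in this section, the following Python sketch computes the smallest cluster that can host a given EC configuration. The helper name `min_targets` is hypothetical and not part of any AIStore API.

```python
def min_targets(data_slices: int, parity_slices: int, replication_only: bool = False) -> int:
    """Smallest number of storage targets that satisfies the sizing rule."""
    if data_slices < 1 or parity_slices < 1:
        raise ValueError("slice counts must be positive")
    if replication_only:
        # Only replicas are created: targets must exceed the number of parity slices.
        return parity_slices + 1
    # Targets must be greater than the sum of data and parity slices.
    return data_slices + parity_slices + 1


# The 4 data + 2 parity configuration from the example above.
print(min_targets(4, 2))                         # 7
print(min_targets(4, 2, replication_only=True))  # 3
```
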
To change only one EC property (e.g., enable or disable EC for a bucket) without touching other bucket properties, use the single set property API. Example of enabling EC:

```console
$ ais bucket props ais://<bucket-name> ec.enabled=true
```

Another example, using the AIS CLI utility: enable EC for a bucket with a custom number of data and parity slices. This takes two commands: the first one changes the slice counts while EC is disabled, and the second one enables EC with the new slice counts:

```console
$ ais bucket props mybucket ec.data_slices=3 ec.parity_slices=3
$ ais bucket props mybucket ec.enabled=true
```

Check that the EC properties are applied:

```console
$ ais show bucket mybucket ec
PROPERTY	 VALUE
ec		 3:3 (256KiB)
```

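The following Python sketch illustrates how the settings in this section interact: `ec.objsize_limit` decides whether an object is replicated or erasure coded, and an encoded object stays restorable as long as at least `ec.data_slices` slices survive. The function names are hypothetical, and the code models only the rules stated in this document, not the actual implementation.

```python
KIB = 1024


def protection_mode(object_size: int, objsize_limit: int) -> str:
    """Objects smaller than the limit are replicated; the rest are erasure coded."""
    return "replicate" if object_size < objsize_limit else "erasure-code"


def can_restore(data_slices: int, parity_slices: int, lost_slices: int) -> bool:
    """An object is restorable if at least `data_slices` slices survive."""
    total = data_slices + parity_slices
    if not 0 <= lost_slices <= total:
        raise ValueError("lost_slices must be between 0 and the total slice count")
    return total - lost_slices >= data_slices


limit = 256 * KIB
print(protection_mode(100 * KIB, limit))    # replicate
print(protection_mode(1024 * KIB, limit))   # erasure-code
print(can_restore(3, 3, lost_slices=3))     # True
print(can_restore(3, 3, lost_slices=4))     # False
```

With the `3:3 (256KiB)` configuration shown above, a 100 KiB object is replicated, a 1 MiB object is split into 3 data and 3 parity slices, and the encoded object survives the loss of any 3 slices.
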
### Limitations

Once a bucket is configured for EC, it stays erasure coded for its entire lifetime: there is currently no supported way to change this once-applied configuration to a different (N, K) schema, disable EC, and/or remove redundant EC-generated content.

While EC is enabled, only the `ec.objsize_limit` option can be changed, and modifying it requires the `force` flag to be set.

Note that after changing any EC option, the cluster does not re-encode existing objects. Existing objects are rebuilt only when they change (rename, PUT of a new version, etc.).

## N-way mirror

Yet another supported storage service is n-way mirroring, which provides bucket-level data redundancy and data protection. The service makes sure that each object in a given distributed (local or Cloud) bucket has exactly **n** object replicas, where n is an arbitrary user-defined integer greater than or equal to 1.

In other words, AIS n-way mirroring is intended to withstand loss of disks, not storage nodes (aka AIS targets).

> For the latter, please consider using [erasure coding](#erasure-coding) and/or any of the alternative backup/restore mechanisms.

The service ensures that for any given object, *no two replicas* share the same local disk.

> Unlike [erasure coding](#erasure-coding), which takes care of distributing redundant content across *different* clustered nodes, local mirror is, as the name implies, local. When a bucket is [configured as a mirror](/deploy/dev/local/aisnode_config.sh), objects placed into this bucket get locally replicated and the replicas are stored in local filesystems.

> As an aside, note that AIS storage targets can be deployed to utilize Linux LVMs that provide a variety of RAID/mirror schemas.

The following example configures buckets a, b, and c to store n = 1, 2, and 3 object replicas, respectively:

```console
$ ais start mirror --copies 1 ais://a
$ ais start mirror --copies 2 ais://b
$ ais start mirror --copies 3 ais://c
```

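The constraint that no two replicas share a disk implies that the requested number of copies cannot exceed the number of local disks (mountpaths) on a target. The Python sketch below illustrates only that constraint; the function name and mountpath strings are hypothetical, and the simple "first n" selection is a placeholder, not the placement policy AIS actually uses.

```python
def place_replicas(mountpaths: list, copies: int) -> list:
    """Pick `copies` distinct mountpaths so that no two replicas share a disk."""
    unique = list(dict.fromkeys(mountpaths))  # drop duplicates, keep order
    if copies < 1:
        raise ValueError("copies must be >= 1")
    if copies > len(unique):
        raise ValueError(
            f"cannot place {copies} replicas on {len(unique)} mountpath(s)"
        )
    return unique[:copies]  # placeholder selection policy


disks = ["/ais/mp1", "/ais/mp2", "/ais/mp3", "/ais/mp4"]
print(place_replicas(disks, 3))  # ['/ais/mp1', '/ais/mp2', '/ais/mp3']
```
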
The operations (above) are in fact [extended actions](/xact/README.md) that run asynchronously. Both Cloud and ais buckets are supported. You can monitor completion of those operations via the generic [xaction API](/api/xaction.go).

Subsequently, all PUTs into an n-way configured bucket also generate **n** copies for all newly created objects. In other words, the ("make-n-copies") operation, in addition to creating or destroying replicas of existing objects, will also automatically re-enable (if n > 1) or disable (if n == 1) mirroring as far as subsequent PUTs are concerned.

Note again that the number of local replicas is defined on a per-bucket basis.

### Read load balancing

With respect to n-way mirrors, the usual pros-and-cons consideration boils down to (the amount of) utilized space, on the one hand, versus data protection and load balancing, on the other.

Since object replicas are end-to-end protected by [checksums](#checksumming), all of them, and any one in particular, can be used interchangeably to satisfy a GET request, thus providing for multiple possible choices of local filesystems and, ultimately, local drives. Given n > 1, AIS will utilize the least loaded drive(s).

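
The "least loaded drive" choice can be pictured with a short Python sketch. The function name, mountpath strings, and utilization numbers are all hypothetical; how AIS actually measures drive load is not covered in this document.

```python
def pick_replica(replica_mountpaths: list, utilization: dict) -> str:
    """Return the mountpath holding a replica whose drive is least loaded."""
    if not replica_mountpaths:
        raise ValueError("object has no replicas")
    return min(replica_mountpaths, key=lambda mp: utilization[mp])


# Hypothetical per-drive utilization (%), for a 3-way mirrored object.
load = {"/ais/mp1": 70, "/ais/mp2": 15, "/ais/mp3": 40}
print(pick_replica(["/ais/mp1", "/ais/mp2", "/ais/mp3"], load))  # /ais/mp2
```

With a single replica (n = 1) there is nothing to choose from, which is why read load balancing only applies when n > 1.
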
### More examples

The following sequence creates a bucket named `abc`, PUTs an object into it, and then converts it into a 3-way mirror:

```console
$ ais create abc
$ ais put /tmp/obj1 ais://abc/obj1
$ ais start mirror --copies 3 ais://abc
```

The next command will redefine the `abc` bucket created in the previous example as a 2-way mirror - all objects that were previously stored in three replicas will now have only two (replicas):

```console
$ ais start mirror --copies 2 ais://abc
```

## Data redundancy: summary of the available options (and considerations)

Any of the supported options can be utilized at any time (and without downtime) - the list includes:

1. **cloud backend** - [Backend Bucket](bucket.md#backend-bucket)
2. **mirroring** - [N-way mirror](#n-way-mirror)
3. **copying buckets** - [Copy Bucket](/docs/cli/bucket.md#copy-bucket)
4. **erasure coding** - [Erasure coding](#erasure-coding)

For instance, you could first start with plain mirroring via `ais start mirror BUCKET --copies N`, where N would be less than or equal to the number of target mountpaths (disks).

> It is generally assumed (and also strongly recommended) that all storage servers (targets) in an AIS cluster have the same number of disks and are otherwise identical.

Copies will then be created on different disks of each storage target - for all already stored and future objects in a given bucket.

This option won't protect from node failures, but it will provide fairly good performance for writes and load balancing for reads. As far as data redundancy is concerned, an N-way mirror protects from failures of up to (N-1) disks in a storage server.

> It is the performance, together with the fact that disk failures are orders of magnitude more probable than node failures, that makes this "N-way mirror" option attractive, possibly in combination with periodic backups.

Further, you could at some point in time decide to associate a given AIS bucket with a Cloud (backend) bucket, thus making sure that your data is stored in one of the AIS-supported Clouds: Amazon S3, Google Cloud Storage, or Azure Blob Storage.

Finally, you could erasure code (EC) a given bucket for `D + P` redundancy, where `D` and `P` are, respectively, the numbers of data and parity slices. For example:

```console
$ ais start ec-encode -d 6 -p 4 abc
```

will erasure-code all objects in the `abc` bucket for a total of 10 slices stored on different AIS targets, plus 1 (one) full replica. In other words, this example requires at least `10 + 1 = 11` targets in the cluster.

> Generally, `D + P` erasure coding requires that the AIS cluster has `D + P + 1` targets, or more.

> In addition to Reed-Solomon encoded slices, we currently always store a full replica: a strategy that uses additional capacity but pays back with read performance.
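
To compare the options in terms of raw capacity, here is a rough Python sketch. It assumes that each EC slice is about `1/D` of the object size and ignores metadata and padding; both are simplifying assumptions rather than statements about the implementation, and the function names are hypothetical.

```python
from fractions import Fraction


def mirror_overhead(copies: int) -> Fraction:
    """Stored bytes per user byte for an N-way mirror."""
    return Fraction(copies)


def ec_overhead(data_slices: int, parity_slices: int, full_replica: bool = True) -> Fraction:
    """Stored bytes per user byte for D + P erasure coding.

    Assumes each slice is 1/D of the object; optionally adds the full replica
    that is stored alongside the slices.
    """
    slices = Fraction(data_slices + parity_slices, data_slices)
    return slices + (1 if full_replica else 0)


print(float(mirror_overhead(3)))            # 3.0
print(round(float(ec_overhead(6, 4)), 2))   # 2.67
```

Under these assumptions, the `6 + 4` example above stores roughly 2.67 bytes per user byte and tolerates the loss of up to 4 targets, while a 3-way mirror stores 3 bytes per user byte and tolerates the loss of up to 2 disks within one storage server.
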