---
layout: post
title: STORAGE SVCS
permalink: /docs/storage-svcs
redirect_from:
- /storage_svcs.md/
- /docs/storage_svcs.md/
---

## Table of Contents

- [Storage Services](#storage-services)
  - [Notation](#notation)
- [Checksumming](#checksumming)
- [LRU](#lru)
- [Erasure coding](#erasure-coding)
  - [Limitations](#limitations)
- [N-way mirror](#n-way-mirror)
  - [Read load balancing](#read-load-balancing)
  - [More examples](#more-examples)
- [Data redundancy: summary of the available options (and considerations)](#data-redundancy-summary-of-the-available-options-and-considerations)

## Storage Services

By default, buckets inherit the [global configuration](/deploy/dev/local/aisnode_config.sh). However, several distinct sections of this global configuration can be overridden, at startup or at runtime, on a per-bucket basis. These include checksumming, LRU, erasure coding, and local mirroring (note that the LRU watermarks are cluster-wide, as explained in the [LRU](#lru) section); please see the following sections for details.

### Notation

In this document, `G` denotes a (hostname:port) pair of any gateway in the AIS cluster.

## Checksumming

All cluster-level and object-level metadata is protected by checksums. In addition, unless the user explicitly disables checksumming for a given bucket, all user data stored in that bucket is protected as well.

For a detailed overview, theory of operations, and the list of supported checksums, please see this [document](checksum.md).

Example of configuring checksum properties for a bucket:

```console
$ ais bucket props <bucket-name> checksum.validate_cold_get=true checksum.validate_warm_get=false checksum.type=xxhash checksum.enable_read_range=false
```

For more examples, please refer to [supported checksums and brief theory of operations](checksum.md).
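
To illustrate what the `validate_cold_get` and `validate_warm_get` properties control, here is a minimal Python sketch. It is not AIStore's implementation: `compute_checksum`, `should_validate`, and `get_object` are hypothetical helpers, a "cold" GET is assumed to mean the object is fetched from a remote backend while a "warm" GET reads an object already stored in the cluster, and a checksum type of `"none"` is assumed to mean checksumming is disabled. Because `xxhash` (used in the CLI example) is not in Python's standard library, the sketch substitutes a few `hashlib` algorithms.

```python
import hashlib

# Hypothetical mapping of checksum type names to hashlib constructors.
# The CLI example uses xxhash, which is not in Python's standard library,
# so this sketch only covers a few stdlib algorithms.
HASHERS = {
    "md5": hashlib.md5,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
}


def compute_checksum(data: bytes, cksum_type: str) -> str:
    """Return the hex digest of `data` for the given checksum type."""
    return HASHERS[cksum_type](data).hexdigest()


def should_validate(props: dict, cold: bool) -> bool:
    """Decide whether a GET validates the object's checksum.

    cold=True:  object is fetched from a remote backend (cold GET).
    cold=False: object is already stored in the cluster (warm GET).
    """
    if props.get("checksum.type", "none") == "none":
        return False  # assumed: checksumming is disabled for the bucket
    key = "checksum.validate_cold_get" if cold else "checksum.validate_warm_get"
    return bool(props.get(key, False))


def get_object(data: bytes, expected: str, props: dict, cold: bool) -> bytes:
    """Return `data`, or raise ValueError if enabled validation fails."""
    if should_validate(props, cold):
        actual = compute_checksum(data, props["checksum.type"])
        if actual != expected:
            raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    return data
```

With properties mirroring the CLI example (`validate_cold_get=true`, `validate_warm_get=false`), a cold GET would be re-hashed and checked, while a warm GET would be served without validation.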

## LRU

The LRU (Least Recently Used) configuration manages storage space across the entire cluster, not just individual buckets. When storage fills up, LRU evicts the least recently used objects so that the cluster does not run out of space. The settings are split into two groups: `space` and `lru`.

> Note: The LRU watermarks (`space.lowwm`, `space.highwm`, and `space.out_of_space`) apply to the entire cluster, not to individual buckets. Therefore, all settings starting with `space.*` must be configured at the cluster level.

* `space.lowwm`: integer in the range [0, 100]; when filesystem usage exceeds `highwm` (high watermark, %), LRU tries to evict objects until usage drops to `lowwm` (low watermark, %)
* `space.highwm`: integer in the range [0, 100]; LRU starts immediately once filesystem usage exceeds `highwm`
* `space.out_of_space`: integer in the range [0, 100]; once usage exceeds `out_of_space` (%), the target starts failing new PUTs and keeps failing them until its local used capacity drops back below `highwm`
* `lru.dont_evict_time`: string (duration) that defines an eviction-free period `[atime, atime + dont_evict_time]`, where `atime` is the object's last access time
* `lru.capacity_upd_time`: string (duration) indicating the minimum interval between capacity updates
* `lru.enabled`: bool that determines whether LRU runs; LRU runs only when `true`

Example of setting `lru` and `space` properties:

```console
$ ais config cluster space.cleanupwm=40 lru.enabled=true space.lowwm=45 space.highwm=47 lru.dont_evict_time=1s
```

## Erasure coding

AIStore provides data protection that comes in several flavors: [end-to-end checksumming](#checksumming), [n-way mirroring](#n-way-mirror), replication (for *small* objects), and erasure coding.

Erasure coding, or EC, is a well-known storage technique that protects user data by dividing it into N fragments, or slices, computing K redundant (parity) slices, and then storing the resulting (N+K) slices on (N+K) storage servers, one slice per target server.

EC schemas are flexible and user-configurable: users can select N and K (above), thus ensuring that user data remains available even if the cluster loses **any** K of its servers (emphasis on **any**).

A bucket inherits EC settings from the global configuration, but these settings can be overridden on a per-bucket basis:

* `ec.enabled`: bool; enables or disables data protection for the bucket
* `ec.data_slices`: integer in the range [2, 100], the number of fragments an object is split into
* `ec.parity_slices`: integer in the range [2, 32], the number of redundant fragments that protect against failures. The value defines the maximum number of storage targets the cluster can lose while still being able to restore the original object
* `ec.objsize_limit`: integer indicating the minimum size (in bytes) of an object that is erasure coded. Smaller objects are only replicated
* `ec.compression`: string with the rules for the LZ4 compression that EC applies when sending its fragments and replicas over the network. The value "never" disables compression. Other values enable it: "always" compresses all transfers, while a list of compression options, such as "ratio=1.5", means "disable compression automatically when the compression ratio drops below 1.5"

Choose the number of data and parity slices based on the required level of protection and the cluster configuration. The number of storage targets must be greater than the sum of data and parity slices. If the cluster uses only replication (by setting `objsize_limit` to a very high value), the number of storage targets must exceed the number of parity slices.
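
The sizing rules above can be expressed as a small Python sketch. It is illustrative only: `Layout`, `plan_object`, and `survives_target_loss` are hypothetical names, not AIStore APIs. The sketch assumes that a small object (below `ec.objsize_limit`) is kept as the original plus `ec.parity_slices` replicas, each on a separate target, and that in the worst case every lost target held one piece of the object in question.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    mode: str         # "replicate" (small objects) or "encode" (erasure coded)
    pieces: int       # number of full copies or slices, one per target
    min_targets: int  # smallest number of targets that can hold the layout


def plan_object(size: int, data_slices: int, parity_slices: int,
                objsize_limit: int) -> Layout:
    """Hypothetical helper: how an object of `size` bytes would be protected."""
    if size < objsize_limit:
        # Small object: original + parity_slices replicas on distinct targets,
        # hence the cluster needs more targets than parity slices.
        copies = parity_slices + 1
        return Layout("replicate", copies, copies)
    # Big object: data + parity slices, one per target; the cluster needs
    # strictly more targets than data + parity slices.
    slices = data_slices + parity_slices
    return Layout("encode", slices, slices + 1)


def survives_target_loss(lost_targets: int, data_slices: int,
                         parity_slices: int) -> bool:
    """Any `data_slices` of the D + P slices suffice to rebuild the object.

    Worst case assumed: every lost target held one slice of this object.
    """
    surviving = data_slices + parity_slices - lost_targets
    return surviving >= data_slices


if __name__ == "__main__":
    print(plan_object(1 << 20, 4, 2, 256 * 1024))  # 1 MiB object, 4:2 schema
    print(plan_object(1024, 4, 2, 256 * 1024))     # 1 KiB object, same schema
    print(survives_target_loss(2, 4, 2), survives_target_loss(3, 4, 2))
```

For a 4:2 schema, for instance, an erasure-coded object needs at least 7 targets and survives the loss of any 2 of them, while a small replicated object needs only 3 targets.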

Rebalance supports erasure-coded buckets. Besides moving existing objects between targets, it repairs damaged objects and their slices when possible.

Notes:

- Every data and parity slice is stored on a separate storage target. To reconstruct a damaged object, AIStore requires at least `ec.data_slices` slices in total, counting data and parity slices together
- Small objects are replicated `ec.parity_slices` times to provide the same level of data protection as big objects
- Increasing the number of parity slices improves the data protection level, but it may hurt performance: doubling the number of slices roughly doubles the time to encode an object

Example of setting bucket properties:

```console
$ ais bucket props ais://<bucket-name> ec.enabled=true ec.data_slices=4 ec.parity_slices=2
```

To change a single EC property (e.g., to enable EC for a bucket) without touching other bucket properties, set only that property. Example of enabling EC:

```console
$ ais bucket props ais://<bucket-name> ec.enabled=true
```

To enable EC for a bucket with a custom number of data and parity slices using the AIS CLI, use two commands: the first one changes the slice counts while EC is still disabled, and the second one enables EC with the new slice counts:

```console
$ ais bucket props mybucket ec.data_slices=3 ec.parity_slices=3
$ ais bucket props mybucket ec.enabled=true
```

Check that EC properties are applied:

```console
$ ais show bucket mybucket ec
PROPERTY         VALUE
ec               3:3 (256KiB)
```

### Limitations

Once a bucket is configured for EC, it stays erasure coded for its entire lifetime: there is currently no supported way to change this once-applied configuration to a different (N, K) schema, disable EC, or remove redundant EC-generated content.

Only the `ec.objsize_limit` option can be changed while EC is enabled, and modifying it requires the `force` flag.

Note that after any EC option is changed, the cluster does not re-encode existing objects. Existing objects are rebuilt only when they change (rename, PUT of a new version, etc.).

## N-way mirror

Yet another supported storage service is n-way mirroring, which provides bucket-level data redundancy and data protection. The service makes sure that each object in a given distributed (local or Cloud) bucket has exactly **n** object replicas, where n is an arbitrary user-defined integer greater than or equal to 1.

The service ensures that no two replicas of any given object share the same local disk. In other words, AIS n-way mirroring is intended to withstand the loss of disks, not storage nodes (aka AIS targets).

> For the latter, please consider using [erasure coding](#erasure-coding) and/or any of the alternative backup/restore mechanisms.

> Unlike [erasure coding](#erasure-coding), which distributes redundant content across *different* clustered nodes, local mirroring is, as the name implies, local. When a bucket is [configured as a mirror](/deploy/dev/local/aisnode_config.sh), objects placed into this bucket get replicated locally, and the replicas are stored in local filesystems.

> As an aside, note that AIS storage targets can be deployed to utilize Linux LVMs that provide a variety of RAID/mirror schemas.

The following example configures buckets a, b, and c to store n = 1, 2, and 3 object replicas, respectively:

```console
$ ais start mirror --copies 1 ais://a
$ ais start mirror --copies 2 ais://b
$ ais start mirror --copies 3 ais://c
```

The operations above are in fact [extended actions](/xact/README.md) that run asynchronously. Both Cloud and ais buckets are supported.
You can monitor completion of those operations via the generic [xaction API](/api/xaction.go).

Subsequently, all PUTs into an n-way configured bucket also generate **n** copies of all newly created objects. This also means that the "make-n-copies" operation, in addition to creating or destroying replicas of existing objects, automatically re-enables (if n > 1) or disables (if n == 1) mirroring for subsequent PUTs.

Note again that the number of local replicas is defined on a per-bucket basis.

### Read load balancing

With respect to n-way mirrors, the usual pros-and-cons consideration boils down to (the amount of) utilized space, on the one hand, versus data protection and load balancing, on the other.

Since object replicas are end-to-end protected by [checksums](#checksumming), any of them can be used interchangeably to satisfy a GET request. This provides multiple possible choices of local filesystems and, ultimately, local drives. Given n > 1, AIS will utilize the least loaded drive(s).

### More examples

The following sequence creates a bucket named `abc`, PUTs an object into it, and then converts it into a 3-way mirror:

```console
$ ais create abc
$ ais put /tmp/obj1 ais://abc/obj1
$ ais start mirror --copies 3 ais://abc
```

The next command redefines the `abc` bucket created in the previous example as a 2-way mirror: all objects that were previously stored in three replicas will now have only two:

```console
$ ais start mirror --copies 2 ais://abc
```

## Data redundancy: summary of the available options (and considerations)

Any of the supported options can be utilized at any time (and without downtime). The list includes:

1. **cloud backend** - [Backend Bucket](bucket.md#backend-bucket)
2. **mirroring** - [N-way mirror](#n-way-mirror)
3. **copying buckets** - [Copy Bucket](/docs/cli/bucket.md#copy-bucket)
4. **erasure coding** - [Erasure coding](#erasure-coding)

For instance, you could first start with plain mirroring via `ais start mirror BUCKET --copies N`, where N is less than or equal to the number of target mountpaths (disks).

> It is generally assumed (and also strongly recommended) that all storage servers (targets) in an AIS cluster have the same number of disks and are otherwise identical.

Copies will then be created on different disks of each storage target, for all already stored and future objects in a given bucket.

This option won't protect from node failures, but it will provide fairly good performance for writes and load balancing for reads. As far as data redundancy is concerned, an N-way mirror protects from failures of up to (N-1) disks in a storage server.

> It is the performance, together with the fact that disk failures are orders of magnitude more likely than node failures, that makes this "N-way mirror" option attractive, possibly in combination with periodic backups.

Further, at some point you could decide to associate a given AIS bucket with a Cloud (backend) bucket, thus making sure that your data is stored in one of the AIS-supported Clouds: Amazon S3, Google Cloud Storage, or Azure Blob Storage.

Finally, you could erasure code (EC) a given bucket for `D + P` redundancy, where `D` and `P` are, respectively, the numbers of data and parity slices. For example:

```console
$ ais start ec-encode -d 6 -p 4 abc
```

This command erasure codes all objects in the `abc` bucket into a total of 10 slices stored on different AIS targets, plus 1 (one) full replica. In other words, this example requires at least `10 + 1 = 11` targets in the cluster.

> Generally, `D + P` erasure coding requires that the AIS cluster has `D + P + 1` targets, or more.

> In addition to Reed-Solomon encoded slices, we currently always store a full replica: a strategy that consumes additional capacity but pays back with read performance.
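
To put rough numbers on the capacity side of these trade-offs, here is a hedged Python sketch. It assumes Reed-Solomon slices of exactly 1/D of the object size (ignoring padding and metadata) plus the single full replica mentioned above; `mirror_capacity_factor` and `ec_capacity_factor` are hypothetical names, not part of any AIStore API.

```python
from fractions import Fraction


def mirror_capacity_factor(copies: int) -> Fraction:
    """N-way mirror: N full copies, each on a different disk of one target."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return Fraction(copies)


def ec_capacity_factor(data_slices: int, parity_slices: int) -> Fraction:
    """EC D+P: D+P slices of 1/D object size each, plus one full replica.

    Assumes slice size is exactly size/D (no padding or metadata overhead).
    """
    if data_slices < 1 or parity_slices < 0:
        raise ValueError("invalid slice counts")
    slices = Fraction(data_slices + parity_slices, data_slices)
    return 1 + slices  # the leading 1 accounts for the full replica


if __name__ == "__main__":
    options = [
        ("2-way mirror", mirror_capacity_factor(2)),
        ("3-way mirror", mirror_capacity_factor(3)),
        ("EC 6+4 + replica", ec_capacity_factor(6, 4)),
    ]
    for label, factor in options:
        # raw bytes stored per byte of user data
        print(f"{label:<18} {float(factor):.2f}x")
```

For the `ec-encode -d 6 -p 4` example, this gives 1 + 10/6 = 8/3, or roughly 2.67 bytes of raw capacity per byte of user data, compared with 2x and 3x for 2-way and 3-way mirrors. The mirror, however, only protects against disk failures within a target, whereas the EC layout is meant to survive the loss of up to `P` targets.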