---
layout: post
title: Out-of-band updates
permalink: /docs/out_of_band
redirect_from:
 - /out_of_band.md/
 - /docs/out_of_band.md/
---

## Out-of-band updates

One way (though not the only one) to deal with out-of-band updates is to configure the bucket as follows:

```console
$ ais bucket props set s3://abc versioning.validate_warm_get true
"versioning.validate_warm_get" set to: "true" (was: "false")
```

Here, `s3://abc` is presumably an Amazon S3 bucket, but it could be any Cloud or remote AIS bucket.

> It could also be any `ais://` bucket with Cloud or remote AIS backend. For usage, see `backend_bck` option in CLI documentation and examples.

Once `validate_warm_get` is set, **any read** operation on the bucket will take a bit of extra time to compare the in-cluster metadata with its remote counterpart.

Further, if and when this comparison fails, aistore performs a _cold_ GET to create a new copy of the remote object and make sure that the cluster has the latest version.

Needless to say, the latest version will always be returned to the user as well.

## Lesser scope

But sometimes, we may want to perform a single given operation without updating bucket configuration. For instance:

```console
$ ais prefetch s3://abc --latest

prefetch-objects[f70MKzP63]: prefetch entire bucket s3://abc. To monitor the progress, run 'ais show job f70MKzP63'
```

Notice the `--latest` switch above. As far as this particular `prefetch` is concerned, `--latest` has the same effect as setting `versioning.validate_warm_get=true`, but only that far: the scope of validating in-cluster versions is limited to this specific batch job.
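
To make the two scopes concrete, here is a minimal Python sketch of the read path described above. It is an illustrative model only, not aistore code: `Obj`, `read_object`, `head_remote`, and `cold_get` are hypothetical names, and the metadata comparison is reduced to a single version string for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Obj:
    """Hypothetical stand-in for an object copy (in-cluster or remote)."""
    version: str
    data: bytes


def read_object(
    local: Obj,
    head_remote: Callable[[], str],   # assumed: returns the remote object's version
    cold_get: Callable[[], Obj],      # assumed: fetches a fresh copy from the backend
    validate_warm_get: bool = False,  # models the bucket-level property
    latest: bool = False,             # models the per-operation '--latest' switch
) -> Obj:
    """Return the copy that would be served to the caller."""
    # Without either setting, a warm GET serves the in-cluster copy as is.
    if not (validate_warm_get or latest):
        return local
    # Validation costs one extra metadata query against the remote backend.
    if head_remote() == local.version:
        return local
    # Comparison failed: perform a cold GET and serve the latest version.
    return cold_get()
```

In this model the bucket-level property and the per-operation switch trigger the same check; they differ only in how long the setting stays in effect.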

The same `--latest` switch applies to copying buckets, [copying ranges and lists of objects](/docs/cli/bucket.md#copy-multiple-objects), and, of course, getting (as in `GET`) individual objects.

Here's an excerpt from `GET` help (note `--latest` below):

```console
$ ais get --help

USAGE:
   ais get [command options] BUCKET[/OBJECT_NAME] [OUT_FILE|OUT_DIR|-]

OPTIONS:
   --offset value    object read offset; must be used together with '--length'; default formatting: IEC (use '--units' to override)
   --length value    object read length; default formatting: IEC (use '--units' to override)
   --checksum        validate checksum
   --yes, -y         assume 'yes' to all questions
   --check-cached    instead of GET execute HEAD(object) to check if the object is present in aistore
                     (applies only to buckets with remote backend)
   --latest          GET, prefetch, or copy the latest object version from the associated remote bucket;
                     allows operation-level control over object version synchronization _without_ changing bucket configuration
                     (the latter can be done using 'ais bucket props set BUCKET versioning')
   ...
```

### See also

* [`ais cp` command](/docs/cli/bucket.md) and, in particular, its `--sync` option.
  - [Example copying buckets and multi-objects with simultaneous synchronization](/docs/cli/bucket.md#example-copying-buckets-and-multi-objects-with-simultaneous-synchronization)

## Out-of-band writes, deletes, and more

1. with version validation enabled, aistore will detect both out-of-band writes and deletes;
2. buckets with versioning disabled are also supported;
3. the decision on whether to perform a cold GET is made upon comparing remote and local metadata;
4. the latter always includes object size, but may also include any combination of:
    - `version`
    - `ETag`
    - `ais object checksum` (by default, xxhash that we store as part of custom Cloud metadata)
    - `MD5`
    - `CRC32C`

To enable version validation, run:

```console
$ ais bucket props set BUCKET versioning.validate_warm_get true

## optionally:

$ ais bucket props show BUCKET versioning
PROPERTY                         VALUE
versioning.enabled               ...
versioning.validate_warm_get     true
versioning.synchronize           false
```

No assumption is made on whether any of the metadata items listed above is present (except, of course, the size aka "Content-Length").

The rules are simple:

* compare _existing_ items of the same kind (`size` vs `size`, `MD5` vs `MD5`, etc.);
* fail immediately - that is, require cold GET - if any pair of comparable items differs;
* count all matches except `size` (in other words, same size does _not_ contribute to the decision in favor of skipping cold GET);
* exclude double counting (which is mostly relevant for `ETag` vs `MD5`);
* require **two or more matches**.

When there are no matches, we go ahead with cold GET.

A single match - e.g. only the `version` (if it exists), or only `ETag`, etc. - is currently resolved positively iff the source backend is the same as well.

> E.g., copying an object from Amazon to Google and then performing a validated GET with the aistore backend "pointing" to Google will fail the match.

> TODO: make it configurable to require at least two matches.

Needless to say, if querying remote metadata fails, the corresponding GET transaction will fail as well.

## When reading in-cluster data causes deletion

But there's one special condition: the call to query remote metadata returns "object not found". In other words, the remote backend unambiguously indicates that the remote object does not exist (any longer).

In this case, there are two configurable choices as per the (already shown) `versioning` section of the bucket config:

```console
$ ais bucket props show BUCKET versioning
PROPERTY                         VALUE
versioning.enabled               ...
versioning.validate_warm_get     true
versioning.synchronize           false  ## <<<<<<<<<<<<<<<< note!
```

The knob called `versioning.synchronize` is simply a stronger variant of `versioning.validate_warm_get` that entails both:

1. validating remote object version, and
2. deleting the in-cluster ("cached") object if its remote counterpart does not exist.

To recap:

if an attempt to read remote metadata returns "object not found", and `versioning.synchronize` is set to `true`, then
we go ahead and delete the object locally, thus effectively _synchronizing_ in-cluster content with its remote source.

## GET latest version

Sometimes, there may be a need for more fine-grained, operation-level control over this functionality.

AIS API supports that. In CLI, the corresponding option is called `--latest`. Let's see a brief example, where:

1. `s3://abc` is a bucket that contains
2. `s3://abc/README.md` object that was previously
3. out-of-band updated

In other words, the setup we describe boils down to a single main point:

* aistore contains a different version of an object (in this example: `s3://abc/README.md`).

Namely:

```console
$ aws s3api list-object-versions --bucket abc --prefix README.md --max-keys 1
{
    "Name": "abc",
    "KeyMarker": "",
    "MaxKeys": 1,
    "IsTruncated": true,
    "NextVersionIdMarker": "KJOQsGcR3qBX5WvXbwiB.2LAQW12opbQ",
...
    "Versions": [
        {
            "IsLatest": true,
...
        }
    ],
    "Prefix": "README.md"
}
```

AIS, on the other hand, shows:

```console
$ ais show object s3://abc/README.md --props version
PROPERTY         VALUE
version          1yNHzpfd9Y16nDS71V5scjTMfbRZUPJI
```

Moreover, a GET operation with default parameters doesn't help:

```console
$ ais get s3://abc/README.md /dev/null
GET (and discard) README.md from s3://abc (13.82KiB)

$ ais show object s3://abc/README.md --props version
PROPERTY         VALUE
version          1yNHzpfd9Y16nDS71V5scjTMfbRZUPJI
```

To reconcile, we employ the `--latest` option:

```console
$ ais get s3://abc/README.md /dev/null --latest
GET (and discard) README.md from s3://abc (13.82KiB)

$ ais show object s3://abc/README.md --props version
PROPERTY         VALUE
version          KJOQsGcR3qBX5WvXbwiB.2LAQW12opbQ
```

Notice that we now have the latest `KJOQsGc...` version (the one that the `s3api` output above reports as `NextVersionIdMarker`).

## References

* [`ais cp` command](/docs/cli/bucket.md) and, in particular, its `--sync` option.
  - [Example copying buckets and multi-objects with simultaneous synchronization](/docs/cli/bucket.md#example-copying-buckets-and-multi-objects-with-simultaneous-synchronization)
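
As a closing illustration, the comparison rules listed under "Out-of-band writes, deletes, and more" can be sketched in a few lines of Python. This is a hedged model, not the aistore implementation: the function name, the dictionary keys, and the choice to detect double counting by comparing normalized values (so that an `ETag` equal to the `MD5` counts once) are all assumptions made for the example. The single-match case follows the text: it is accepted only when the source backend is the same.

```python
from typing import Any, Mapping

# Comparable metadata kinds besides size (hypothetical key names).
KINDS = ("version", "etag", "checksum", "md5", "crc32c")


def _norm(value: Any) -> str:
    # Assumption: strip the double quotes that some backends put around ETags.
    return str(value).strip('"')


def needs_cold_get(
    local: Mapping[str, Any],
    remote: Mapping[str, Any],
    same_backend: bool,
) -> bool:
    """Return True when a cold GET is required under the rules above."""
    # Size is always present: a mismatch decides, a match counts for nothing.
    if local["size"] != remote["size"]:
        return True

    matched = set()  # distinct matched values, so ETag == MD5 is counted once
    for kind in KINDS:
        lval, rval = local.get(kind), remote.get(kind)
        if lval is None or rval is None:
            continue  # compare only items that exist on both sides
        if _norm(lval) != _norm(rval):
            return True  # any differing pair fails immediately
        matched.add(_norm(lval))

    if len(matched) >= 2:
        return False  # two or more independent matches
    if len(matched) == 1:
        return not same_backend  # single match: trusted only for the same backend
    return True  # no matches at all
```

For instance, metadata that matches on both `version` and `CRC32C` skips the cold GET, while metadata that matches only on an `ETag` identical to its `MD5` counts as a single match and is accepted only for the same source backend.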