github.com/NVIDIA/aistore@v1.3.23-0.20240517131212-7df6609be51d/docs/out_of_band.md (about)

     1  ---
     2  layout: post
     3  title: Out-of-band updates
     4  permalink: /docs/out_of_band
     5  redirect_from:
     6   - /out_of_band.md/
     7   - /docs/out_of_band.md/
     8  ---
     9  
    10  ## Out-of-band updates
    11  
    12  One way (but not the only one) to deal with out-of-band updates is to configure the bucket as follows:
    13  
    14  ```console
    15  $ ais bucket props set s3://abc versioning.validate_warm_get true
    16  "versioning.validate_warm_get" set to: "true" (was: "false")
    17  ```
    18  
    19  Here, `s3://abc` is presumably an Amazon S3 bucket, but it could be any Cloud or remote AIS bucket.
    20  
    21  > It could also be any `ais://` bucket with a Cloud or remote AIS backend. For usage, see the `backend_bck` option in the CLI documentation and examples.
    22  
    23  Once `validate_warm_get` is set, **any read** operation on the bucket will take a bit of extra time to compare the in-cluster metadata with its remote counterpart.
    24  
    25  Further, if and when this comparison fails, aistore performs a _cold_ GET, to create a new copy of the remote object and make sure that the cluster has the latest version.
    26  
    27  Needless to say, the latest version will always be returned to the user as well.
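The read path described above can be sketched as follows. This is a minimal illustration, not aistore code: `ObjMeta`, `warm_get`, `head_remote`, and `cold_get` are hypothetical names, and plain equality stands in for the real metadata comparison, whose rules are given later in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ObjMeta:
    """Hypothetical, simplified object metadata: size plus an optional version."""
    size: int
    version: Optional[str] = None


def warm_get(local: ObjMeta,
             head_remote: Callable[[], ObjMeta],
             cold_get: Callable[[], ObjMeta],
             validate_warm_get: bool) -> ObjMeta:
    """Return the metadata of the object copy that is served to the user."""
    if not validate_warm_get:
        return local          # plain warm GET: no remote round trip
    remote = head_remote()    # the "bit of extra time": one remote metadata query
    if remote == local:
        return local          # in-cluster copy is current
    return cold_get()         # comparison failed: fetch a new copy and serve it
```

With `validate_warm_get` disabled, the remote backend is never consulted, which is why an out-of-band update can go unnoticed.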
    28  
    29  ## Lesser scope
    30  
    31  But sometimes, we may want to perform a single given operation without updating bucket configuration. For instance:
    32  
    33  ```console
    34  $ ais prefetch s3://abc --latest
    35  
    36  prefetch-objects[f70MKzP63]: prefetch entire bucket s3://abc. To monitor the progress, run 'ais show job f70MKzP63'
    37  ```
    38  
    39  Notice the `--latest` switch above. As far as this particular `prefetch` is concerned, `--latest` has the same effect as setting `versioning.validate_warm_get=true`, but only that far: the scope of validating in-cluster versions is limited to this specific batch job.
    40  
    41  The same applies to copying buckets, [copying ranges and lists of objects](/docs/cli/bucket.md#copy-multiple-objects), and certainly getting (as in `GET`) individual objects.
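Put differently, validation happens when either the bucket-level property or the operation-level switch asks for it. A one-line sketch, where the function and parameter names are hypothetical and only mirror `versioning.validate_warm_get` and `--latest`:

```python
def validates_remote_version(bucket_validate_warm_get: bool,
                             op_latest: bool = False) -> bool:
    """True when this operation compares in-cluster metadata with the remote backend."""
    # The bucket property applies to every read; `--latest` applies to one operation only.
    return bucket_validate_warm_get or op_latest
```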
    42  
    43  Here's an excerpt from `GET` help (and note `--latest` below):
    44  
    45  ```console
    46  $ ais get --help
    47  
    48  USAGE:
    49     ais get [command options] BUCKET[/OBJECT_NAME] [OUT_FILE|OUT_DIR|-]
    50  
    51  OPTIONS:
    52     --offset value    object read offset; must be used together with '--length'; default formatting: IEC (use '--units' to override)
    53     --length value    object read length; default formatting: IEC (use '--units' to override)
    54     --checksum        validate checksum
    55     --yes, -y         assume 'yes' to all questions
    56     --check-cached    instead of GET execute HEAD(object) to check if the object is present in aistore
    57                       (applies only to buckets with remote backend)
    58     --latest          GET, prefetch, or copy the latest object version from the associated remote bucket;
    59                       allows operation-level control over object version synchronization _without_ changing bucket configuration
    60                       (the latter can be done using 'ais bucket props set BUCKET versioning')
    61  ...
    62  ```
    63  
    64  ### See also
    65  
    66  * [`ais cp` command](/docs/cli/bucket.md) and, in particular, its `--sync` option.
    67  * [Example copying buckets and multi-objects with simultaneous synchronization](/docs/cli/bucket.md#example-copying-buckets-and-multi-objects-with-simultaneous-synchronization)
    68  
    69  ## Out-of-band writes, deletes, and more
    70  
    71  1. with version validation enabled, aistore will detect both out-of-band writes and deletes;
    72  2. buckets with versioning disabled are also supported;
    73  3. decision on whether to perform cold-GET is made upon comparing remote and local metadata;
    74  4. the latter always includes object size, but also may include any combination of:
    75     - `version`
    76     - `ETag`
    77     - `ais object checksum` (by default, xxhash that we store as part of custom Cloud metadata)
    78     - `MD5`
    79     - `CRC32C`
    80  
    81  To enable version validation, run:
    82  
    83  ```console
    84  $ ais bucket props set BUCKET versioning.validate_warm_get true
    85  
    86  ## optionally:
    87  
    88  $ ais bucket props show BUCKET versioning
    89  PROPERTY                         VALUE
    90  versioning.enabled               ...
    91  versioning.validate_warm_get     true
    92  versioning.synchronize           false
    93  ```
    94  
    95  No assumption is being made on whether any of the above is present (except, of course, the size aka "Content-Length").
    96  
    97  The rules are simple:
    98  
    99  * compare _existing_ items of the same kind (`size` vs `size`, `MD5` vs `MD5`, etc.);
   100  * fail immediately - that is, require cold GET - if any pair of comparable items differs;
   101  * count all matches except `size` (in other words, same size does _not_ contribute to decision in favor of skipping cold GET);
   102  * exclude double counting (which is mostly relevant for `ETag` vs `MD5`);
   103  * require **two or more matches**.
   104  
   105  When there are no matches, we go ahead with cold GET.
   106  
   107  A single match - e.g. only the `version` (if it exists), or only `ETag`, etc. - is currently resolved positively iff the source backend is the same as well.
   108  
   109  > E.g., copying an object from Amazon to Google and then performing a validated GET with the aistore backend "pointing" to Google will fail the match.
   110  
   111  > TODO: make it configurable to require at least two matches.
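The rules above can be sketched as a single decision function. This is a hedged illustration rather than the aistore implementation: `needs_cold_get`, the `same_backend` flag, the dictionary keys, and the way `ETag`/`MD5` double counting is detected (equal values count once) are all assumptions made for the example.

```python
from typing import Any, Mapping

# Metadata kinds from the list above; "checksum" stands in for the ais object checksum.
KINDS = ("size", "version", "ETag", "checksum", "MD5", "CRC32C")


def needs_cold_get(local: Mapping[str, Any],
                   remote: Mapping[str, Any],
                   same_backend: bool) -> bool:
    """Return True when the in-cluster copy cannot be trusted (hypothetical helper)."""
    matched = {}
    for kind in KINDS:
        lval, rval = local.get(kind), remote.get(kind)
        if lval is None or rval is None:
            continue              # compare only items that exist on both sides
        if lval != rval:
            return True           # any differing pair: fail immediately
        if kind != "size":
            matched[kind] = lval  # same size alone never counts as a match

    count = len(matched)
    # Assumption: an ETag that equals the MD5 carries no extra information,
    # so the pair is counted once.
    if "ETag" in matched and matched["ETag"] == matched.get("MD5"):
        count -= 1

    if count >= 2:
        return False              # two or more matches: skip cold GET
    if count == 1:
        return not same_backend   # single match: accepted only for the same backend
    return True                   # no matches: cold GET
```

For instance, an object whose only matching item is `version` skips the cold GET when `same_backend` is true and triggers it otherwise.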
   112  
   113  Needless to say, if querying remote metadata fails, the corresponding GET transaction will fail as well.
   114  
   115  ## When reading in-cluster data causes deletion
   116  
   117  But there's one special condition when the call to query remote metadata returns "object not found". In other words, when the remote backend unambiguously indicates that the remote object does not exist (any longer).
   118  
   119  In this case, there are two configurable choices as per (already shown) `versioning` section of the bucket config:
   120  
   121  ```console
   122  $ ais bucket props show BUCKET versioning
   123  PROPERTY                         VALUE
   124  versioning.enabled               ...
   125  versioning.validate_warm_get     true
   126  versioning.synchronize           false  ## <<<<<<<<<<<<<<<< note!
   127  ```
   128  
   129  The knob called `versioning.synchronize` is simply a stronger variant of `versioning.validate_warm_get`
   130  that entails both:
   131  
   132  1. validating remote object version, and
   133  2. deleting the in-cluster ("cached") object if its remote counterpart does not exist.
   134  
   135  To recap:
   136  
   137  if an attempt to read remote metadata returns "object not found", and `versioning.synchronize` is set to `true`, then
   138  we go ahead and delete the object locally, thus effectively _synchronizing_ in-cluster content with its remote source.
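The recap amounts to a two-input condition. In the sketch below, the `RemoteHead` enum and `should_delete_local` are hypothetical names used only to make that condition explicit. Note that any failure other than "object not found" never leads to deletion.

```python
from enum import Enum, auto


class RemoteHead(Enum):
    """Hypothetical outcome of querying remote metadata."""
    FOUND = auto()
    NOT_FOUND = auto()  # backend unambiguously reports that the object does not exist
    ERROR = auto()      # any other failure, e.g. a timeout


def should_delete_local(head: RemoteHead, synchronize: bool) -> bool:
    """Delete the in-cluster object only on "not found" with versioning.synchronize=true."""
    return synchronize and head is RemoteHead.NOT_FOUND
```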
   139  
   140  ## GET latest version
   141  
   142  But sometimes, there may be a need for more fine-grained, operation-level control over this functionality.
   143  
   144  AIS API supports that. In CLI, the corresponding option is called `--latest`. Let's see a brief example, where:
   145  
   146  1. `s3://abc` is a bucket that contains
   147  2. `s3://abc/README.md` object that was previously
   148  3. out-of-band updated
   149  
   150  In other words, the setup we describe boils down to a single main point:
   151  
   152  * aistore contains a different version of an object (in this example: `s3://abc/README.md`).
   153  
   154  Namely:
   155  
   156  ```console
   157  $ aws s3api list-object-versions --bucket abc --prefix README.md --max-keys 1
   158  {
   159      "Name": "abc",
   160      "KeyMarker": "",
   161      "MaxKeys": 1,
   162      "IsTruncated": true,
   163      "NextVersionIdMarker": "KJOQsGcR3qBX5WvXbwiB.2LAQW12opbQ",
   164  ...
   165      "Versions": [
   166          {
   167              "IsLatest": true,
   168  ...
   169          }
   170      ],
   171      "Prefix": "README.md"
   172  }
   173  ```
   174  
   175  AIS, on the other hand, shows:
   176  
   177  
   178  ```console
   179  $ ais show object s3://abc/README.md --props version
   180  PROPERTY         VALUE
   181  version          1yNHzpfd9Y16nDS71V5scjTMfbRZUPJI
   182  ```
   183  
   184  Moreover, a GET operation with default parameters doesn't help:
   185  
   186  ```console
   187  $ ais get s3://abc/README.md /dev/null
   188  GET (and discard) README.md from s3://abc (13.82KiB)
   189  
   190  $ ais show object s3://abc/README.md --props version
   191  PROPERTY         VALUE
   192  version          1yNHzpfd9Y16nDS71V5scjTMfbRZUPJI
   193  ```
   194  
   195  To reconcile, we employ the `--latest` option:
   196  
   197  ```console
   198  $ ais get s3://abc/README.md /dev/null --latest
   199  GET (and discard) README.md from s3://abc (13.82KiB)
   200  
   201  $ ais show object s3://abc/README.md --props version
   202  PROPERTY         VALUE
   203  version          KJOQsGcR3qBX5WvXbwiB.2LAQW12opbQ
   204  ```
   205  
   206  Notice that we now have the latest `KJOQsGc...` version (which the `s3api` output above reports as `NextVersionIdMarker`).
   207  
   208  ## References
   209  
   210  * [`ais cp` command](/docs/cli/bucket.md) and, in particular, its `--sync` option.
   211  * [Example copying buckets and multi-objects with simultaneous synchronization](/docs/cli/bucket.md#example-copying-buckets-and-multi-objects-with-simultaneous-synchronization)