
---
layout: post
title:  "Transforming non-existing datasets"
date:   Apr 10, 2023
author: Alex Aizman
categories: aistore performance etl
---

There's an old trick that never quite gets old: you run a high-velocity exercise that generates a massive amount of traffic through some sort of multi-part system, whereby some of those parts get (spectacularly) killed and periodically recovered.

TL;DR: a simple demonstration that does exactly that (see also the detailed comments inside the scripts):

| Script | Action |
| --- | --- |
| [cp-rmnode-rebalance](https://github.com/NVIDIA/aistore/blob/main/ais/test/scripts/cp-rmnode-rebalance.sh) | taking a random node to maintenance when there's no data redundancy |
| [cp-rmnode-ec](https://github.com/NVIDIA/aistore/blob/main/ais/test/scripts/cp-rmnode-ec.sh) | (erasure coded content) + (immediate loss of a node) |
| [cp-rmdisk](https://github.com/NVIDIA/aistore/blob/main/ais/test/scripts/cp-rmdisk.sh) | (3-way replication) + (immediate loss of a random drive) |

> The scripts are self-contained and will run with any aistore instance that has at least 5 nodes, each with 3+ disks.
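
For instance - a minimal sketch, assuming a local clone of the repository and a running cluster that meets the above requirements:

```console
$ git clone https://github.com/NVIDIA/aistore.git
$ cd aistore
$ ./ais/test/scripts/cp-rmdisk.sh   # see the comments inside the script for prerequisites and options
```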

![show performance and running jobs](/assets/tco-any-to-any/show-perf-job.png)

But when the traffic is running and the parts are getting periodically killed and recovered in a variety of realistic ways, you may want to watch it via [Prometheus](https://aiatscale.org/docs/prometheus) or Graphite/Grafana. Or, at the very least, via `ais show performance` - the poor man's choice that's always available.

> `ais show performance --help` for details
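
For instance, to track cluster-wide throughput while the exercise is running (subcommands and flags as of this writing - `--help` remains authoritative):

```console
$ ais show performance throughput --refresh 3
```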

Observability notwithstanding, the idea is always the same - to see whether the combined throughput dips at any point (it does), and by how much and for how long (it depends).

There's one (and only one) problem though: vanilla copying may sound dull and mundane. Frankly, it is totally unexciting, even when it coincides with all the rebalancing/rebuilding runtime drama behind the scenes.

## Copy

And so, to make it marginally more interesting - but also to increase usability - we go ahead and copy a non-existing dataset. Something like:

```console
$ ais ls s3
No "s3://" matching buckets in the cluster. Use '--all' option to list _all_ buckets.

$ ais storage summary s3://src --all
NAME             OBJECTS (cached, remote)
s3://src                  0       1430

$ ais ls gs
No "gs://" matching buckets in the cluster. Use '--all' option to list _all_ buckets.

$ ais cp s3://src gs://dst --progress --refresh 3 --all

Copied objects:              277/1430 [===========>--------------------------------------------------] 19 %
Copied size:    277.00 KiB / 1.40 MiB [===========>--------------------------------------------------] 19 %
```

The first three commands briefly establish non-existence - the fact that there are no Amazon and Google buckets in the cluster _right now_.

> The `ais storage summary` command (and its close relative `ais ls --summary`) will also report whether the source is visible/accessible and will conveniently compute numbers and sizes (not shown).
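
For example, the `ais ls` flavor of the same query might look as follows (output not shown; see `--help` for the current set of options):

```console
$ ais ls s3://src --summary
```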

But because "existence" may come with all sorts of connotations, the preferred term is: [presence](https://aiatscale.org/blog/2022/11/13/relnotes-3.12). We say "present" or "not present" in reference to remote buckets and/or data in those buckets, whereby the latter may be currently present in part, in whole, or not at all.

In this case, both the source and the destination (`s3://src` and `gs://dst`, respectively) were ostensibly not present, and we just went ahead and ran the copy with a progress bar and a variety of not-shown list/range/prefix selections and options (see `--help` for details).
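
For the record, those selections might look something like the following - the template and object names are purely illustrative, and `--help` remains the authoritative reference for the flags:

```console
$ ais cp s3://src gs://dst --template "shard-{0001..0999}.tar" --all
$ ais cp s3://src gs://dst --list "shard-0001.tar,shard-0002.tar" --all
```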

## Transform

From here on, the immediate and fully expected question is: _transformation_. Namely, whether it'd be possible to not only copy but also transform datasets - to apply a user-defined transformation to a source that _may_ be (currently) stored in the AIS cluster, or may not be, or not entirely.

Something like:

```console
$ ais etl init spec --name=my-custom-transform --from-file=my-custom-transform.yaml
```

followed by:

```console
$ ais etl bucket my-custom-transform s3://src gs://dst --progress --refresh 3 --all
```

The first step deploys user containers on each clustered node. More precisely, the `init-spec` API call is broadcast to each target node; in response, each node calls the K8s API to pull the corresponding image and run it locally and in parallel - but only if the container in question was not previously deployed. (And yes, ETL is the only aistore feature that does require Kubernetes.)
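
For reference, the spec in question is, at its core, a Kubernetes Pod template. Here's a minimal sketch - the image, port, and annotation values below are purely illustrative; see the [ETL documentation](https://aiatscale.org/docs/etl) for the authoritative format:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-transform
  annotations:
    # how aistore targets communicate with the transforming container
    communication_type: "hpush://"
    wait_timeout: 5m
spec:
  containers:
    - name: server
      image: my-repo/my-custom-transform:latest   # hypothetical transformer image
      ports:
        - name: default
          containerPort: 8000
```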

> Another flavor of the `ais etl init` command is `ais etl init code` - see `--help` for details.

That was the first step - the second is virtually identical to copying (see the previous section). It'll read the remote dataset from Amazon S3, transform it, and place the result into another (e.g., Google) cloud.

> As a quick aside, anything that aistore reads or writes remotely, aistore also stores. _Storing_ is always done in full accordance with the configured redundancy and other applicable bucket policies; secondly, all subsequent access to the same content (that previously was remote) gets _terminated_ inside the cluster.

## Despite node and drive failures

The [scripts](https://github.com/NVIDIA/aistore/tree/main/ais/test/scripts) above periodically fail and recover nodes and disks. But we could also go ahead and replace the `ais cp` command with its `ais etl` counterpart - that is, replace dataset replication with dataset (offline) transformation, while leaving everything else intact.

We could do even more - select any _startable_ job:

```console
$ ais start <TAB-TAB>
prefetch           dsort              etl                cleanup            mirror             warm-up-metadata   move-bck
download           lru                rebalance          resilver           ec-encode          copy-bck
```

and run it while simultaneously taking out nodes and disks. It'll run and, given enough redundancy in the system, it'll recover and keep going.
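
For example - a minimal sketch, with `<NODE_ID>` standing in for an actual node and assuming current CLI command names:

```console
$ ais start lru
$ ais cluster add-remove-nodes start-maintenance <NODE_ID>
```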

**NOTE:**

The ability to recover is much more fundamental than any specific [job kind](https://github.com/NVIDIA/aistore/blob/main/xact/api.go#L108-L230) that's already supported today or will be added in the future.

> Not every job is _startable_. In fact, the majority of the supported jobs have their own dedicated API and CLI, and there are still other jobs that run only on demand.

## The Upshot

The beauty of copying is in the eye of the beholder. But personally, a big part of it is that there's no need to have a client. Not that clients are bad, I'm not saying that (in fact, the opposite may be true). But there's a certain elegance and power in running self-contained jobs that are autonomously driven by the cluster and execute at (N * disk-bandwidth) aggregated throughput, where N is the total number of clustered disks.

At the core of it, there's the process whereby all nodes, in parallel, run reading and writing threads on a per (local) disk basis, with each reading thread traversing the local - or soon-to-be local - part of the source dataset. Whether it'd be vanilla copying or user-defined offline transformation on steroids, the underlying iterative picture is always the same (sketched right after this list):

1. read the next object using a built-in (local or remote) or ETL container-provided _reader_
2. write it using a built-in (local or remote) or container-provided _writer_
3. repeat
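
In Go-like pseudo-code - with hypothetical `objReader`/`objWriter` interfaces standing in for aistore's actual built-in and container-provided readers and writers - the per-disk worker amounts to:

```go
package sketch

import "io"

// Hypothetical interfaces, for illustration only.
type objReader interface {
	// NextObject returns the next object from this node's portion of the
	// source dataset; io.EOF signals that the portion is exhausted.
	NextObject() (name string, data []byte, err error)
}

type objWriter interface {
	Write(name string, data []byte) error
}

// copyOrTransform is the worker loop that each node runs per local disk:
// read the next object, optionally transform it, write it out, repeat.
func copyOrTransform(src objReader, dst objWriter, transform func([]byte) []byte) error {
	for {
		name, data, err := src.NextObject()
		if err == io.EOF {
			return nil // this node's portion of the dataset is done
		}
		if err != nil {
			return err
		}
		if transform != nil {
			data = transform(data) // user-defined (ETL container) step
		}
		if err := dst.Write(name, data); err != nil {
			return err
		}
	}
}
```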

Parallelism and autonomy always go hand in hand. In aistore, _location_ rules are cluster-wide universal. Given an identical (versioned, protected, and replicated) cluster map and its own disposition of local disks, each node independently decides _what_ to read and _where_ to write it. There's no stepping-over, no duplication, and no conflicts.
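
To make "location rules" slightly more concrete: aistore maps each object name to its home node by hashing the name against the cluster map, so every node computes the same answer with zero coordination. A toy sketch of the idea follows - aistore's actual implementation uses HRW (highest random weight) hashing with a different hash function and per-node weights:

```go
package sketch

import "hash/fnv"

// hrwNode picks the single node that "owns" objectName: the node whose
// combined hash with the name is highest. Every node, given the same
// cluster map (here, nodeIDs), independently arrives at the same result.
func hrwNode(objectName string, nodeIDs []string) (winner string) {
	var maxHash uint64
	for _, id := range nodeIDs {
		h := fnv.New64a()
		h.Write([]byte(id + "|" + objectName))
		if v := h.Sum64(); winner == "" || v > maxHash {
			maxHash, winner = v, id
		}
	}
	return winner
}
```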

> Question to maybe take offline: how to do the "nexting" when the source is remote (i.e., not _present_)? How to iterate a remote source without loss of parallelism?

And so, even though it ultimately boils down to iteratively calling read and write primitives, the core process appears to be infinitely flexible in its applications.

And that's the upshot.

## References

* [Lifecycle management: maintenance mode, rebalance/rebuild, and more](/docs/lifecycle_node.md)