---
layout: post
title: "What's new in AIS v3.8"
date: Dec 15, 2021
author: Alex Aizman
categories: aistore
---

AIStore v3.8 is a significant upgrade delivering [long-awaited features, stabilization fixes, and performance improvements](https://github.com/NVIDIA/aistore/releases/tag/3.8). There's also the cumulative effect of continuous functional and stress testing combined with (continuous) refactoring to optimize and reinforce the codebase.

In other words, a certain achieved *milestone* that includes:

## ETL

AIS-ETL is designed around the idea of running custom *transforming* containers directly on AIS target nodes. A typical flow includes the following steps:

1. User initiates an ETL workload by executing one of the documented API calls and providing either the corresponding docker image or a *transforming function* (e.g. Python script);
2. AIS gateway coordinates the deployment of ETL containers (aka K8s pods) on AIS targets: one container per target;
3. Each target creates a local `communicator` instance for the specified `communication type`.

Prior to 3.8, [supported communication types](/docs/etl.md) were all HTTP-based. For instance, the existing ["hpull://"](/docs/etl.md#communication-mechanisms) facilitates HTTP-redirect type communication, with the AIS target redirecting original read requests to the local ETL container. Version 3.8 adds a non-HTTP communicator (denoted as "io://") and removes the requirement to wrap your custom transforming logic into some sort of HTTP processing.

The new "io://" communicator acts as a simple executor of external commands *by* the ETL container. On its end, the AIS target resorts to capturing the resulting standard output (containing transformed bytes) and standard error. This is perhaps not the most performant solution but certainly the easiest one to implement.
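
To make the "executor of external commands" idea concrete, here is a minimal Python sketch of the general pattern: run a command, hand it the object bytes, and capture its standard output and standard error. It is not AIS code. The names `run_transform` and `UPPERCASE_CMD` are hypothetical, and feeding the payload through standard input is an assumption made for illustration.

```python
import subprocess
import sys

# Hypothetical transforming command: upper-cases whatever arrives on stdin.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]


def run_transform(command: list[str], payload: bytes) -> tuple[bytes, bytes]:
    """Run `command`, feed it `payload` on stdin, and return (stdout, stderr).

    Assumption: the object bytes are passed via standard input; the text above
    only states that standard output and standard error are captured.
    """
    proc = subprocess.run(command, input=payload, capture_output=True, check=False)
    if proc.returncode != 0:
        # Surface the captured standard error when the command fails.
        raise RuntimeError(f"transform failed: {proc.stderr.decode(errors='replace')}")
    return proc.stdout, proc.stderr
```

With this shape, the transforming logic itself needs no HTTP handling at all: any program that writes transformed bytes to standard output would do.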

Additionally, v3.8 integrates ETL (jobs) with [xactions](/docs/batch.md), thus providing consistency in terms of starting/stopping and managing/monitoring. All existing APIs and [CLIs](/docs/cli/job.md) that are common to all [xactions](/docs/batch.md) are supported out of the box.

Finally, v3.8 introduces persistent ETL metadata as a new replicated-versioned-and-protected metadata type. The implementation leverages the existing mechanism to keep clustered nodes in-sync with added, removed, and updated ETL specifications. The ultimate objective is to be able to run an arbitrary mix of inline and offline ETLs while simultaneously viewing and *editing* their (persistent) specs.

Further reading:
- [Using AIS/PyTorch connector to transform ImageNet](https://aiatscale.org/blog/2021/10/22/ais-etl-2)
- [Using WebDataset to train on a sharded dataset](https://aiatscale.org/blog/2021/10/29/ais-etl-3)

## Storage cleanup

Cleanup, as the name implies, is tasked with safely removing already deleted objects (that we keep for a while to support future [undeletion](https://en.wikipedia.org/wiki/Undeletion)). Also subject to cleanup are:

* workfiles resulting from interrupted workloads
* unfinished erasure-coded slices
* misplaced replicas left behind during global rebalancing

and similar. In short, all sorts of "artifacts" of distributed migration, replication, and erasure coding.

Like LRU-based cluster-wide eviction, cleanup runs automatically or [administratively](/docs/cli/storage.md). Cleanup triggers automatically when used capacity exceeds 65% (or the configured percentage) of the total. But note:

> Automatic cleanup always runs _prior_ to automatic LRU eviction, so that the latter takes into account the updated used and available percentages.
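
The ordering in the note above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AIS implementation: `Capacity`, `housekeep`, and the `lru_watermark_pct` parameter are hypothetical names, and only the 65% default comes from the text.

```python
from dataclasses import dataclass

CLEANUP_WATERMARK_PCT = 65.0  # default from the text; configurable


@dataclass
class Capacity:
    used: int   # used bytes
    total: int  # total bytes

    @property
    def used_pct(self) -> float:
        return 100.0 * self.used / self.total


def housekeep(cap: Capacity, reclaimable: int, lru_watermark_pct: float,
              cleanup_watermark_pct: float = CLEANUP_WATERMARK_PCT) -> list[str]:
    """Return the actions taken, in order: cleanup first, then (maybe) LRU.

    `reclaimable` is the hypothetical number of bytes cleanup can free;
    `lru_watermark_pct` is a hypothetical LRU trigger threshold.
    """
    actions: list[str] = []
    if cap.used_pct > cleanup_watermark_pct:
        cap.used -= min(reclaimable, cap.used)
        actions.append("cleanup")
    # LRU is evaluated against the *updated* used percentage.
    if cap.used_pct > lru_watermark_pct:
        actions.append("lru")
    return actions
```

For example, with 80% used, 20% reclaimable by cleanup, and a hypothetical 70% LRU threshold, only cleanup runs, because the updated used percentage (60%) no longer calls for eviction.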

> LRU eviction is separately configured on a per-bucket basis with cluster-wide inheritable defaults set as follows: enabled for Cloud buckets, disabled for AIS buckets that have no remote backend.

## Custom object metadata

AIS now differentiates between:

* its own system metadata (size, access time, checksum, number of copies, etc.)
* Cloud object metadata (source, version, MD5, ETag), and
* custom metadata comprising user-defined key/values

All metadata from all sources is now preserved and checksum-protected, stored persistently and maintained across all intra-cluster migrations and replications. There's also an improved check for local <=> remote equality in the context of cold GETs and [downloads](/docs/downloader.md): a check that takes into account size, version (if available), ETag (if available), and checksum(s), all of the above.

## Volume

A multi-disk volume in AIS is a collection of [mountpaths](/docs/overview.md#terminology). The corresponding metadata (called VMD) is versioned, persistent, and protected (i.e., checksummed and replicated). Version 3.8 reinforces the AIS volume (function) in the presence of unlikely but nevertheless critical *scenarios* that include the usual:

* faulted drives, degraded drives, missing (unmounted or detached) drives
* old, missing, or corrupted VMD instances

At startup, an AIS target performs a mini-bootstrapping sequence to load and cross-check VMD against both its other stored replicas and the persistent configuration. At runtime, there's a revised, amended, and fully-supported capability to gracefully detach and attach mountpaths.

In fact, any mountpath can be temporarily disabled and (re)enabled, permanently detached and later re-attached. As long as there's enough space on the remaining mountpaths to carry out volume resilvering, all 4 (four) verbs can be used at any time.

> Needless to say, it'd make sense _not_ to power cycle the target during resilvering.

## Easy URL

The feature codenamed "easy URL" is a simple alternative mapping of the AIS API to handle URL paths that look as follows:

| URL Path | Backend |
| --- | --- |
| /gs/mybucket/myobject | Google Cloud buckets |
| /az/mybucket/myobject | Azure Blob Storage |
| /ais/mybucket/myobject | AIS |

In other words, easy URL is a convenience that allows reading, writing, deleting, and listing as follows:

```console
# Example: GET
$ curl -L -X GET 'http://aistore/gs/my-google-bucket/abc-train-0001.tar'

# Example: PUT
$ curl -L -X PUT 'http://aistore/gs/my-google-bucket/abc-train-9999.tar' -T /tmp/9999.tar

# Example: LIST
$ curl -L -X GET 'http://aistore/gs/my-google-bucket'
```

Note, however:

> There's a reason that Amazon S3 is missing from the list (above) that includes GCP and Azure. That's because AIS provides a full [S3 compatibility](/docs/s3compat.md) layer via its "/s3" endpoint. [S3 compatibility](/docs/s3compat.md) shall not be confused with a simple alternative ("easy URL") mapping of HTTP requests.


## TL;DR

Other v3.8 additions include:

- target *standby* mode !4688, !4689, !4691
- amended and improved performance monitoring !4792, !4793, !4794, !4798, !4800, !4810, !4812
- ais targets with no disks !4825
- Kubernetes Operator [v0.9](https://github.com/NVIDIA/ais-k8s/releases/tag/v0.9)
- and more.

Some of those might be described later in a separate posting.
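
As a postscript to the easy URL section above, the path mapping can be illustrated with a small Python sketch. It only splits a path of the tabulated form into provider prefix, bucket, and object name. The function `parse_easy_url` and the `EASY_URL_PROVIDERS` table are hypothetical and say nothing about how AIS itself routes these requests.

```python
# Prefixes taken from the easy URL table; "/s3" is deliberately absent
# because it is served by the separate S3 compatibility layer.
EASY_URL_PROVIDERS = {
    "gs": "Google Cloud buckets",
    "az": "Azure Blob Storage",
    "ais": "AIS",
}


def parse_easy_url(path: str) -> tuple[str, str, str]:
    """Split '/<provider>/<bucket>[/<object>]' into its three components.

    The object name is empty for bucket-level requests such as LIST and may
    itself contain slashes.
    """
    parts = path.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] not in EASY_URL_PROVIDERS or not parts[1]:
        raise ValueError(f"not an easy URL path: {path!r}")
    provider, bucket = parts[0], parts[1]
    obj = parts[2] if len(parts) == 3 else ""
    return provider, bucket, obj
```

Under these assumptions, `/gs/my-google-bucket/abc-train-0001.tar` splits into the `gs` prefix, the bucket `my-google-bucket`, and the object `abc-train-0001.tar`, while `/gs/my-google-bucket` yields an empty object name.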