**AIStore is a lightweight object storage system with the capability to linearly scale out with each added storage node and a special focus on petascale deep learning.**

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Go Report Card](https://goreportcard.com/badge/github.com/NVIDIA/aistore)

AIStore (AIS for short) is a lightweight storage stack, built from scratch and tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be deployed ad hoc, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size.

AIS [consistently shows balanced I/O distribution and linear scalability](https://aiatscale.org/blog/2024/02/16/multihome-bench) across arbitrary numbers of clustered nodes. The ability to scale linearly with each added disk was, and remains, one of the main incentives. Much of the initial design was also driven by the idea of [offloading](https://aiatscale.org/blog/2023/06/09/aisio-transforms-with-webdataset-pt-3) custom dataset transformations (often referred to as [ETL](https://aiatscale.org/blog/2021/10/21/ais-etl-1)). And finally, since AIS is a software system that aggregates Linux machines to provide storage for user data, there's requirement number one: reliability and data protection.

## Features

* **Deploys anywhere**. AIS clusters are immediately deployable on any commodity hardware, on any Linux machine(s).
* **Highly available** control and data planes, end-to-end data protection, self-healing, n-way mirroring, erasure coding, and an arbitrary number of extremely lightweight access points.
* **REST API**. Comprehensive native HTTP-based API, as well as compliant [Amazon S3 API](/docs/s3compat.md) to run unmodified S3 clients and apps.
* **Unified namespace** across multiple [remote backends](/docs/providers.md) including Amazon S3, Google Cloud, and Microsoft Azure.
* **Network of clusters**. Any AIS cluster can attach any other AIS cluster, thus gaining immediate visibility and fast access to the respective hosted datasets.
* **Turn-key cache**. Can be used as standalone, highly available, protected storage and/or as an LRU-based fast cache. Eviction watermarks, as well as numerous other management policies, are configurable per bucket.
* **ETL offload**. The capability to run I/O-intensive custom data transformations *close to data*, both offline (dataset to dataset) and inline (on the fly).
* **File datasets**. AIS can be immediately populated from any file-based data source (local or remote, ad hoc/on demand or via asynchronous batch).
* **Read-after-write consistency**. Reading and writing (as well as all other control and data plane operations) can be performed via any (random, selected, or load-balanced) AIS gateway (a.k.a. "proxy"). Once the first replica of an object is written and _finalized_, subsequent reads are guaranteed to see the same content. Additional copies and/or EC slices, if configured, are added asynchronously via `put-copies` and `ec-put` jobs, respectively.
* **Write-through**. In the presence of any [remote backend](/docs/providers.md), AIS executes the remote write (e.g., using the vendor's SDK) as part of the [transaction](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#read-after-write-consistency) that places and _finalizes_ the first replica.
* **Small file datasets.** To serialize small files and facilitate batch processing, AIS supports TAR, TAR.GZ (or TGZ), ZIP, and TAR.LZ4 formatted objects (often called _shards_). Resharding (for optimal sorting and sizing), listing contained files (samples), appending to existing shards, and generating new ones from existing objects and/or client-side files are also fully supported.
* **Kubernetes**. Provides for easy Kubernetes deployment via a separate GitHub [repo](https://github.com/NVIDIA/ais-k8s) and [AIS/K8s Operator](https://github.com/NVIDIA/ais-k8s/tree/master/operator).
* **Access control**. For security and fine-grained access control, AIS includes an OAuth 2.0 compliant [Authentication Server (AuthN)](/docs/authn.md). A single AuthN instance executes CLI requests over HTTPS and can serve multiple clusters.
* **Distributed shuffle** extension for massively parallel resharding of very large datasets.
* **Batch jobs**. APIs and CLI to start, stop, and monitor documented [batch operations](/docs/batch.md), such as `prefetch`, `download`, copy or transform datasets, and many more.
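
To illustrate the eviction watermarks mentioned above, here is a minimal in-memory sketch of a generic high/low watermark LRU policy: nothing is evicted until usage exceeds the high watermark, after which least recently used objects are removed until usage drops to the low watermark. This is an assumption-based illustration only; the `WatermarkLRU` class and its parameters are hypothetical and do not reflect AIS internals or its actual configuration names.

```python
from collections import OrderedDict


class WatermarkLRU:
    """Hypothetical LRU cache with high/low watermark eviction (illustration only)."""

    def __init__(self, capacity: int, high_wm: float = 0.9, low_wm: float = 0.75):
        if not 0 < low_wm < high_wm <= 1.0:
            raise ValueError("expected 0 < low_wm < high_wm <= 1")
        self.high = int(capacity * high_wm)  # start evicting above this many bytes
        self.low = int(capacity * low_wm)    # stop evicting at or below this many bytes
        self.used = 0
        self.objects = OrderedDict()         # name -> size, least recently used first

    def get(self, name: str) -> bool:
        """Return True on a hit and mark the object as most recently used."""
        if name not in self.objects:
            return False
        self.objects.move_to_end(name)
        return True

    def put(self, name: str, size: int) -> None:
        """Store (or overwrite) an object, then evict if above the high watermark."""
        if name in self.objects:
            self.used -= self.objects.pop(name)
        self.objects[name] = size
        self.used += size
        if self.used > self.high:
            self._evict()

    def _evict(self) -> None:
        # Remove least recently used objects until usage drops to the low watermark.
        while self.used > self.low and self.objects:
            _, size = self.objects.popitem(last=False)
            self.used -= size


cache = WatermarkLRU(capacity=100)      # watermarks: high=90, low=75
for i in range(9):
    cache.put(f"obj-{i}", 10)           # 90 bytes: not above the high watermark yet
cache.get("obj-0")                      # obj-0 becomes most recently used
cache.put("obj-9", 10)                  # 100 > 90: evicts obj-1, obj-2, obj-3
print(cache.used, list(cache.objects)[:2])  # 70 ['obj-4', 'obj-5']
```

Using two watermarks rather than a single threshold means eviction runs in batches instead of on nearly every write once the cache is close to full.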

For easy usage, management, and monitoring, there's also:
* **Integrated and powerful [CLI](/docs/cli.md)**. As of early 2024, top-level CLI commands include:
```console
$ ais

bucket        etl         help           log              create        dsort        stop         blob-download
object        job         advanced       performance      download      evict        cp           rmo
cluster       auth        storage        remote-cluster   prefetch      get          rmb          wait
config        show        archive        alias            put           ls           start        search
```

AIS runs natively on Kubernetes and features an open format, hence the freedom to copy or move your data from AIS at any time using familiar Linux tools such as `tar(1)`, `scp(1)`, `rsync(1)`, and similar.
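
Because shards are ordinary archives, they can be created and inspected with standard tooling. The following sketch uses Python's built-in `tarfile` module to pack a few small in-memory samples into an uncompressed TAR shard, list its contents (as `tar -tf` would), and read back a single sample. The helper functions and sample names are hypothetical; this is not the AIS archive API.

```python
import io
import tarfile
from typing import Dict, List


def make_shard(samples: Dict[str, bytes]) -> bytes:
    """Pack small files (name -> content) into an uncompressed in-memory TAR shard."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_shard(shard: bytes) -> List[str]:
    """Return the names of the samples contained in a TAR shard, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        return tar.getnames()


def read_sample(shard: bytes, name: str) -> bytes:
    """Read one sample without unpacking the whole shard to disk."""
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        member = tar.extractfile(name)  # raises KeyError if the name is absent
        if member is None:              # not a regular file (e.g., a directory)
            raise ValueError(f"{name} is not a regular file")
        return member.read()


shard = make_shard({"0001.jpg": b"<jpeg bytes>", "0001.cls": b"7"})
print(list_shard(shard))               # ['0001.jpg', '0001.cls']
print(read_sample(shard, "0001.cls"))  # b'7'
```

Writing the returned bytes to a `.tar` file produces an archive that `tar(1)` lists and extracts the same way.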

For developers and data scientists, there's also:
* native [Go (language) API](https://github.com/NVIDIA/aistore/tree/main/api) that we use in a variety of tools, including [CLI](/docs/cli.md) and [Load Generator](/docs/aisloader.md);
* native [Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore/sdk)
  - [Python SDK reference guide](/docs/python_sdk.md)
* [PyTorch integration](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch) and usage examples
* [Boto3 support](https://github.com/NVIDIA/aistore/tree/main/python/aistore/botocore_patch) for interoperability with the AWS SDK for Python (aka [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)) client
  - and other [Botocore](https://github.com/boto/botocore) derivatives.

For the original AIStore **white paper** and design philosophy, for an introduction to large-scale deep learning, and for the most recently added features, please see [AIStore Overview](/docs/overview.md) (where you can also find six alternative ways to work with existing datasets). Videos and **animated presentations** can be found at [videos](/docs/videos.md).

Finally, [getting started](/docs/getting_started.md) with AIS takes only a few minutes.

---------------------

## Deployment options

AIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all [summarized here](deploy/README.md).

Since the prerequisites boil down to, essentially, having Linux with a disk, the deployment options range from an [all-in-one container](/docs/videos.md#minimal-all-in-one-standalone-docker) to a petascale bare-metal cluster of any size, and from a single VM to multiple racks of high-end servers. Practical use cases, of course, require further consideration and may include:

| Option | Objective |
| --- | --- |
| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers and development, Linux or macOS |
| Minimal production-ready deployment | Uses a preinstalled Docker image and targets first-time users or researchers (who could immediately start training their models on smaller datasets) |
| [Easy automated GCP/GKE deployment](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#kubernetes-deployments) | Developers, first-time users, AI researchers |
| [Large-scale production deployment](https://github.com/NVIDIA/ais-k8s) | Requires Kubernetes and is provided via a separate repository: [ais-k8s](https://github.com/NVIDIA/ais-k8s) |

Further, there's the capability referred to as [global namespace](https://github.com/NVIDIA/aistore/blob/main/docs/providers.md#remote-ais-cluster): given HTTP(S) connectivity, AIS clusters can be easily interconnected to "see" each other's datasets. Hence the idea of starting "small" and gradually, incrementally building high-performance shared capacity.

> For a detailed discussion of supported deployments, please refer to [Getting Started](/docs/getting_started.md).

> For performance tuning and preparing AIS nodes for bare-metal deployment, see [performance](/docs/performance.md).

## Existing datasets

AIStore supports multiple ways to populate itself with existing datasets, including (but not limited to):

* **on demand**, often during the first epoch;
* **copy** an entire bucket or its selected virtual subdirectories;
* **copy** multiple matching objects;
* **archive** multiple objects;
* **prefetch** a remote bucket or parts thereof;
* **download** raw HTTP(S)-addressable directories, including (but not limited to) cloud storage;
* **promote** NFS or SMB shares accessible by one, multiple, or all AIS target nodes.

> The on-demand way is perhaps the most popular: users simply start running their workloads against a [remote bucket](/docs/providers.md), with the AIS cluster positioned as an intermediate fast tier.
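
The on-demand pattern behaves like a read-through cache: the first read of an object (a cold miss, typically during the first epoch) is served from the remote backend and a copy is kept in the cluster, so later epochs read it locally. A minimal sketch under that assumption, where `fetch_remote` is a hypothetical stand-in for the remote backend and a Python dict stands in for cluster storage:

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical fast tier that populates itself on demand from a remote store."""

    def __init__(self, fetch_remote: Callable[[str], bytes]):
        self.fetch_remote = fetch_remote
        self.local: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self.local:
            self.hits += 1                # warm read: served from the cluster
            return self.local[name]
        self.misses += 1                  # cold miss: go to the remote backend...
        data = self.fetch_remote(name)
        self.local[name] = data           # ...and keep a copy for later epochs
        return data


remote = {f"img-{i}.jpg": bytes([i]) for i in range(4)}  # stand-in remote bucket
cache = ReadThroughCache(remote.__getitem__)
for epoch in range(2):
    for name in remote:
        cache.get(name)
print(cache.misses, cache.hits)  # 4 4
```

Only the first epoch pays the remote-access cost; every subsequent epoch is served entirely by the local tier.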

But there's more. In v3.22, we introduced the [blob downloader](/docs/blob_downloader.md), a special facility to download very large remote objects (BLOBs).
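
The general idea of fetching a very large object in parts can be sketched as splitting it into fixed-size byte ranges that several workers fetch concurrently (for example, with HTTP `Range` requests) and then reassembling the parts in order. The sketch below assumes that approach; the function names, chunk size, and worker count are hypothetical and do not describe the blob downloader's actual implementation or options.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def byte_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split an object of `size` bytes into inclusive (start, end) ranges."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size > 0")
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value; both ends are inclusive."""
    return f"bytes={start}-{end}"


def chunked_download(size: int, chunk_size: int,
                     fetch: Callable[[int, int], bytes], workers: int = 4) -> bytes:
    """Fetch all ranges concurrently and reassemble them in their original order."""
    ranges = byte_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return b"".join(pool.map(lambda r: fetch(r[0], r[1]), ranges))


blob = bytes(range(256)) * 40  # 10240-byte stand-in for a remote object
data = chunked_download(len(blob), 4096, lambda start, end: blob[start:end + 1])
print(byte_ranges(len(blob), 4096))         # [(0, 4095), (4096, 8191), (8192, 10239)]
print(range_header(0, 4095), data == blob)  # bytes=0-4095 True
```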

## Installing from release binaries

Generally, an AIStore cluster requires some sort of [deployment](/deploy#contents) procedure. There are standalone binaries, though, that can be [built](Makefile) from source or, alternatively, installed directly from GitHub:

```console
$ ./scripts/install_from_binaries.sh --help
```

The script installs [aisloader](/docs/aisloader.md) and [CLI](/docs/cli.md) from the most recent, or the previous, GitHub [release](https://github.com/NVIDIA/aistore/releases). For CLI, it also enables auto-completions (which is strongly recommended).

## PyTorch integration

AIS is one of the PyTorch [Iterable Datapipes](https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#iterable-datapipes).

Specifically, the [TorchData](https://github.com/pytorch/data) library provides:
* [AISFileLister](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#aisfilelister)
* [AISFileLoader](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#aisfileloader)

to list and, respectively, load data from AIStore.
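
The lister/loader split amounts to two lazy stages: one yields object names, the other fetches each named object. A library-free sketch of that pattern with plain Python generators follows; the in-memory `store`, the `ais://` object names, and both functions are hypothetical stand-ins, not the TorchData API.

```python
from typing import Dict, Iterator, Tuple


def list_objects(store: Dict[str, bytes], prefix: str) -> Iterator[str]:
    """Stage 1 (lister): lazily yield names of objects that start with `prefix`."""
    for name in sorted(store):
        if name.startswith(prefix):
            yield name


def load_objects(store: Dict[str, bytes],
                 names: Iterator[str]) -> Iterator[Tuple[str, bytes]]:
    """Stage 2 (loader): lazily fetch each listed object as a (name, content) pair."""
    for name in names:
        yield name, store[name]


store = {  # hypothetical stand-in for a bucket
    "ais://data/train/0.jpg": b"a",
    "ais://data/train/1.jpg": b"b",
    "ais://data/val/0.jpg": b"c",
}
for name, content in load_objects(store, list_objects(store, "ais://data/train/")):
    print(name, content)
# ais://data/train/0.jpg b'a'
# ais://data/train/1.jpg b'b'
```

Because both stages are iterators, objects are fetched one at a time as the consumer (for instance, a training loop) pulls them.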

Further references and usage examples can be found in our technical blog at https://aiatscale.org/blog:
* [PyTorch: Loading Data from AIStore](https://aiatscale.org/blog/2022/07/12/aisio-pytorch)
* [Python SDK: Getting Started](https://aiatscale.org/blog/2022/07/20/python-sdk)

Since AIS natively supports a number of [remote backends](/docs/providers.md), you can also use PyTorch with AIS to iterate over Amazon S3 and Google Cloud buckets, and more.

## Reuse

This repo includes [SGL and Slab allocator](/memsys) intended to optimize memory usage, [Streams and Stream Bundles](/transport) to multiplex messages over long-lived HTTP connections, and a few other sub-packages providing rather generic functionality.

With a little effort, they could all be extracted and used outside AIStore.
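
As an illustration of why long-lived connections need message framing, here is a minimal length-prefixed framing sketch: each message carries a small header with a stream ID and the payload length, so messages from several logical streams can be interleaved on one connection and split apart again on the receiving side. The 8-byte header layout is a hypothetical choice for this example and is not the wire format of the `transport` package.

```python
import struct
from typing import Iterator, Tuple

HEADER = struct.Struct(">II")  # hypothetical: (stream_id, payload_length), big-endian


def encode(stream_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(stream_id, len(payload)) + payload


def decode(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Split a byte stream back into (stream_id, payload) messages."""
    offset = 0
    while offset + HEADER.size <= len(buf):
        stream_id, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if offset + length > len(buf):
            raise ValueError("truncated message payload")
        yield stream_id, buf[offset:offset + length]
        offset += length
    if offset != len(buf):
        raise ValueError("trailing bytes do not form a complete header")


wire = encode(1, b"hello") + encode(2, b"") + encode(1, b"world")
print(list(decode(wire)))  # [(1, b'hello'), (2, b''), (1, b'world')]
```

A fixed-size header lets the receiver find message boundaries without scanning for delimiters, so payloads may contain arbitrary bytes.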

## Guides and References

- [Getting Started](/docs/getting_started.md)
- [Technical Blog](https://aiatscale.org/blog)
- API and SDK
  - [Go (language) API](https://github.com/NVIDIA/aistore/tree/main/api)
  - [Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore), and also:
    - [pip package](https://pypi.org/project/aistore/)
    - [reference guide](/docs/python_sdk.md)
  - [REST API](/docs/http_api.md)
    - [Easy URL](/docs/easy_url.md)
- Amazon S3
  - [`s3cmd` client](/docs/s3cmd.md)
  - [S3 compatibility](/docs/s3compat.md)
  - [Presigned S3 requests](/docs/s3compat.md#presigned-s3-requests)
  - [Boto3 support](https://github.com/NVIDIA/aistore/tree/main/python/aistore/botocore_patch)
- [CLI](/docs/cli.md)
  - [`ais help`](/docs/cli/help.md)
  - [Reference guide](https://github.com/NVIDIA/aistore/blob/main/docs/cli.md#cli-reference)
  - [Monitoring](/docs/cli/show.md)
    - [`ais show cluster`](/docs/cli/show.md)
    - [`ais show performance`](/docs/cli/show.md)
    - [`ais show job`](/docs/cli/show.md)
  - [Cluster and node management](/docs/cli/cluster.md)
  - [Mountpath (disk) management](/docs/cli/storage.md)
  - [Attach, detach, and monitor remote clusters](/docs/cli/cluster.md)
  - [Start, stop, and monitor downloads](/docs/cli/download.md)
  - [Distributed shuffle](/docs/cli/dsort.md)
  - [User account and access management](/docs/cli/auth.md)
  - [Jobs](/docs/cli/job.md)
- Security and Access Control
  - [Authentication Server (AuthN)](/docs/authn.md)
- Tutorials
  - [Tutorials](/docs/tutorials/README.md)
  - [Videos](/docs/videos.md)
- Power tools and extensions
  - [Reading, writing, and listing *archives*](/docs/archive.md)
  - [Distributed Shuffle](/docs/dsort.md)
  - [Downloader](/docs/downloader.md)
  - [Extract, Transform, Load](/docs/etl.md)
  - [Tools and utilities](/docs/tools.md)
- Benchmarking and performance tuning
  - [AIS Load Generator: integrated benchmark tool](/docs/aisloader.md)
  - [How to benchmark](/docs/howto_benchmark.md)
  - [Performance tuning and testing](/docs/performance.md)
  - [Performance monitoring](/docs/cli/performance.md)
- Buckets and Backend Providers
  - [Backend providers](/docs/providers.md)
  - [Buckets](/docs/bucket.md)
- Storage Services
  - [CLI: `ais show storage` and subcommands](/docs/cli/show.md)
  - [CLI: `ais storage` and subcommands](/docs/cli/storage.md)
  - [Storage Services](/docs/storage_svcs.md)
  - [Checksumming: brief theory of operations](/docs/checksum.md)
  - [S3 compatibility](/docs/s3compat.md)
- Cluster Management
  - [Node lifecycle: maintenance mode, rebalance/rebuild, shutdown, decommission](/docs/lifecycle_node.md)
  - [Monitoring: `ais show` and subcommands](/docs/cli/show.md)
  - [Joining AIS cluster](/docs/join_cluster.md)
  - [Leaving AIS cluster](/docs/leave_cluster.md)
  - [Global Rebalance](/docs/rebalance.md)
  - [Troubleshooting](/docs/troubleshooting.md)
- Configuration
  - [Configuration](/docs/configuration.md)
  - [Environment variables](/docs/environment-vars.md)
  - [CLI: `ais config`](/docs/cli/config.md)
  - [Feature flags](/docs/feature_flags.md)
- Observability
  - [Observability](/docs/metrics.md)
  - [Prometheus](/docs/prometheus.md)
  - [CLI: `ais show performance`](/docs/cli/show.md)
- For users and developers
  - [Getting started](/docs/getting_started.md)
  - [Docker](/docs/docker_main.md)
  - [Useful scripts](/docs/development.md)
  - Profiling, race-detecting, and more
- Batch jobs
  - [Batch operations](/docs/batch.md)
  - [eXtended Actions (xactions)](/xact/README.md)
  - [CLI: `ais job`](/docs/cli/job.md) and [`ais show job`](/docs/cli/show.md), including:
    - [prefetch remote datasets](/docs/cli/object.md#prefetch-objects)
    - [copy bucket](/docs/cli/bucket.md#copy-bucket)
    - [copy multiple objects](/docs/cli/bucket.md#copy-multiple-objects)
    - [download remote BLOBs](/docs/cli/blob-downloader.md)
    - [promote NFS or SMB share](https://aiatscale.org/blog/2022/03/17/promote), and more
- Assorted Topics
  - [System files](/docs/sysfiles.md)
  - [Switching cluster between HTTP and HTTPS](/docs/switch_https.md)
  - [TLS: testing with self-signed certificates](/docs/getting_started.md#tls-testing-with-self-signed-certificates)
  - [Feature flags](/docs/feature_flags.md)
  - [`aisnode` command line](/docs/command_line.md)
  - [Traffic patterns](/docs/traffic_patterns.md)
  - [Highly available control plane](/docs/ha.md)
  - [Start/stop maintenance mode, shutdown, decommission, and related operations](/docs/lifecycle_node.md)
  - [Downloader](/docs/downloader.md)
  - [On-disk layout](/docs/on_disk_layout.md)
  - [Buckets: definition, operations, properties](https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#bucket)
  - [Out-of-band updates](/docs/out_of_band.md)

## License

MIT

## Author

Alex Aizman (NVIDIA)