---
layout: post
title: "Python SDK: Getting Started"
date: Jul 20, 2022
author: Ryan Koo
categories: aistore python sdk
---

# Python SDK: Getting Started

Python has established itself as the language of choice among data scientists and machine learning developers. Much of its recent popularity in the field can be attributed to its general *ease of use*, especially in combination with the popular machine learning framework [PyTorch](https://pytorch.org/), which provides a Python-first interface.

The [AIStore Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore) is a project that includes a growing library of client-side APIs to easily access and utilize AIStore clusters, buckets, and objects, as well as a number of tools for using AIStore with PyTorch.

The [AIStore Python API](https://aiatscale.org/docs/python-api) is essentially a Python port of AIStore's [Go APIs](https://github.com/NVIDIA/aistore/tree/main/api). The two are quite similar in terms of functionality: both issue simple [HTTP requests](https://aiatscale.org/docs/http-api#api-reference) to an AIStore endpoint, which read and write the cluster's data and metadata. The API provides convenient and flexible ways (similar to those provided by the [CLI](https://aiatscale.org/docs/cli)) to move data (as objects) in and out of buckets on AIStore, manage AIStore clusters, and much more.

This technical blog demonstrates a few ways the Python API provided in the [Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore) can be used with a running AIStore instance to manage and utilize data.

## Getting Started

### Installing & Deploying AIStore

The latest AIStore release can be easily installed with either Anaconda or `pip`:

```console
$ conda install aistore
```

```console
$ pip install aistore
```

> Note that only Python 3.x (version 3.6 or later) is currently supported.

While there are a number of options available for deploying AIStore - as demonstrated [here](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md) - for the sake of simplicity, we will be using AIStore's [minimal standalone docker deployment](https://github.com/NVIDIA/aistore/blob/main/deploy/prod/docker/single/README.md):

```console
# Deploy the AIStore cluster in a container on port 51080
docker run -d \
    -p 51080:51080 \
    -v /disk0:/ais/disk0 \
    aistore/cluster-minimal:latest
```
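Before going further, it is worth confirming that the container actually came up and that the endpoint responds. Below is a minimal check, assuming the default image and port from the command above; the `/v1/health` path is based on AIStore's REST API, and the SDK-based readiness check shown in the next section works just as well:

```console
# Confirm the container is running
$ docker ps --filter "ancestor=aistore/cluster-minimal:latest"

# Confirm the AIStore proxy responds on the published port
$ curl -i http://localhost:51080/v1/health
```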
### Moving Data To AIStore

Let's say we want to move a copy of the [TinyImageNet](https://paperswithcode.com/dataset/tiny-imagenet) dataset from our local filesystem to a bucket on our running instance of AIStore.

First, we import the Python API and initialize the client to the running instance of AIStore:

```python
from aistore import Client

client = Client("http://localhost:51080")
```

Before moving any data into AIStore, we can first check that AIStore is fully deployed and ready:

```python
client.cluster().is_aistore_running()
```

Once AIStore is verified as running, moving the *compressed* dataset to a bucket on AIStore is as easy as:

```python
BUCKET_NAME = "tinyimagenet_compressed"
COMPRESSED_TINYIMAGENET = "~/Datasets/tinyimagenet-compressed.zip"
OBJECT_NAME = "tinyimagenet-compressed.zip"

# Create a new bucket [BUCKET_NAME] to store the dataset
client.bucket(BUCKET_NAME).create()

# Verify the bucket creation operation
client.cluster().list_buckets()

# Put the dataset [COMPRESSED_TINYIMAGENET] in bucket [BUCKET_NAME] as an object named [OBJECT_NAME]
client.bucket(BUCKET_NAME).object(OBJECT_NAME).put(COMPRESSED_TINYIMAGENET)

# Verify the object put operation
client.bucket(BUCKET_NAME).list_objects().get_entries()
```

Say we now want to instead move an *uncompressed* version of TinyImageNet to AIStore. The uncompressed format of TinyImageNet is comprised of several sub-directories which divide the dataset's many image samples into separate sets (train, validation, test) as well as separate classes (based on numbers mapped to image labels).

As opposed to traditional file storage systems, which operate on the concept of multi-level directories and sub-directories, object storage systems such as AIStore maintain a *strict* two-level hierarchy of *buckets* and *objects*. However, we can still preserve a "symbolic" directory structure by encoding it in the object names.

We can move the dataset to an AIStore bucket while preserving its directory-based structure by using the bucket `put_files` method along with the `recursive` option:

```python
BUCKET_NAME = "tinyimagenet_uncompressed"
TINYIMAGENET_DIR = "<local-path-to-dataset>/tinyimagenet/"  # replace with the local path to the dataset

# Create a new bucket [BUCKET_NAME] to store the dataset
bucket = client.bucket(BUCKET_NAME).create()

bucket.put_files(TINYIMAGENET_DIR, recursive=True)

# Verify the object put operations
bucket.list_objects().get_entries()
```

### Getting Data From AIStore

Getting the *compressed* TinyImageNet dataset from AIStore bucket `ais://tinyimagenet_compressed` is as easy as:

```python
BUCKET_NAME = "tinyimagenet_compressed"
OBJECT_NAME = "tinyimagenet-compressed.zip"

# Get object [OBJECT_NAME] from bucket [BUCKET_NAME]
client.bucket(BUCKET_NAME).object(OBJECT_NAME).get()
```

If we want to get the *uncompressed* TinyImageNet from AIStore bucket `ais://tinyimagenet_uncompressed`, we can easily do that with [Bucket.list_objects()](https://aiatscale.org/docs/python-api#bucket.Bucket.list_objects) and [Object.get()](https://aiatscale.org/docs/python-api#object.Object.get):

```python
BUCKET_NAME = "tinyimagenet_uncompressed"

# List all objects in bucket [BUCKET_NAME]
TINYIMAGENET_UNCOMPRESSED = client.bucket(BUCKET_NAME).list_objects().get_entries()

for FILENAME in TINYIMAGENET_UNCOMPRESSED:
    # Get object [FILENAME.name] from bucket [BUCKET_NAME]
    client.bucket(BUCKET_NAME).object(FILENAME.name).get()
```
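Note that `Object.get()` returns a stream over the object's content; to actually materialize the dataset on the local filesystem, read the bytes and write them out. Below is a minimal sketch, where the destination directory is hypothetical and the `read_all()` call assumes the stream interface documented in the [API reference](https://aiatscale.org/docs/python-api) (check the exact method names for your SDK version):

```python
import os

BUCKET_NAME = "tinyimagenet_uncompressed"
LOCAL_DIR = "/tmp/tinyimagenet"  # hypothetical destination directory

bucket = client.bucket(BUCKET_NAME)
for entry in bucket.list_objects().get_entries():
    local_path = os.path.join(LOCAL_DIR, entry.name)
    # Recreate the dataset's "symbolic" directory structure locally
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    # Read the object's content and write it to the corresponding local file
    with open(local_path, "wb") as f:
        f.write(bucket.object(entry.name).get().read_all())
```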
We can also pick a *specific* section of the uncompressed dataset and get only those objects. By specifying a `prefix` in our [Bucket.list_objects()](https://aiatscale.org/docs/python-api#bucket.Bucket.list_objects) call, we can make use of the *symbolic* file structure and list only the contents of the desired directory:

```python
BUCKET_NAME = "tinyimagenet_uncompressed"

# List only objects with prefix "validation/" in bucket [tinyimagenet_uncompressed]
TINYIMAGENET_UNCOMPRESSED_VAL = client.bucket(BUCKET_NAME).list_objects(prefix="validation/").get_entries()

for FILENAME in TINYIMAGENET_UNCOMPRESSED_VAL:
    # Get objects with prefix "validation/" from bucket [tinyimagenet_uncompressed]
    client.bucket(BUCKET_NAME).object(FILENAME.name).get()
```

### External Cloud Storage Providers

AIStore also supports third-party remote backends, including Amazon S3, Google Cloud, and Microsoft Azure.

> For exact definitions and related capabilities, please see [terminology](https://aiatscale.org/docs/overview#terminology).

We shut down the previous instance of AIStore and re-deploy it with AWS S3 and GCP backends attached:

```console
# Similarly deploy an AIStore cluster in a container on port 51080, but with GCP and AWS backends attached
docker run -d \
    -p 51080:51080 \
    -v <path_to_gcp_config>.json:/credentials/gcp.json \
    -e GOOGLE_APPLICATION_CREDENTIALS="/credentials/gcp.json" \
    -e AWS_ACCESS_KEY_ID="AWSKEYIDEXAMPLE" \
    -e AWS_SECRET_ACCESS_KEY="AWSSECRETACCESSKEYEXAMPLE" \
    -e AWS_REGION="us-east-2" \
    -e AIS_BACKEND_PROVIDERS="gcp aws" \
    -v /disk0:/ais/disk0 \
    aistore/cluster-minimal:latest
```

> Attaching third-party cloud backends makes their buckets and objects accessible *through* AIStore: objects are fetched from the remote backend on demand and cached in the cluster. The client-side APIs never interact with the external backends directly - they only talk to the AIStore cluster, which handles all communication with the remote storage.

[Object.get()](https://aiatscale.org/docs/python-api#object.Object.get) works with external cloud storage buckets as well. We can use the method in a similar fashion as shown previously to get either a compressed or uncompressed version of the dataset from, for example, `gcp://tinyimagenet_compressed` and `gcp://tinyimagenet_uncompressed`:

```python
# Get the compressed TinyImageNet dataset from [gcp://tinyimagenet_compressed]
BUCKET_NAME = "tinyimagenet_compressed"
OBJECT_NAME = "tinyimagenet-compressed.zip"
client.bucket(BUCKET_NAME, provider="gcp").object(OBJECT_NAME).get()


# Get the uncompressed TinyImageNet dataset from [gcp://tinyimagenet_uncompressed]
BUCKET_NAME = "tinyimagenet_uncompressed"
TINYIMAGENET_UNCOMPRESSED = client.bucket(BUCKET_NAME, provider="gcp").list_objects().get_entries()
for FILENAME in TINYIMAGENET_UNCOMPRESSED:
    client.bucket(BUCKET_NAME, provider="gcp").object(FILENAME.name).get()


# Get only objects with prefix "validation/" from bucket [gcp://tinyimagenet_uncompressed]
TINYIMAGENET_UNCOMPRESSED_VAL = client.bucket(BUCKET_NAME, provider="gcp").list_objects(prefix="validation/").get_entries()
for FILENAME in TINYIMAGENET_UNCOMPRESSED_VAL:
    client.bucket(BUCKET_NAME, provider="gcp").object(FILENAME.name).get()
```

> Note the added `provider` argument supplied to [`Client.bucket()`](https://aiatscale.org/docs/python-api#api.Client.bucket) in the examples shown above.

We can instead choose to *copy* the contents of an external cloud storage bucket to a native (`ais://`) AIStore bucket with [`Bucket.copy()`](https://aiatscale.org/docs/python-api#bucket.Bucket.copy):

```python
# Copy bucket [gcp://tinyimagenet_uncompressed] and its objects to a new bucket [ais://tinyimagenet_validationset]
FROM_BUCKET = "tinyimagenet_uncompressed"
TO_BUCKET = "tinyimagenet_validationset"
client.bucket(FROM_BUCKET, provider="gcp").copy(TO_BUCKET)

# Evict the external cloud storage bucket [gcp://tinyimagenet_uncompressed] if it is no longer needed (to free space on the cluster)
client.bucket(FROM_BUCKET, provider="gcp").evict()
```

Evicting a cloud storage bucket removes any instance of that bucket (and its objects) from the AIStore cluster metadata. Eviction does **not** delete or otherwise affect the actual cloud storage bucket (in AWS S3, GCP, or Azure).


## PyTorch

PyTorch provides built-in [tools](https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#aistore-io-datapipe) for AIStore integration, allowing machine learning developers to easily use AIStore as a storage backend with PyTorch. In fact, the data-loading classes [`AISFileLister`](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#aisfilelister) and [`AISFileLoader`](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader), found in [`aisio.py`](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/load/aisio.py), make use of several of the client-side APIs referenced in this article.

For more information on dataloading from AIStore with PyTorch, please refer to this [article](https://aiatscale.org/blog/2022/07/12/aisio-pytorch).
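To give a feel for how these datapipes fit together, here is a minimal sketch based on the torchdata documentation; the endpoint URL and the bucket/prefix are placeholders taken from the examples above, and exact import paths may vary across torchdata versions:

```python
from torchdata.datapipes.iter import AISFileLister, AISFileLoader, IterableWrapper

AIS_URL = "http://localhost:51080"  # AIStore proxy endpoint from the deployment above

# List every object under the "validation/" prefix of [ais://tinyimagenet_uncompressed]
prefixes = IterableWrapper(["ais://tinyimagenet_uncompressed/validation/"])
urls = AISFileLister(url=AIS_URL, source_datapipe=prefixes)

# Load each listed object as a (url, byte-stream) pair
files = AISFileLoader(url=AIS_URL, source_datapipe=urls)

for url, file in files:
    # 'file' is a stream over the object's content; feed it to your decoding pipeline here
    pass
```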
## More Examples & Resources

For more examples, please refer to the [AIStore Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore) documentation and try out the [SDK tutorial (Jupyter Notebook)](https://github.com/NVIDIA/aistore/blob/main/python/aistore/sdk-tutorial.ipynb).

For information on specific API usage, please refer to the [API reference](https://aiatscale.org/docs/python-api).


## References

* [AIStore GitHub](https://github.com/NVIDIA/aistore)
* [AIStore Go API](https://github.com/NVIDIA/aistore/tree/main/api)
* [AIStore Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore)
* [Documentation](https://aiatscale.org/docs)
* [Official AIStore PIP Package](https://pypi.org/project/aistore/)