---
layout: post
title: "AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2)"
date: Oct 22, 2021
author: Janusz Marcinkiewicz, Prashanth Dintyala, Alex Aizman
categories: aistore etl pytorch python
---

The goal now is to deploy our first ETL and have AIStore run it on each storage node, harnessing the distributed power (and close to data - meaning, **fast**). For the problem statement, background, and terms, please see the previous post:

* [AIStore & ETL: Introduction](https://aiatscale.org/blog/2021/10/21/ais-etl-1)

To quickly get to the point, we'll assume that an instance of AIStore - minimally, a single [all-in-one docker container](https://aiatscale.org/docs/videos.md#minimal-all-in-one-standalone-docker) - has already been deployed on Kubernetes.

> Check out our dedicated [ais-k8s repository](https://github.com/NVIDIA/ais-k8s/) for the multiple easy ways to accomplish Kubernetes deployments.

We'll be using PyTorch's `torchvision` to transform the [ImageNet dataset](https://www.image-net.org/) - as illustrated:

![AIS-ETL Overview](/assets/ais_etl_series/ais-etl-overview.png)

Also, in the examples below you'll notice the `ais` command. That's [AIStore's CLI](https://aiatscale.org/docs/cli) tool, providing unmatched (well, almost) convenience and ease-of-use. Most of the time, though, we'll show the equivalent `curl`.

> For a variety of practical reasons, `curl` proves to be handy in many use cases - e.g., in `bash` and Python scripts on the client side. A big part of that is its (i.e., `curl`'s) ubiquitous nature. Great tool, overall.

## The Dataset

The dataset we have here is derived from the original ImageNet and is only slightly different. Its training part exists under the `train/` directory, validation - under `val/`.
Each `*.jpg` image has a corresponding `*.cls` object with the corresponding class number. Images are assigned to one of the 1000 classes; classes are represented as integers in the `[0 - 999]` range:

```console
$ ais ls ais://imagenet
NAME                 SIZE
train/0000353.cls    1B
train/0000353.jpg    17.88KiB
...
val/1280048.cls      3B
val/1280048.jpg      82.78KiB
```

A random (non-transformed) image from the dataset (and again, notice the `ais` [CLI](https://aiatscale.org/docs/cli) usage):

```console
$ ais get ais://imagenet/train/0278350.jpg
$ open 0278350.jpg
```

![example dog image](/assets/imagenet_pytorch_aistore/0278350.jpg)

And the associated *class*:

```console
$ ais object cat ais://imagenet/train/0278350.cls
217
```

## The Plan and the Code

The plan, essentially, is two-fold:

1. Deploy the provided transformation code (called `code.py` below) as an ETL K8s container, aka *transformer*.
2. Drive the *transformer* from the PyTorch-based client to transform requested objects (shards) as required.
In the end, each image from the dataset, before it reaches the model, goes through the following series of (`code.py`) transformations:

```python
# `code.py`:
import io, sys

import torch
from PIL import Image
from torchvision import transforms

def img_to_bytes(img):
    buf = io.BytesIO()
    img = img.convert('RGB')
    img.save(buf, format='JPEG')
    return buf.getvalue()

preprocessing = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.ToPILImage(),
    transforms.Lambda(img_to_bytes),
])

if __name__ == "__main__":
    input_bytes = sys.stdin.buffer.read()
    image = Image.open(io.BytesIO(input_bytes)).convert('RGB')
    processed_bytes = preprocessing(image)
    sys.stdout.buffer.write(processed_bytes)
```

## Initializing

We will use the `python3` (`python:3.8.5`) *runtime* to install the `torch` and `torchvision` packages.

> A [runtime](https://github.com/NVIDIA/ais-etl/tree/master/runtime) contains a predefined work environment in which the provided code/script will be run. We also support `python2` (`python:2.7.18`), and more runtimes are planned in the future.
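With the `io://` communication type (used when initializing the ETL), the contract between AIStore and the deployed script is simple: raw object bytes arrive on stdin, and the transformed bytes must be written to stdout - exactly what the `__main__` section of `code.py` does. It can be useful to exercise this contract locally before deploying. Below is a minimal sketch that emulates the wiring with `subprocess`, substituting a trivial pass-through transform for `code.py` so it runs without `torch`/`torchvision` installed:

```python
# Sketch of the `io://` contract: object bytes in on stdin, transformed
# bytes out on stdout. An identity transform stands in for `code.py`
# so this example carries no heavy dependencies.
import subprocess
import sys
import textwrap

# Stand-in for `code.py` - same stdin/stdout protocol, trivial transform:
identity_script = textwrap.dedent("""
    import sys
    sys.stdout.buffer.write(sys.stdin.buffer.read())
""")

result = subprocess.run(
    [sys.executable, "-c", identity_script],
    input=b"raw object bytes",   # what AIStore would feed the transformer
    capture_output=True,
    check=True,
)
print(result.stdout)  # the "transformed" object - byte-for-byte identical here
```

With the real `code.py`, the equivalent local check is simply `python3 code.py < input.jpg > output.jpg`.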
To make sure that `code.py` (above) can resolve its imports, the following (`deps.txt`) dependencies must be installed:

```
torch==1.6.0
torchvision==0.7.0
```

With the transforming code and its dependencies covered, we are now fully ready to initialize the ETL in the cluster:

```console
$ ais etl init code \
  --name my-first-etl \
  --from-file code.py \
  --deps-file deps.txt \
  --runtime python3 \
  --comm-type io://
```

Notice the "elements" of this `ais` command:

* the user-given name of the specific (`code.py`) transformation;
* the dependencies;
* the aforementioned `runtime`, and, finally -
* the `--comm-type` ("communication type") option, briefly [mentioned](https://aiatscale.org/blog/2021/10/21/ais-etl-1) already - we'll discuss it in depth in future postings.

There are two ways to check that the ETL is up and running: the `ais` CLI ("way") and `kubectl`:

```console
$ ais etl list
NAME
my-first-etl

$ kubectl -n ais get pods | grep 'my-first-etl'
ais    my-first-etl-iacjhrvc    1/1    Running    0    50s
```

To recap what we just did:

1. We prepared Python3 code (`code.py`) and the dependencies (`deps.txt`) required to run it.
2. We started a transformer in the AIS cluster to, subsequently, augment images from the ImageNet dataset.

## Transforming a single object

To get a single object, we will use [AIStore's CLI](https://aiatscale.org/docs/cli), and we will show the equivalent `curl` commands, which can be useful in certain situations.
Those commands are convenient for quick testing and for specific solutions written, e.g., in `bash`.
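The same object-GET URLs used by those `curl` commands are also easy to assemble in Python scripts. A small sketch - note that the `etl_object_url` helper below is ours (not part of any AIStore SDK), and `https://aistore` stands in for your cluster's endpoint:

```python
# Sketch: build the object-GET URLs for plain and ETL-transformed reads.
# `etl_object_url` is a hypothetical helper; passing `etl_id` adds the
# `uuid` query parameter that triggers the transformation on the fly.
from urllib.parse import urlencode

def etl_object_url(endpoint, bucket, object_name, etl_id=None):
    url = f"{endpoint}/v1/objects/{bucket}/{object_name}"
    if etl_id:
        url += "?" + urlencode({"uuid": etl_id})
    return url

# Original object:
print(etl_object_url("https://aistore", "imagenet", "train/0278350.jpg"))
# -> https://aistore/v1/objects/imagenet/train/0278350.jpg

# Transformed by `my-first-etl`:
print(etl_object_url("https://aistore", "imagenet", "train/0278350.jpg", "my-first-etl"))
# -> https://aistore/v1/objects/imagenet/train/0278350.jpg?uuid=my-first-etl
```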
The original image:

```console
$ ais get ais://imagenet/train/0278350.jpg 0278350.jpg

# Equivalent `curl`:
$ curl -o 0278350.jpg https://aistore/v1/objects/imagenet/train/0278350.jpg
```

and the image after the transformation:

```console
$ ais etl object my-first-etl ais://imagenet/train/0278350.jpg 0278350.jpg

# Equivalent `curl`:
$ curl -o 0278350.jpg "https://aistore/v1/objects/imagenet/train/0278350.jpg?uuid=my-first-etl"
```

Notice that the only difference between the two `curl` (and, respectively, `ais`) commands above is the `uuid` parameter referencing the existing and available (`my-first-etl`) transformation that we [previously initialized](#initializing).

The post-transform `0278350.jpg` image:

![example dog image transformed](/assets/imagenet_pytorch_aistore/0278350-transformed.jpg)

### AIS/PyTorch connector

So far we have set up the ETL and tried our first cluster-resident transformations. We can now start running a real training model. For this purpose, we have prepared a slightly modified version of the [PyTorch ImageNet example](https://github.com/pytorch/examples/tree/master/imagenet), which can be found [here](https://gist.github.com/VirrageS/7e2c80635e0efae3e63b5e3d5d2aaaf6). The script contains training and validation code for the ImageNet dataset.

The next step is to modify the script to utilize the `my-first-etl` transformer.
The typical code for loading ImageNet from a local directory looks like this:

```python
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(
        os.path.join(args.data, 'train'),
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]),
    ),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(
        os.path.join(args.data, 'val'),
        transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]),
    ),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
```

> Full code for the example above is also available - see [ImageNet PyTorch training with `dataset.ImageFolder`](/examples/etl-imagenet-dataset/train_pytorch.py).

In the world of PyTorch, all datasets are subclasses of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), with [`torchvision.datasets.ImageFolder`](https://pytorch.org/vision/stable/datasets.html) being the standard for handling datasets with labeled images.

To integrate with PyTorch and *offload* transformations to AIStore, we introduce `aistore.pytorch.Dataset` - an implementation of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset).

With `aistore.pytorch.Dataset`, the example above works out as follows:

```python
import aistore
from aistore.client import Bck

...

train_loader = torch.utils.data.DataLoader(
    aistore.pytorch.Dataset(
        "http://aistore-sample-proxy:51080",  # AIS IP address or hostname
        Bck("imagenet"),
        prefix="train/", transform_id="my-first-etl",
        transform_filter=lambda object_name: object_name.endswith('.jpg'),
    ),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

val_loader = torch.utils.data.DataLoader(
    aistore.pytorch.Dataset(
        "http://aistore-sample-proxy:51080",  # AIS IP address or hostname
        Bck("imagenet"),
        prefix="val/", transform_id="my-second-etl",  # We skipped setting up this ETL.
        transform_filter=lambda object_name: object_name.endswith('.jpg'),
    ),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
```

The complete code is available here:

* [ImageNet PyTorch training with `aistore.pytorch.Dataset`](/examples/etl-imagenet-dataset/train_aistore.py)

## References

1. [AIStore & ETL: Introduction](https://aiatscale.org/blog/2021/10/21/ais-etl-1)
2. GitHub:
    - [AIStore](https://github.com/NVIDIA/aistore)
    - [AIS/Kubernetes Operator, AIS on bare-metal, Deployment Playbooks, Helm](https://github.com/NVIDIA/ais-k8s)
    - [AIS-ETL containers and specs](https://github.com/NVIDIA/ais-etl)
3. Documentation, blogs, videos:
    - https://aiatscale.org
    - https://github.com/NVIDIA/aistore/tree/main/docs

PS. Note that we have omitted setting up the ETL for the validation loader - leaving it as an exercise for the reader. To be continued...