---
layout: post
title:  "AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2)"
date:   Oct 22, 2021
author: Janusz Marcinkiewicz, Prashanth Dintyala, Alex Aizman
categories: aistore etl pytorch python
---

The goal now is to deploy our first ETL and have AIStore run it on each storage node, harnessing the cluster's distributed power and running close to the data (meaning, **fast**). For the problem statement, background, and terms, please see the previous post:

* [AIStore & ETL: Introduction](https://aiatscale.org/blog/2021/10/21/ais-etl-1)

To quickly get to the point, we'll assume that an instance of AIStore - minimally, a single [all-in-one docker container](https://aiatscale.org/docs/videos.md#minimal-all-in-one-standalone-docker) - has already been deployed on Kubernetes.

> Check out our dedicated [ais-k8s repository](https://github.com/NVIDIA/ais-k8s/) for multiple easy ways to accomplish Kubernetes deployments.

We'll be using PyTorch's `torchvision` to transform the [ImageNet dataset](https://www.image-net.org/) - as illustrated:

![AIS-ETL Overview](/assets/ais_etl_series/ais-etl-overview.png)

Also, in the examples below you'll notice the `ais` command. That's [AIStore’s CLI](https://aiatscale.org/docs/cli) tool providing unmatched (well, almost) convenience and ease-of-use. Most of the time, though, we'll also show the equivalent `curl`.

> For a variety of practical reasons, `curl` proves to be handy in many use cases, in particular in `bash` and Python scripts on the client side. A big part of that is `curl`'s ubiquitous nature. Great tool, overall.

## The Dataset

The dataset we have here is derived from the original ImageNet and is only slightly different. Its training part resides under the `train/` directory, and its validation part under `val/`. Each `*.jpg` image has a matching `*.cls` object that contains the image's class number. Images are assigned to one of 1000 classes; classes are represented as integers in the `[0, 999]` range:

```console
$ ais ls ais://imagenet
NAME                  SIZE
train/0000353.cls     1B
train/0000353.jpg     17.88KiB
...
val/1280048.cls       3B
val/1280048.jpg       82.78KiB
```

A random (non-transformed) image from the dataset (and again, notice the `ais` [CLI](https://aiatscale.org/docs/cli) usage):

```console
$ ais get ais://imagenet/train/0278350.jpg
$ open 0278350.jpg
```

![example dog image](/assets/imagenet_pytorch_aistore/0278350.jpg)

And the associated *class*:

```console
$ ais object cat ais://imagenet/train/0278350.cls
217
```
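To make the naming convention concrete, here is a minimal sketch that pairs each image with its class object and parses a class number. The helper names `pair_images_with_labels` and `parse_class` are hypothetical (they are not part of AIStore), and the sketch assumes that a `*.cls` object holds nothing but the class number as ASCII text.

```python
from typing import Dict, Iterable


def pair_images_with_labels(names: Iterable[str]) -> Dict[str, str]:
    """Map each `*.jpg` object name to its sibling `*.cls` object name."""
    names = set(names)
    pairs = {}
    for name in sorted(names):
        if name.endswith(".jpg"):
            cls_name = name[: -len(".jpg")] + ".cls"
            if cls_name in names:  # skip images that have no label object
                pairs[name] = cls_name
    return pairs


def parse_class(cls_content: bytes) -> int:
    """Parse `*.cls` content (assumed to be ASCII digits) into a class in [0, 999]."""
    label = int(cls_content.decode("ascii").strip())
    if not 0 <= label <= 999:
        raise ValueError(f"class {label} is outside the [0, 999] range")
    return label


# Object names taken from the `ais ls` listing above.
listing = ["train/0000353.cls", "train/0000353.jpg", "val/1280048.cls", "val/1280048.jpg"]
pairs = pair_images_with_labels(listing)
label = parse_class(b"217")
```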

## The Plan and the Code

The plan, essentially, is two-fold:

1. Deploy the provided transformation code (called `code.py` below) as an ETL K8s container, aka *transformer*.
2. Drive the *transformer* from the PyTorch-based client to transform requested objects (shards) as required.

In the end, each image from the dataset, before it reaches the model, goes through the following series of (`code.py`) transformations:

```python
# `code.py`:
import io, sys
import torch
from PIL import Image
from torchvision import transforms

def img_to_bytes(img):
    # Serialize a PIL image into JPEG-encoded bytes.
    buf = io.BytesIO()
    img = img.convert('RGB')
    img.save(buf, format='JPEG')
    return buf.getvalue()

# Augment and normalize, then convert back to JPEG bytes.
preprocessing = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.ToPILImage(),
    transforms.Lambda(img_to_bytes),
])

if __name__ == "__main__":
    # Read the whole object from stdin, write the transformed bytes to stdout.
    input_bytes = sys.stdin.buffer.read()
    image = Image.open(io.BytesIO(input_bytes)).convert('RGB')
    processed_bytes = preprocessing(image)
    sys.stdout.buffer.write(processed_bytes)
```
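`code.py` reads one whole object from standard input and writes the transformed bytes to standard output. A minimal sketch of that read-transform-write shape, handy for trying a transform locally without a cluster, might look as follows. The `run_stdio_transform` helper and the `upper_case` placeholder transform are hypothetical names introduced here for illustration, and in-memory `io.BytesIO` streams stand in for `sys.stdin.buffer` and `sys.stdout.buffer`.

```python
import io
from typing import BinaryIO, Callable


def run_stdio_transform(transform: Callable[[bytes], bytes],
                        stdin: BinaryIO, stdout: BinaryIO) -> int:
    """Read one whole object, transform it, write the result; return bytes written."""
    input_bytes = stdin.read()
    output_bytes = transform(input_bytes)
    stdout.write(output_bytes)
    return len(output_bytes)


def upper_case(data: bytes) -> bytes:
    """Placeholder transform (an assumption) standing in for the torchvision pipeline."""
    return data.upper()


# In-memory streams replace the process's real stdin/stdout.
fake_stdin = io.BytesIO(b"hello, etl")
fake_stdout = io.BytesIO()
written = run_stdio_transform(upper_case, fake_stdin, fake_stdout)
```

Keeping the transform a plain `bytes -> bytes` function makes it easy to unit-test separately from the stream handling.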

## Initializing

We will use the `python3` (`python:3.8.5`) *runtime* and install the `torch` and `torchvision` packages into it.

> [runtime](https://github.com/NVIDIA/ais-etl/tree/master/runtime) contains a predefined work environment in which the provided code/script will be run. We also support `python2` (`python:2.7.18`), and more runtimes are planned in the future.

To make sure that `code.py` (above) can resolve its imports, the following (`deps.txt`) dependencies must be installed:

```
torch==1.6.0
torchvision==0.7.0
```

With transforming code and dependencies covered, we are now fully ready to initialize ETL in the cluster:

```console
$ ais etl init code \
  --name my-first-etl \
  --from-file code.py \
  --deps-file deps.txt \
  --runtime python3 \
  --comm-type io://
```

Notice the "elements" of the `ais etl init code` command:

* user-given name of the specific (`code.py`) transformation;
* the dependencies;
* the aforementioned `runtime`, and, finally -
* the `--comm-type` ("communication type") option, already briefly [mentioned](https://aiatscale.org/blog/2021/10/21/ais-etl-1), which we'll discuss in depth in future posts.

There are two ways to check that the ETL is up and running: the `ais` CLI and `kubectl`:

```console
$ ais etl list
NAME
my-first-etl

$ kubectl -n ais get pods | grep 'my-first-etl'
ais   my-first-etl-iacjhrvc                            1/1     Running   0          50s
```
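When these steps are driven from a script rather than typed by hand, the init command and the follow-up check can be assembled programmatically. The sketch below uses only the flags and the output format shown in this post. The helper names `etl_init_argv` and `etl_is_listed` are hypothetical, and the parser assumes that the ETL name is the first column under a single header line.

```python
import shlex
from typing import List


def etl_init_argv(name: str, code_file: str, deps_file: str,
                  runtime: str = "python3", comm_type: str = "io://") -> List[str]:
    """Assemble the `ais etl init code` command as an argv list."""
    return [
        "ais", "etl", "init", "code",
        "--name", name,
        "--from-file", code_file,
        "--deps-file", deps_file,
        "--runtime", runtime,
        "--comm-type", comm_type,
    ]


def etl_is_listed(listing: str, name: str) -> bool:
    """Return True if `name` is in the first column of `ais etl list` output.

    Assumption: the first non-empty line is the header (e.g. "NAME").
    """
    lines = [line.split() for line in listing.splitlines() if line.strip()]
    return any(columns[0] == name for columns in lines[1:])


argv = etl_init_argv("my-first-etl", "code.py", "deps.txt")
command_line = shlex.join(argv)  # printable form; pass `argv` itself to subprocess.run
listed = etl_is_listed("NAME\nmy-first-etl\n", "my-first-etl")
```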

To recap what we just did:

1. We prepared Python3 code (`code.py`) and provided the dependencies (`deps.txt`) needed to run it.
2. We started the transformer in the AIS cluster to subsequently augment images from the ImageNet dataset.

## Transforming a single object

To get a single object, we will use [AIStore’s CLI](https://aiatscale.org/docs/cli), and we will also show the equivalent `curl` commands, which can be useful in certain situations.
Those commands are convenient for quick testing and for specific solutions written, e.g., in `bash`.

The original image:

```console
$ ais get ais://imagenet/train/0278350.jpg 0278350.jpg

# Equivalent `curl`:
$ curl -o 0278350.jpg https://aistore/v1/objects/imagenet/train/0278350.jpg
```

and the image after the transformation:

```console
$ ais etl object my-first-etl ais://imagenet/train/0278350.jpg 0278350.jpg

# Equivalent `curl` (the URL is quoted so that the shell does not interpret `?`):
$ curl -o 0278350.jpg 'https://aistore/v1/objects/imagenet/train/0278350.jpg?uuid=my-first-etl'
```

Notice that the only difference between the two `curl` commands above (and, respectively, the two `ais` commands) is the `uuid` parameter, which references the existing (`my-first-etl`) transformation that we [initialized previously](#initializing).

Post-transform `0278350.jpg` image:

![example dog image transformed](/assets/imagenet_pytorch_aistore/0278350-transformed.jpg)

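For Python clients, the same two URLs can be built with the standard library. This is a minimal sketch that relies only on the URL layout visible in the `curl` commands above (`/v1/objects/<bucket>/<object>` plus an optional `uuid` query parameter); `object_url` is a hypothetical helper, not an AIStore API.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def object_url(endpoint: str, bucket: str, object_name: str,
               etl_name: Optional[str] = None) -> str:
    """Build the GET URL for an object, optionally routed through an ETL."""
    # `quote` keeps "/" intact by default, so nested object names stay readable.
    url = f"{endpoint.rstrip('/')}/v1/objects/{quote(bucket)}/{quote(object_name)}"
    if etl_name is not None:
        url += "?" + urlencode({"uuid": etl_name})
    return url


plain = object_url("https://aistore", "imagenet", "train/0278350.jpg")
transformed = object_url("https://aistore", "imagenet", "train/0278350.jpg", "my-first-etl")
```

Given a reachable cluster, either URL could then be fetched with, for instance, `urllib.request.urlopen(transformed).read()`.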
### AIS/PyTorch connector

So far we have set up ETL and tried our first cluster-resident transformations. We can now move on to training a real model. For this purpose, we have prepared a slightly modified version of the [PyTorch ImageNet example](https://github.com/pytorch/examples/tree/master/imagenet) that can be found [here](https://gist.github.com/VirrageS/7e2c80635e0efae3e63b5e3d5d2aaaf6). The script contains training and validation code for the ImageNet dataset.

The next step is to modify the script to utilize the `my-first-etl` transformer.

The typical code for loading ImageNet from a local directory looks like this:

```python
import os
import torch
from torchvision import datasets, transforms

# `args` comes from the training script's command-line parser.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(
        os.path.join(args.data, 'train'),
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]),
    ),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(
        os.path.join(args.data, 'val'),
        transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]),
    ),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
```

> Full code for the example above is also available - see [ImageNet PyTorch training with `datasets.ImageFolder`](/examples/etl-imagenet-dataset/train_pytorch.py).

In the world of PyTorch, all datasets are subclasses of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), with [`torchvision.datasets.ImageFolder`](https://pytorch.org/vision/stable/datasets.html) being the standard for handling datasets with labeled images.

To integrate with PyTorch and *offload* transformations to AIStore, we introduce `aistore.pytorch.Dataset` - an implementation of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset).

With `aistore.pytorch.Dataset`, the example above works out as follows:

```python
import aistore
from aistore.client import Bck

...

train_loader = torch.utils.data.DataLoader(
    aistore.pytorch.Dataset(
        "http://aistore-sample-proxy:51080", # AIS IP address or hostname
        Bck("imagenet"),
        prefix="train/", transform_id="my-first-etl",
        transform_filter=lambda object_name: object_name.endswith('.jpg'),
    ),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

val_loader = torch.utils.data.DataLoader(
    aistore.pytorch.Dataset(
        "http://aistore-sample-proxy:51080", # AIS IP address or hostname
        Bck("imagenet"),
        prefix="val/", transform_id="my-second-etl", # We skipped setting up this ETL.
        transform_filter=lambda object_name: object_name.endswith('.jpg'),
    ),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
```

Complete code is available here:

* [ImageNet PyTorch training with `aistore.pytorch.Dataset`](/examples/etl-imagenet-dataset/train_aistore.py)

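To give a feel for what a dataset of this kind has to do, here is a deliberately simplified, dependency-free sketch. It is not the actual `aistore.pytorch.Dataset` implementation. `EtlObjectDataset` and `fake_fetch` are hypothetical names, the object listing is passed in rather than queried from a cluster, and the sketch assumes that `transform_filter` selects which objects are routed through the ETL. A `DataLoader` accepts any map-style object that provides `__len__` and `__getitem__`, which is all the class implements.

```python
from typing import Callable, List, Optional, Sequence


class EtlObjectDataset:
    """Map-style dataset sketch: `__len__` plus `__getitem__`."""

    def __init__(self, object_names: Sequence[str],
                 fetch: Callable[[str, Optional[str]], bytes],
                 prefix: str = "", transform_id: Optional[str] = None,
                 transform_filter: Callable[[str], bool] = lambda name: True):
        # Keep only the objects under `prefix`, in a stable order.
        self._names: List[str] = sorted(n for n in object_names if n.startswith(prefix))
        self._fetch = fetch
        self._transform_id = transform_id
        self._transform_filter = transform_filter

    def __len__(self) -> int:
        return len(self._names)

    def __getitem__(self, index: int) -> bytes:
        name = self._names[index]
        # ASSUMPTION: objects accepted by the filter go through the ETL;
        # everything else is fetched as-is.
        etl = self._transform_id if self._transform_filter(name) else None
        return self._fetch(name, etl)


def fake_fetch(name: str, etl: Optional[str]) -> bytes:
    """Stand-in for a cluster GET; tags the payload with the ETL that was applied."""
    return f"{name}|{etl}".encode()


dataset = EtlObjectDataset(
    ["train/0000353.jpg", "train/0000353.cls", "val/1280048.jpg"],
    fake_fetch, prefix="train/", transform_id="my-first-etl",
    transform_filter=lambda object_name: object_name.endswith(".jpg"),
)
```

Injecting `fetch` keeps the network out of the class, so the prefix and filter logic can be exercised without a running cluster.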
## References

1. [AIStore & ETL: Introduction](https://aiatscale.org/blog/2021/10/21/ais-etl-1)
2. GitHub:
    - [AIStore](https://github.com/NVIDIA/aistore)
    - [AIS/Kubernetes Operator, AIS on bare-metal, Deployment Playbooks, Helm](https://github.com/NVIDIA/ais-k8s)
    - [AIS-ETL containers and specs](https://github.com/NVIDIA/ais-etl)
3. Documentation, blogs, videos:
    - https://aiatscale.org
    - https://github.com/NVIDIA/aistore/tree/main/docs

PS. Note that we have omitted setting up the ETL for the validation loader - leaving it as an exercise for the reader. To be continued...