---
layout: post
title: ETL
permalink: /docs/cli/etl
redirect_from:
 - /cli/etl.md/
 - /docs/cli/etl.md/
---

# CLI Reference for ETLs

This section documents ETL management operations with `ais etl`. But first, note:

> As with [global rebalance](/docs/rebalance.md), [dSort](/docs/dsort.md), and [download](/docs/download.md), all ETL management commands can also be executed via `ais job` and `ais show`, the commands that, by definition, support all AIS *xactions*, including AIS-ETL.

For background on AIS-ETL, getting-started steps, working examples, and tutorials, please refer to:

* [ETL documentation](/docs/etl.md)

## Table of Contents

- [Init ETL with spec](#init-etl-with-spec)
- [Init ETL with code](#init-etl-with-code)
- [List ETLs](#list-etls)
- [View ETL Logs](#view-etl-logs)
- [Stop ETL](#stop-etl)
- [Start ETL](#start-etl)
- [Transform object on-the-fly with given ETL](#transform-object-on-the-fly-with-given-etl)
- [Transform a bucket offline with the given ETL](#transform-a-bucket-offline-with-the-given-etl)

## Init ETL with spec

`ais etl init spec --from-file=SPEC_FILE --name=ETL_NAME [--comm-type=COMMUNICATION_TYPE] [--wait-timeout=TIMEOUT] [--arg-type=ARGUMENT_TYPE]` or `ais start etl init`

Initializes an ETL from a Pod YAML specification file. The `--name` parameter assigns a user-defined unique name to the ETL (see [ETL name specifications](/docs/etl.md#etl-name-specifications) for information on valid ETL names).

### Example

Initialize an ETL that computes the MD5 of the object.

```console
$ cat spec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: transformer-md5
spec:
  containers:
    - name: server
      image: aistore/transformer_md5:latest
      ports:
        - name: default
          containerPort: 80
      command: ['/code/server.py', '--listen', '0.0.0.0', '--port', '80']
$ ais etl init spec --from-file=spec.yaml --name=transformer-md5 --comm-type=hpull:// --wait-timeout=1m
transformer-md5
```

## Init ETL with code

`ais etl init code --name=ETL_NAME --from-file=CODE_FILE --runtime=RUNTIME [--chunk-size=NUM_OF_BYTES] [--transform=TRANSFORM_FUNC] [--before=BEFORE_FUNC] [--after=AFTER_FUNC] [--deps-file=DEPS_FILE] [--comm-type=COMMUNICATION_TYPE] [--wait-timeout=TIMEOUT] [--arg-type=ARGUMENT_TYPE]`

Initializes an ETL from the provided `CODE_FILE`, which must contain a transformation function named `transform(input_bytes)` or `transform(input_bytes, context)`. The file may also define two optional functions: `before(context)`, which runs prior to the transform function and initializes the variables that `transform(input_bytes, context)` needs, and `after(context)`, which runs after the transform, consolidates the results, and returns the transformed `output_bytes` to the user.

The `--name` parameter assigns a user-defined unique name to the ETL (see [ETL name specifications](/docs/etl.md#etl-name-specifications) for information on valid ETL names).

Depending on the communication type used, there are multiple ways to initialize the `transform(input_bytes, context)`, `before(context)`, and `after(context)` functions. Check [ETL Init Code Docs](/docs/etl.md#init-code-request) for more info.

All available runtimes are listed [here](/docs/etl.md#runtimes).

Note:
- The default value of `--transform` is `transform`.

### Example

Initialize an ETL with code that computes the MD5 of the object.

```console
$ cat code.py
import hashlib

def transform(input_bytes):
    md5 = hashlib.md5()
    md5.update(input_bytes)
    return md5.hexdigest().encode()

$ ais etl init code --from-file=code.py --runtime=python3.11v2 --name=transformer-md5 --comm-type hpull

transformer-md5
```

With `before(context)` and `after(context)` functions and streaming (`CHUNK_SIZE` > 0):
```console
$ cat code.py
import hashlib
def before(context):
    context["before"] = hashlib.md5()
    return context

def transform(input_bytes, context):
    context["before"].update(input_bytes)

def after(context):
    return context["before"].hexdigest().encode()

$ ais etl init code --name=etl-md5 --from-file=code.py --runtime=python3.11v2 --chunk-size=32768 --before=before --after=after --comm-type hpull
```

## List ETLs

`ais etl show` or, same, `ais job show etl`

Lists all available ETLs.

## View ETL Logs

`ais etl view-logs ETL_NAME [TARGET_ID]`

Outputs the logs produced by the given ETL.
An optional `TARGET_ID` argument selects the particular target from which the logs are retrieved.

## Stop ETL

`ais etl stop ETL_NAME` or, same, `ais stop etl`

Stops the ETL with the specified `ETL_NAME`.


## Start ETL

`ais etl start ETL_NAME` or, same, `ais start etl`

Starts the ETL with the specified `ETL_NAME`.


## Transform object on-the-fly with given ETL

`ais etl object ETL_NAME BUCKET/OBJECT_NAME OUTPUT`

Gets an object transformed by the ETL named `ETL_NAME`. As the examples show, `OUTPUT` is either a file name or `-` for STDOUT.

### Examples

#### Transform object to STDOUT

Transform the `shards/shard-0.tar` object with the `transformer-md5` ETL (which computes the MD5 of the object) and print the output to STDOUT.

```console
$ ais etl object transformer-md5 ais://shards/shard-0.tar -
393c6706efb128fbc442d3f7d084a426
```

#### Transform object to output file

Transform the `shards/shard-0.tar` object with the `transformer-md5` ETL (which computes the MD5 of the object) and save the output to the `output.txt` file.

```console
$ ais etl object transformer-md5 ais://shards/shard-0.tar output.txt
$ cat output.txt
393c6706efb128fbc442d3f7d084a426
```

## Transform a bucket offline with the given ETL

`ais etl bucket ETL_NAME SRC_BUCKET DST_BUCKET`

Transform all or selected objects and put them into another bucket.

| Flag | Type | Description |
| --- | --- | --- |
| `--list` | `string` | Comma-separated list of object names, e.g., 'obj1,obj2' |
| `--template` | `string` | Template for matching object names, e.g., 'obj-{000..100}.tar' |
| `--ext` | `string` | Mapping from old to new extensions of transformed objects, e.g., {jpg:txt}, "{ in1 : out1, in2 : out2 }" |
| `--prefix` | `string` | Prefix added to every new object name |
| `--wait` | `bool` | Wait until the operation is finished |
| `--requests-timeout` | `duration` | Timeout for a single object transformation |
| `--dry-run` | `bool` | Don't actually transform the bucket, only display what would happen |

Flags `--list` and `--template` are mutually exclusive. If neither of them is set, the command transforms the whole bucket.

### Examples

#### Transform bucket with ETL

Transform every object from `src_bucket` with ETL and put the new objects into `dst_bucket`.

```console
$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket
MMi9l8Z11
$ ais wait xaction MMi9l8Z11
```

#### Transform bucket with ETL and wait for completion

The same as above, but wait for the bucket transformation to finish.

```console
$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --wait
```

#### Transform selected objects in bucket with ETL

Transform objects `shard-10.tar`, `shard-11.tar`, and `shard-12.tar` from `src_bucket` with ETL and put the new objects into `dst_bucket`.

```console
$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --template "shard-{10..12}.tar"
```

#### Transform bucket with ETL and additional parameters

Transform every object from `src_bucket`, add the `etl-` prefix to each new object name, and remap extensions: objects with the `.in1` extension get the `.out1` extension, and objects with the `.in2` extension get the `.out2` extension.

```console
$ ais ls ais://src_bucket --props=name
NAME
obj1.in1
obj2.in2
(...)
$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --ext="{in1:out1, in2:out2}" --prefix="etl-" --wait
$ ais ls ais://dst_bucket --props=name
NAME
etl-obj1.out1
etl-obj2.out2
(...)
```

#### Transform bucket with ETL but with dry-run

A dry run performs no actions; it only shows what would be transformed by an actual run.
This is useful for preparing the actual run.

```console
$ ais ls ais://src_bucket --props=name,size
NAME        SIZE
obj1.in1    10MiB
obj2.in2    10MiB
(...)
$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --dry-run --wait
[DRY RUN] No modifications on the cluster
2 objects (20MiB) would have been put into bucket ais://dst_bucket
```
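
The dry-run output shown above reports only an object count and a total size, so it can help to preview locally how `--prefix` and `--ext` would rename objects, as in the "additional parameters" example. The following is a minimal Python sketch of that renaming, inferred solely from the example listing in this document. The helper `destination_name` and its parameters are hypothetical names chosen for illustration; this is not AIStore's implementation, and edge cases (for instance, names with several dots or with directory-like components) may behave differently on a real cluster.

```python
import os


def destination_name(src_name, prefix="", ext_map=None):
    """Return the destination name this sketch assumes for one source object.

    Hypothetical helper, not part of AIStore: it swaps the extension
    according to ext_map (old -> new, without dots) and then prepends prefix.
    """
    ext_map = ext_map or {}
    base, ext = os.path.splitext(src_name)  # ext keeps its dot, e.g. ".in1"
    new_ext = ext_map.get(ext.lstrip("."))
    if new_ext is not None:
        src_name = f"{base}.{new_ext}"
    return prefix + src_name


# Mirrors --ext="{in1:out1, in2:out2}" --prefix="etl-" from the example above.
mapping = {"in1": "out1", "in2": "out2"}
for name in ["obj1.in1", "obj2.in2"]:
    print(name, "->", destination_name(name, prefix="etl-", ext_map=mapping))
# obj1.in1 -> etl-obj1.out1
# obj2.in2 -> etl-obj2.out2
```

In this sketch, names whose extension is absent from the mapping keep their extension and receive only the prefix; that choice is an assumption of the sketch and should be checked against an actual run.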