This is the top eXtended Action (`xaction`) directory containing much of the common functionality and interfaces used by the rest of the code.

In addition, it contains subdirectories:

* `xreg` - xaction registry
* `xs` - concrete named xactions, e.g. `apc.ActRebalance`, `apc.ActPromote`, `apc.ActSummaryBck` and other enumerated *kinds*.

> For all supported xactions, their *kinds* and static properties, see `xact.Table`.

> Xaction *kinds* are generally consistent with the API constants from `api/apc/const.go`.

## Extended Actions (xactions)

Batch operations that may take many seconds (minutes, hours) to execute are called eXtended actions or *xactions*.

Xactions run asynchronously, have one of the enumerated kinds, start/stop times, and xaction-specific statistics.
Xactions start running based on a wide variety of runtime conditions that include:

* periodic (defined by a configured interval of time)
* resource utilization (e.g., usable capacity falling below configured watermark)
* certain types of workload (e.g., PUT into a mirrored or erasure-coded bucket)
* user request (e.g., to reduce the number of local object copies in a given bucket)
* adding or removing storage targets (the events that trigger cluster-wide rebalancing)
* adding or removing local disks (the events that cause resilver to start moving stored content between *mountpaths* - see [Managing mountpaths](/docs/configuration.md#managing-mountpaths))
* and more...

Further, to reduce congestion and minimize interference with user-generated workload, extended actions (self-)throttle themselves based on configurable watermarks. The latter include `disk_util_low_wm` and `disk_util_high_wm` (see [configuration](/deploy/dev/local/aisnode_config.sh)).
Roughly speaking, the idea is that when local disk utilization falls below the low watermark (`disk_util_low_wm`), extended actions that utilize local storage can run at full throttle. And vice versa.

The amount of throttling that a given xaction imposes on itself is always defined by a combination of dynamic factors.
To give concrete examples, an extended action that runs LRU evictions performs its "balancing act" by taking into account the remaining storage capacity **and** the current utilization of the local filesystems.
The mirroring (xaction) takes into account congestion on its communication channel that callers use for posting requests to create local replicas.

---------------------------------------------------------------

**NOTE (Dec 2021):** the rest of this document is somewhat **outdated** and must be revisited. For the most recently updated information on running and monitoring *xactions*, please see:

* [Batch operations](/docs/batch.md)
* [CLI documentation](/docs/cli.md), and in particular:
    - [`ais show job`](/docs/cli/job.md)
    - [`ais show job dsort`](/docs/cli/dsort.md)
    - [`ais show job download`](/docs/cli/download.md)
    - [`ais show rebalance`](/docs/rebalance.md)
* And also:
    - [`ais etl`](/docs/cli/etl.md)
    - [multi-object operations](/docs/cli/object.md#operations-on-lists-and-ranges)
    - [reading, writing, and listing archives](/docs/cli/object.md)
    - [copying buckets](/docs/cli/bucket.md#copy-bucket)

---------------------------------------------------------------


Supported extended actions are enumerated in the [user-facing API](/cmn/api.go) and include:

* cluster-wide rebalancing (denoted as `ActGlobalReb` in the [API](/cmn/api.go)) that gets triggered when storage targets join or leave the cluster
* LRU-based cache eviction (see [LRU](/docs/storage_svcs.md#lru)) that depends on the remaining free capacity and [configuration](/deploy/dev/local/aisnode_config.sh)
* prefetching batches of objects (of arbitrary size) from the Cloud (see [List/Range Operations](/docs/batch.md))
* consensus voting (when conducting new leader [election](/docs/ha.md#election))
* erasure-encoding objects in an EC-configured bucket (see [Erasure coding](/docs/storage_svcs.md#erasure-coding))
* creating additional local replicas, and reducing the number of object replicas in a given locally-mirrored bucket (see [Storage Services](/docs/storage_svcs.md))
* and more...

Several actions can be taken on an xaction: stats, start, and stop.
The list of supported actions can be found in the [API](/cmn/api.go).

Xaction requests are generic for all xactions, but responses from each xaction are different.
See [below](#start-and-stop).
The request looks as follows:

1. Single target request:

    ```console
    $ curl -i -X GET -H 'Content-Type: application/json' -d '{"action": "actiontype", "name": "xactionname", "value":{"bucket":"bucketname"}}' 'http://T/v1/daemon?what=xaction'
    ```

    To simplify the logic, the result is always an array, even if it contains only one element.

2. Proxy request, which executes a request on all targets within the cluster and responds with a list of the targets' responses:

    ```console
    $ curl -i -X GET -H 'Content-Type: application/json' -d '{"action": "actiontype", "name": "xactionname", "value":{"bucket":"bucketname"}}' 'http://G/v1/cluster?what=xaction'
    ```

    The response to a proxy query is a map of daemonID -> target's response. If any of the targets responds with an error status code, the proxy responds with the same error.


### Start and Stop

For a successful request, the response only contains the HTTP status code. If the request was sent to the proxy and all targets
responded with a successful HTTP code, the proxy responds with a successful HTTP code as well. The response body is omitted.
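
The proxy-side aggregation described above (one response per target, keyed by daemon ID, with any target error propagated to the caller) can be sketched in Go as follows. This is only an illustration under stated assumptions, not AIStore code: the `targetResp` type and the `aggregate` function are hypothetical names, and visiting daemon IDs in sorted order is an assumption made here so that the result is deterministic when several targets fail.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// targetResp is a hypothetical, simplified view of a single target's reply.
type targetResp struct {
	Status int    // HTTP status code returned by the target
	Body   string // response body (the error message on failure)
}

// aggregate takes a map of daemonID -> target response and returns what the
// proxy would report: the status and message of a failing target if there is
// one, or a bare 200 with an empty body when every target succeeded.
// Assumption: daemon IDs are checked in sorted order.
func aggregate(resps map[string]targetResp) (int, string) {
	ids := make([]string, 0, len(resps))
	for id := range resps {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		if r := resps[id]; r.Status >= http.StatusBadRequest {
			return r.Status, r.Body
		}
	}
	return http.StatusOK, ""
}

func main() {
	status, msg := aggregate(map[string]targetResp{
		"t1": {Status: http.StatusOK},
		"t2": {Status: http.StatusNotFound, Body: "xaction not found"},
	})
	fmt.Println(status, msg)
}
```

With one healthy and one failing target, as in `main`, the sketch reports the failing target's status code and message; with only healthy targets it returns 200 and an empty body.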

For an unsuccessful request, the target's response contains the error code and error message. If the request was sent to the proxy and at least one
of the targets responded with an error code, the proxy responds with the same error code and error message.

> As always, `G` above (and throughout this entire README) serves as a placeholder for the _real_ gateway's hostname/IP address, and `T` serves as a placeholder for the target's hostname/IP address. For more information, see the [notation section](/docs/http_api.md#notation).

The corresponding [RESTful API](/docs/http_api.md) includes support for querying all xactions including global-rebalancing and prefetch operations.

### Stats

A stats request returns a list of the requested xactions. The statistics of each xaction share a common base format, which looks as follows:

```json
[
    {
        "id":1,
        "kind":"ec-get",
        "bucket":"test",
        "startTime":"2019-04-15T12:40:18.721697505-07:00",
        "endTime":"0001-01-01T00:00:00Z",
        "status":"InProgress"
    },
    {
        "id":2,
        "kind":"ec-put",
        "bucket":"test",
        "startTime":"2019-04-15T12:40:18.721723865-07:00",
        "endTime":"0001-01-01T00:00:00Z",
        "status":"InProgress"
    }
]
```

Any xaction can have additional, xaction-specific fields, which are grouped under an extra field called `"ext"`.

Example rebalance stats response:

```json
[
    {
        "id": 3,
        "kind": "rebalance",
        "bucket": "",
        "start_time": "2019-04-15T13:38:51.556388821-07:00",
        "end_time": "0001-01-01T00:00:00Z",
        "status": "InProgress",
        "count": 0,
        "ext": {
            "tx.n": 0,
            "tx.size": 0,
            "rx.n": 0,
            "rx.size": 0
        }
    }
]
```

If the `--all` flag is provided, the stats command displays old, finished xactions along with the currently running ones.
If `--all` is not set (the default), only the most recent xactions are displayed for each bucket, kind, or (bucket, kind) pair.

## References

For xaction-related CLI documentation and examples, supported multi-object (batch) operations, and more, please see:

* [Batch operations](/docs/batch.md)