This is the top-level eXtended Action (`xaction`) directory. It contains much of the common functionality and the interfaces used by the rest of the code.

In addition, it contains the following subdirectories:

* `xreg` - xaction registry
* `xs` - concrete named xactions, e.g. `apc.ActRebalance`, `apc.ActPromote`, `apc.ActSummaryBck`, and other enumerated *kinds*.

> For all supported xactions, their *kinds* and static properties, see `xact.Table`.

> Xaction *kinds* are generally consistent with the API constants from `api/apc/const.go`.

## Extended Actions (xactions)

Batch operations that may take many seconds (or minutes, or hours) to execute are called eXtended actions or *xactions*.

Xactions run asynchronously. Each one has one of the enumerated kinds, start and stop times, and xaction-specific statistics.
Xactions start running based on a wide variety of runtime conditions that include:

* periodic triggers (defined by a configured interval of time)
* resource utilization (e.g., usable capacity falling below a configured watermark)
* certain types of workload (e.g., PUT into a mirrored or erasure-coded bucket)
* user requests (e.g., to reduce the number of local object copies in a given bucket)
* adding or removing storage targets (the events that trigger cluster-wide rebalancing)
* adding or removing local disks (the events that cause resilver to start moving stored content between *mountpaths* - see [Managing mountpaths](/docs/configuration.md#managing-mountpaths))
* and more...

Further, to reduce congestion and minimize interference with user-generated workload, extended actions throttle themselves based on configurable watermarks. These include `disk_util_low_wm` and `disk_util_high_wm` (see [configuration](/deploy/dev/local/aisnode_config.sh)). Roughly speaking, when local disk utilization falls below the low watermark (`disk_util_low_wm`), extended actions that use local storage can run at full throttle. Conversely, when utilization rises above the high watermark (`disk_util_high_wm`), they throttle themselves.

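The watermark idea can be illustrated with a minimal Go sketch. It is not the aistore implementation: the helper name `throttleFactor`, the linear ramp between the two watermarks, and the sample values 20 and 80 are all assumptions made purely for illustration.

```go
package main

import "fmt"

// throttleFactor is a hypothetical helper (not part of aistore). It maps the
// current disk utilization (percent) to a value in [0, 1], where 0 means
// "run at full speed" and 1 means "throttle as much as possible".
// The linear ramp between the watermarks is an assumption of this sketch.
func throttleFactor(util, lowWM, highWM int64) float64 {
	switch {
	case util < lowWM:
		return 0 // below the low watermark: no throttling
	case util >= highWM:
		return 1 // at or above the high watermark: maximum throttling
	default:
		// Reached only when lowWM <= util < highWM, so highWM > lowWM here.
		return float64(util-lowWM) / float64(highWM-lowWM)
	}
}

func main() {
	// Illustrative stand-ins for disk_util_low_wm and disk_util_high_wm.
	const lowWM, highWM = 20, 80
	for _, util := range []int64{10, 50, 95} {
		fmt.Printf("util=%d%% throttle=%.2f\n", util, throttleFactor(util, lowWM, highWM))
	}
}
```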
The amount of throttling that a given xaction imposes on itself is always defined by a combination of dynamic factors.
For example, the extended action that runs LRU evictions performs its "balancing act" by taking into account the remaining storage capacity **and** the current utilization of the local filesystems.
The mirroring xaction takes into account congestion on the communication channel that callers use to post requests for creating local replicas.

---------------------------------------------------------------

**NOTE (Dec 2021):** the rest of this document is somewhat **outdated** and must be revisited. For the most recently updated information on running and monitoring *xactions*, please see:

* [Batch operations](/docs/batch.md)
* [CLI documentation](/docs/cli.md), and in particular:
  - [`ais show job`](/docs/cli/job.md)
  - [`ais show job dsort`](/docs/cli/dsort.md)
  - [`ais show job download`](/docs/cli/download.md)
  - [`ais show rebalance`](/docs/rebalance.md)
* And also:
  - [`ais etl`](/docs/cli/etl.md)
  - [multi-object operations](/docs/cli/object.md#operations-on-lists-and-ranges)
  - [reading, writing, and listing archives](/docs/cli/object.md)
  - [copying buckets](/docs/cli/bucket.md#copy-bucket)

---------------------------------------------------------------

Supported extended actions are enumerated in the [user-facing API](/cmn/api.go) and include:

* cluster-wide rebalancing (denoted as `ActGlobalReb` in the [API](/cmn/api.go)) that gets triggered when storage targets join or leave the cluster
* LRU-based cache eviction (see [LRU](/docs/storage_svcs.md#lru)) that depends on the remaining free capacity and [configuration](/deploy/dev/local/aisnode_config.sh)
* prefetching batches of objects (of arbitrary size) from the Cloud (see [List/Range Operations](/docs/batch.md))
* consensus voting (when conducting a new leader [election](/docs/ha.md#election))
* erasure-encoding objects in an EC-configured bucket (see [Erasure coding](/docs/storage_svcs.md#erasure-coding))
* creating additional local replicas, and reducing the number of object replicas in a given locally-mirrored bucket (see [Storage Services](/docs/storage_svcs.md))
* and more...

Several actions can be performed on an xaction: stats, start, and stop.
The list of supported actions can be found in the [API](/cmn/api.go).

Xaction requests are generic for all xactions, but the responses differ from one xaction to another.
See [below](#start-and-stop).
A request looks as follows:

1. Single target request:

    ```console
    $ curl -i -X GET  -H 'Content-Type: application/json' -d '{"action": "actiontype", "name": "xactionname", "value":{"bucket":"bucketname"}}' 'http://T/v1/daemon?what=xaction'
    ```

    To simplify the logic, the result is always an array, even if it contains only one element.

2. Proxy request, which executes the request on all targets within the cluster and responds with a list of the targets' responses:

    ```console
    $ curl -i -X GET  -H 'Content-Type: application/json' -d '{"action": "actiontype", "name": "xactionname", "value":{"bucket":"bucketname"}}' 'http://G/v1/cluster?what=xaction'
    ```

    The response to a proxy query is a map of daemonID -> target's response. If any target responds with an error status code, the proxy responds with the same error.

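The curl commands above can be approximated in Go with the standard library alone. The sketch below only builds the request and does not send it. The types `xactValue` and `xactRequest` and the helper `newXactRequest` are hypothetical names that mirror the JSON body shown in the examples; they are not the aistore client API, and the placeholder values (`actiontype`, `xactionname`, `bucketname`, `G`) are carried over unchanged.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// xactValue and xactRequest are hypothetical types that mirror the JSON body
// used in the curl examples; they are not the aistore API types.
type xactValue struct {
	Bucket string `json:"bucket"`
}

type xactRequest struct {
	Action string    `json:"action"`
	Name   string    `json:"name"`
	Value  xactValue `json:"value"`
}

// newXactRequest builds (but does not send) a GET request that carries a JSON
// body. baseURL is the gateway or target address; path is "/v1/cluster" for a
// proxy request or "/v1/daemon" for a single-target request.
func newXactRequest(baseURL, path string, msg xactRequest) (*http.Request, error) {
	body, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, baseURL+path+"?what=xaction", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	msg := xactRequest{Action: "actiontype", Name: "xactionname", Value: xactValue{Bucket: "bucketname"}}
	req, err := newXactRequest("http://G", "/v1/cluster", msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

To actually issue the request, the returned value could be passed to `http.DefaultClient.Do`, with `G` replaced by a real gateway address.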
### Start and Stop

For a successful request, the response contains only the HTTP status code; the response body is omitted. If the request was sent to the proxy and all targets
responded with a successful HTTP code, the proxy responds with a successful HTTP code as well.

For an unsuccessful request, the target's response contains the error code and error message. If the request was sent to the proxy and at least one
of the targets responded with an error code, the proxy responds with the same error code and error message.

> As always, `G` above (and throughout this entire README) serves as a placeholder for the _real_ gateway's hostname/IP address, and `T` serves as a placeholder for the target's hostname/IP address. More information is available in the [notation section](/docs/http_api.md#notation).

The corresponding [RESTful API](/docs/http_api.md) includes support for querying all xactions including global-rebalancing and prefetch operations.

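The proxy behavior described in this section (succeed only if every target succeeds, otherwise relay a failing target's code and message) can be sketched as follows. This is an illustration under stated assumptions, not the proxy's actual code: `targetResult` and `aggregate` are hypothetical names, any status of 400 or above is treated as an error, and picking the failing target with the lowest daemon ID is an arbitrary choice made to keep the result deterministic.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// targetResult is a hypothetical record of a single target's reply.
type targetResult struct {
	Status int    // HTTP status code returned by the target
	ErrMsg string // error message; empty on success
}

// aggregate returns the status code and message that a proxy would relay:
// the code and message of a failing target, or 200 with an empty message
// when every target succeeded. Daemon IDs are visited in sorted order so
// that the same input always yields the same answer (an assumption of this
// sketch, since Go map iteration order is random).
func aggregate(results map[string]targetResult) (int, string) {
	ids := make([]string, 0, len(results))
	for id := range results {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		if r := results[id]; r.Status >= http.StatusBadRequest {
			return r.Status, r.ErrMsg
		}
	}
	return http.StatusOK, ""
}

func main() {
	results := map[string]targetResult{
		"t1": {Status: http.StatusOK},
		"t2": {Status: http.StatusNotFound, ErrMsg: "xaction not found"},
	}
	code, msg := aggregate(results)
	fmt.Println(code, msg)
}
```

The daemon IDs `t1` and `t2` and the error text are made up for the example.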
### Stats

A stats request returns a list of the requested xactions. The statistics of each xaction share a common base format, which looks as follows:

```json
[
   {
      "id":1,
      "kind":"ec-get",
      "bucket":"test",
      "startTime":"2019-04-15T12:40:18.721697505-07:00",
      "endTime":"0001-01-01T00:00:00Z",
      "status":"InProgress"
   },
   {
      "id":2,
      "kind":"ec-put",
      "bucket":"test",
      "startTime":"2019-04-15T12:40:18.721723865-07:00",
      "endTime":"0001-01-01T00:00:00Z",
      "status":"InProgress"
   }
]
```

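A client could decode this base format with Go's `encoding/json`, as in the sketch below. The struct `baseStats` and the helper `parseStats` are hypothetical; the JSON tags simply follow the field names of the base-format example above and are not taken from the aistore sources. Note that `0001-01-01T00:00:00Z` is Go's zero time, so `EndTime.IsZero()` reports true for the in-progress entries shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// baseStats is a hypothetical struct whose tags follow the field names in
// the base-format example; it is not the aistore type.
type baseStats struct {
	ID        int64     `json:"id"`
	Kind      string    `json:"kind"`
	Bucket    string    `json:"bucket"`
	StartTime time.Time `json:"startTime"`
	EndTime   time.Time `json:"endTime"`
	Status    string    `json:"status"`
}

// parseStats decodes a stats response, which is always a JSON array.
func parseStats(data []byte) ([]baseStats, error) {
	var out []baseStats
	err := json.Unmarshal(data, &out)
	return out, err
}

func main() {
	data := []byte(`[{"id":1,"kind":"ec-get","bucket":"test",
		"startTime":"2019-04-15T12:40:18.721697505-07:00",
		"endTime":"0001-01-01T00:00:00Z","status":"InProgress"}]`)
	stats, err := parseStats(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stats {
		// In the example, a zero end time accompanies the "InProgress" status.
		fmt.Printf("%d %s bucket=%s status=%s ended=%v\n",
			s.ID, s.Kind, s.Bucket, s.Status, !s.EndTime.IsZero())
	}
}
```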
Any xaction can report additional, xaction-specific statistics; these are placed in an extra field called `"ext"`.

Example rebalance stats response:

```json
[
    {
      "id": 3,
      "kind": "rebalance",
      "bucket": "",
      "start_time": "2019-04-15T13:38:51.556388821-07:00",
      "end_time": "0001-01-01T00:00:00Z",
      "status": "InProgress",
      "count": 0,
      "ext": {
        "tx.n": 0,
        "tx.size": 0,
        "rx.n": 0,
        "rx.size": 0
      }
    }
]
```

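Because the layout of `"ext"` depends on the xaction, one way for a client to handle it is to keep the field raw and decode it only once the kind is known. The sketch below takes that approach. The names `statsEntry`, `parseEntries`, and `rebalanceCounters` are hypothetical, and treating every rebalance counter as an integer is an assumption based solely on the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statsEntry is a hypothetical type that captures only the fields needed here.
type statsEntry struct {
	Kind string          `json:"kind"`
	Ext  json.RawMessage `json:"ext"` // kept raw: its layout depends on the kind
}

// parseEntries decodes a stats response (a JSON array) into entries.
func parseEntries(data []byte) ([]statsEntry, error) {
	var entries []statsEntry
	err := json.Unmarshal(data, &entries)
	return entries, err
}

// rebalanceCounters decodes the "ext" object of a rebalance entry into a map
// of counter name to value. Integer values are an assumption of this sketch.
func rebalanceCounters(e statsEntry) (map[string]int64, error) {
	counters := make(map[string]int64)
	if len(e.Ext) == 0 {
		return counters, nil // the entry carries no "ext" field
	}
	err := json.Unmarshal(e.Ext, &counters)
	return counters, err
}

func main() {
	data := []byte(`[{"kind":"rebalance","ext":{"tx.n":0,"tx.size":0,"rx.n":0,"rx.size":0}}]`)
	entries, err := parseEntries(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		counters, err := rebalanceCounters(e)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(e.Kind, "tx.n:", counters["tx.n"], "rx.n:", counters["rx.n"])
	}
}
```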
If the `--all` flag is provided, the stats command displays old, finished xactions along with the currently running ones. If `--all` is not set (the default), only
the most recent xactions are displayed for each bucket, kind, or (bucket, kind) pair.

## References

For xaction-related CLI documentation and examples, supported multi-object (batch) operations, and more, please see:

* [Batch operations](/docs/batch.md)