# Task management API: requirements

Multiple subsystems require reliable long-lived distributed operations. We shall support them
with a task queue subsystem. Concepts of the API are intended to parallel concepts in the
[Celery][celery-api] and/or [Machinery][machinery-api] APIs. We do not use either of them, to
reduce the number of required dependencies. A future version may support external queues to
achieve better performance or robustness, at the price of increased ops or cost. These will
likely be queues such as Kafka or SQS rather than full Celery or Machinery.

In practice this generally means providing APIs similar to those of Machinery (which is more
Go-like than Celery) for constructing task flows and for registering workers.

In particular:
1. We provide similar concepts for building task flows as do existing task queues.
1. We use similar terminology.
1. We do *not* require the entire API of an existing task queue.
1. We do *not* use the verbs or API calls of an existing task queue.

This API definition comes with implementation sketches for how to use these APIs to implement
the branch export story. We shall also (re-)implement retention expiry to use these APIs for
better ops; that story is considerably easier to imagine.

## API

### Concepts

#### Tasks

A task is the basic atom of task management. It represents a single unit of work to
perform, and can succeed or fail. Tasks may be retried on failure, so _executing a task
must be idempotent_.

Tasks connect to application code via an action and a body. The _action_ identifies the
operation to complete the task. Examples of actions include "copy a file", "delete a
path", "report success". It is essentially the _name_ of a procedure to perform at some future
time.
The _body_ of a task gives information necessary to configure the specific task.
Examples of bodies include "source file path X, destination path Z" (for a copy task), "path
Z" (for a delete task), or "date started, number of objects and a message" (for a report task).
It essentially holds the _parameters_ the action uses to perform the task.

Tasks include these attributes:
- `Id`: a unique identifier for the task. Use a known-unique substring in the identifier
  (e.g. a UUID or [nanoid][nanoid]) to avoid collisions, or a well-known identifier to ensure
  only one task of a type can exist.
- `Action`: the type of action to perform for this task. Workers pick tasks to perform and the
  actions to perform on them according to this field.
- `Body`: a description of parameters for this task. E.g. in a "copy file" task the body might
  specify source key, ETag and destination key.
- `StatusCode`: the internally-used state of the task in its lifecycle; see [life of a
  task](#life-of-a-task) below.
- `Status`: a textual description of the current status, generated by application code.
- `NumSignals`: number of tasks that must signal this task before it can be performed.
  Initially equal to the number of tasks on which it appears in the `ToSignalAfter` array.
- `MaxTries`: the maximal number of times to try to execute the task if it keeps being returned
  to state `pending`.
- `ActorId`: the unique string identifier chosen by a worker which is currently performing the
  task. Useful for monitoring.
- `ActionDeadline`: a time by which the worker currently performing the task has committed to
  finish it.
- `ToSignalAfter`: an array of task IDs that cannot start before this task ends, and will
  therefore be signalled when it does.

Tasks provide these additional facilities (and include fields not listed here to support them):
- **Retries**.
  A task returned to state `pending` more than `MaxTries` times will not be retried again.
- **Dependencies**. A task can be configured to run only after some other tasks
  are done.

A task is performed by a single worker; if that worker does not finish processing it and an
action deadline was set, it will be given to another worker.

#### Life of a task

```
                  |
                  | InsertTasks
                  |
                  |
            +-----v-----+
        +-->|  pending  |
        |   +-----+-----+
ReturnTask  |     |
  (to       |     | OwnTasks
  pending)  |     |
        |   +-----v-----+
        +---+in-progress|
            +-----------+
                  |
                  |
     +------------+------------+  ReturnTask
     |                         |
+----v---+                +----v----+
|aborted |                |completed|
+--------+                +---------+
```

A task arrives complete with dependencies, a count of the number of preceding tasks that
must "signal" it before it may be executed. When the task completes it signals all of its
dependent tasks.

Tasks are inserted in state `pending`. Multiple workers call `OwnTasks` to get tasks. A
task may only be claimed by a call to `OwnTasks` if:
* Its action is specified as acceptable to that call.
* All dependencies of the task have been settled: all tasks specifying its task ID in their
  `ToSignalAfter` list have completed.
* The task is not claimed by another worker. Either:
  - the task is in state `pending`, or
  - the task is in state `in-progress`, but its `ActionDeadline` has elapsed (see "ownership
    expiry", below).

`OwnTasks` returns task IDs and, for each returned task, a "performance token" for this
performance of it. Both ID and token must be provided to _return_ the task from ownership.
(The performance token is used to resolve conflicts during "ownership expiry", below.)

A typical use is that a worker loop repeatedly calls `OwnTasks` on one or more actions, and
dispatches each to a separate function. The application controls concurrency by setting the
number of concurrent worker loops.
For instance, it might set 20 worker loops to perform "copy" 117 and "delete" tasks and a single worker loop to perform "report to DataDog". 118 119 Once a worker owns a task, it performs it. It can decide to return the task to the task 120 queue and _complete_, _abort_ or _retry_ it by calling `ReturnTask`. Once completed, all 121 dependents of the task are signalled, causing any dependent that has received all its 122 required signals to be eligible for return by `OwnTasks`. 123 124 #### Ownership expiry 125 126 Processes can fail. To allow restarting a failed process calls to `OwnTasks` may specify a 127 deadline. The lease granted to an owning worker will expire after this deadline, allowing 128 another worker to own the task. Only the _last_ worker granted ownership may call 129 `ReturnTask` on the task. A delayed worker should still return the task, in case the task 130 has not yet been granted to another worker. 131 132 #### Basic API 133 134 This is a sample API. All details are fully subject to change, of course! Note that most 135 `func`s are probably going to be methods on some object, which we assume will carry DB 136 connection information etc. 137 138 ##### TaskData 139 140 ```go 141 type TaskId string 142 143 type ActorId string 144 145 type PerformanceToken pgtype.UUID // With added stringifiers 146 147 // TaskData describes a task to perform. 148 type TaskData struct { 149 Id TaskId // Unique ID of task 150 Action string // Action to perform, used to fetch in OwnTasks 151 Body *string // Body containing details of action, used by clients only 152 Status *string // Human- and client-readable status 153 StatusCode TaskStatusCodeValue // Status code, used by task queue 154 NumTries int // Number of times this task has moved from started to in-progress 155 MaxTries *int // Maximal number of times to try this task 156 // Dependencies might be stored or handled differently, depending on what gives reasonable 157 // performance. 
    TotalDependencies *int              // Number of tasks which must signal before this task can be owned
    ToSignalAfter     []TaskId          // Tasks to signal after this task is done
    ActorId           ActorId           // ID of current actor performing this task (if in-progress)
    ActionDeadline    *time.Time        // Deadline for current actor to finish performing this task (if in-progress)
    PerformanceToken  *PerformanceToken // Token to allow ReturnTask
    PostResult        bool              // If set, allow waiting for this task using WaitForTask
}
```

##### InsertTasks

```go
// InsertTasks atomically adds all tasks to the queue: if any task cannot be added (typically because
// it re-uses an existing key) then no tasks will be added. If PostResult was set on any tasks then
// they can be waited upon after InsertTasks returns.
func InsertTasks(ctx context.Context, source *taskDataIterator) error
```

A variant allows inserting a task _by force_:

```go
// ReplaceTasks atomically adds all tasks to the queue. If a task not yet in-progress with the same
// ID already exists then _replace it_ as though it were atomically aborted before this insert. If
// PostResult was set on any tasks then they can be waited upon after ReplaceTasks returns. Tasks
// that are in progress cannot be replaced.
func ReplaceTasks(ctx context.Context, source *taskDataIterator) error
```

##### OwnTasks

```go
// OwnedTaskData is a task returned from OwnTasks.
type OwnedTaskData struct {
    Id     TaskId           `db:"task_id"`
    Token  PerformanceToken `db:"token"`
    Action string
    Body   *string
}

// OwnTasks owns for actor and returns up to maxTasks tasks for performing any of actions, setting
// the lifetime of each returned owned task to maxDuration.
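// For example, a hypothetical worker loop (names and values are illustrative only):
//
//	duration := 5 * time.Minute
//	owned, _ := OwnTasks(ctx, "copier-17", 10, []string{"copy", "delete"}, &duration)
//	for _, task := range owned {
//		// dispatch on task.Action, then pass task.Id and task.Token to ReturnTask
//	}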
func OwnTasks(ctx context.Context, actor ActorId, maxTasks int, actions []string, maxDuration *time.Duration) ([]OwnedTaskData, error)
```

`maxDuration` should be a time during which no other worker can access the task. It does not
have to be the time to _complete_ the task: workers can periodically call `ExtendTasksOwnership`
to extend the lifetime.

##### ExtendTasksOwnership

```go
// ExtendTasksOwnership extends the current action lifetime for each task by another maxDuration,
// if that task is still owned by this actor with that performance token. It returns true for each
// task that is still owned, or false if ownership extension failed because the task is no longer
// owned.
func ExtendTasksOwnership(ctx context.Context, actor ActorId, toExtend []OwnedTaskData, maxDuration time.Duration) ([]bool, error)
```

##### ReturnTask

```go
// ReturnTask returns taskId which was acquired using the specified performanceToken, giving it
// resultStatus and resultStatusCode. It returns InvalidTokenError if the performanceToken is
// invalid; this happens when ReturnTask is called after its deadline expires, or due to a logic
// error. If resultStatusCode is ABORT, all succeeding tasks are aborted.
func ReturnTask(ctx context.Context, taskId TaskId, token PerformanceToken, resultStatus string, resultStatusCode TaskStatusCodeValue) error
```

##### WaitForTask

```go
// WaitForTask waits for taskId (which must have been started with PostResult) to finish and
// returns it. It returns immediately if the task has already finished.
func WaitForTask(ctx context.Context, taskId TaskId) (TaskData, error)
```

##### AddDependencies

```go
// AddDependencies atomically adds dependencies: for every dependency, task Run must run after
// task After.
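// For example (hypothetical task IDs), to make task "manifest" run only after task
// "done-export" completes:
//
//	err := AddDependencies(ctx, []TaskDependency{{After: "done-export", Run: "manifest"}})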
type TaskDependency struct {
    After, Run TaskId
}

func AddDependencies(ctx context.Context, dependencies []TaskDependency) error
```

##### Monitoring

Also some routine as a basis for monitoring: it gives the number and status of each of a number
of actions and task IDs, possibly with some filtering. The exact nature depends on the
implementation chosen; however, we _do_ require its availability.

#### Differences from the Celery model

This task management model is at a somewhat lower level than the Celery model:
* **Workers explicitly loop to own and handle tasks.** Emulate the Celery model by writing an
  explicit function that takes "handlers" for the different actions. We may well do this.

  _Why change?_ Writing the loop is rarely an important factor. Flexibility in specifying the
  action parameter of OwnTasks allows variable actions, for instance handling particular action
  types only when a particular environmental condition is met (say, system load), or
  incorporating side data in action names (and not only in task IDs). Flexibility in timing
  allows per-process rate limiters for particular actions: filter out expensive actions when
  their token bucket runs out. Flexibility in specifying _when_ OwnTasks is called allows
  controlling load on the queuing component. Flexibility in specifying action dispatch allows
  controlling _how many goroutines_ run particular actions concurrently. All this without
  having to add configurable structures to the task manager.
* **No explicit graph structures.** Emulate these using the section [Structures](#structures)
  below.
* **No implicit argument serialization.** Rather than flatten an "args" array we pass a stringy
  "body". In practice "args" anyway require serialization; incorporating them into the queue
  requires either configuring the queue with relevant serialization or allowing only primitives.
  Celery selects the first, Machinery the second. In both cases a Go client library must place
  most of the serialization burden on application code -- simplest is to do so explicitly.

#### Structures

We can implement what Celery calls _Chains_, _Chords_ and _Groups_ using the basic API: these
are just ways to describe structured dependencies which form [parallel/serial
networks][parallel-series]. Drawings appear below.

##### Chains

```
+----------+
|  task 1  |
+----------+
     |
     |
+----v-----+
|  task 2  |
+----+-----+
     |
     |
+----v-----+
|  task 3  |
+----------+
```

##### Chords (and Groups)

```
               +--------+
          +--->| task1  +-----+
          |    +--------+     |
          |                   |
          |    +--------+     |
          +--->| task2  +-----+
          |    +--------+     |
+-------+ |                   |    +-------------+
| prev  +-+                   +--->|(spontaneous)|
+-------+ |    +--------+     |    +-------------+
          +--->| task3  +-----+
          |    +--------+     |
          |                   |
          |    +--------+     |
          +--->| task4  +-----+
               +--------+
```

## Implementing "user" stories with the API

### Branch export

Each branch uses separate tasks arranged in a cycle. These names are task IDs with a matching
action name: e.g. the action name for `next-export-{branch}` is `next-export`. (The branch name
also appears in the body.)

* `next-export-{branch}` is to start the next export if an export is already underway, by
  creating a task `start-export-{branch}`.
* `start-export-{branch}` handles the actual logic of generating the copy tasks in a network,
  leading eventually to the task `done-export-{branch}` (which it also generates) becoming
  available.
* `done-export-{branch}` is there so that `next-export-{branch}` can depend on it -- and not
  start before the current export operation terminates.
  (If it does not exist,
  `next-export-{branch}` has no dependency blocking it and can run immediately.)

The actual steps:
1. Under the merge/commit lock for the branch: _replace_ the task with ID `next-export-{branch}`
   with a task to export _this_ commit ID, and add a dependency on `done-export-{branch}`
   (which may fail if that task has completed; that is safe).
1. To handle `next-export-{branch}`: create `start-export-{branch}` (the previous one must have
   ended), and return the task.
1. To handle `start-export-{branch}`:
   1. Generate a task to copy or delete each file object (this is an opportunity to batch
      multiple file objects if performance doesn't match). `done-export-{branch}` depends on
      each of these tasks (and cannot have been deleted, since `start-export-{branch}` has not
      yet been returned). For every prefix for which such an object is configured, add a task
      to generate its `.../_SUCCESS` object on S3, dependent on all the objects under that
      prefix (or, to handle objects in sub-prefixes, on just the `_SUCCESS` of that sub-prefix).
   1. Add a task to generate manifests, dependent on `done-export-{branch}`.
   1. Return `start-export-{branch}` as completed.
1. To handle a copy or delete operation, perform it.
1. To handle `done-export-{branch}`: just return it; it can be spontaneous (if the task queue
   supports that).

`next-export-{branch}` is used to serialize branch exports, achieving the requirement of a
single concurrent export per branch. Per-prefix `_SUCCESS` objects are generated on time due to
their dependencies. (As an option, we could set priorities and return tasks in priority order
from `OwnTasks`, to allow `_SUCCESS` objects to be created before copying other objects.)
Retries are handled by setting multiple per-copy attempts.

## References

### Well-known task queues
1. [Celery][celery-api]
2.
[Machinery][machinery-api]

### Modules
1. [nanoid][nanoid]

### Graphs
1. [Parallel series][parallel-series]

[celery-api]: https://docs.celeryproject.org/en/stable/userguide/index.html
[machinery-api]: https://github.com/RichardKnop/machinery#readme
[nanoid]: https://www.npmjs.com/package/nanoid
[parallel-series]: https://www.cpp.edu/~elab/projects/project_05/index.html