# SwarmKit task model

This document explains some important properties of tasks in SwarmKit. It
covers the types of state that exist for a task, a task's lifecycle, and the
slot model that associates a task with a particular replica or node.

## Task message

Tasks are defined by the `Task` protobuf message. A simplified version of this
message, showing only the fields described in this document, is presented below:

```
// Task specifies the parameters for implementing a Spec. A task is effectively
// immutable and idempotent. Once it is dispatched to a node, it will not be
// dispatched to another node.
message Task {
        string id = 1 [(gogoproto.customname) = "ID"];

        // Spec defines the desired state of the task as specified by the user.
        // The system will honor this and will *never* modify it.
        TaskSpec spec = 3 [(gogoproto.nullable) = false];

        // ServiceID indicates the service under which this task is
        // orchestrated. This should almost always be set.
        string service_id = 4 [(gogoproto.customname) = "ServiceID"];

        // Slot is the service slot number for a task.
        // For example, if a replicated service has replicas = 2, there will be
        // a task with slot = 1, and another with slot = 2.
        uint64 slot = 5;

        // NodeID indicates the node to which the task is assigned. If this
        // field is empty or not set, the task is unassigned.
        string node_id = 6 [(gogoproto.customname) = "NodeID"];

        TaskStatus status = 9 [(gogoproto.nullable) = false];

        // DesiredState is the target state for the task. It is set to
        // TaskStateRunning when a task is first created, and changed to
        // TaskStateShutdown if the manager wants to terminate the task. This
        // field is only written by the manager.
        TaskState desired_state = 10;
}
```

### ID

The `id` field contains a unique ID string for the task.

### Spec

The `spec` field contains the specification for the task. This is a part of the
service spec, which is copied to the task object when the task is created. The
spec is entirely specified by the user through the service spec. It will never
be modified by the system.

### Service ID

`service_id` links a task to the associated service. Tasks link back to the
service that created them, rather than services maintaining a list of all
associated tasks. Generally, a service's tasks are listed by querying the task
store for tasks where `service_id` has a specific value. In some cases, there
are tasks that exist independently of any service. These do not have a value
set in `service_id`.

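For illustration, a query of this shape can be written with the `FindTasks`
and `ByServiceID` helpers from SwarmKit's `manager/state/store` package. The
surrounding function here is hypothetical:

```
import (
	"github.com/docker/swarmkit/api"
	"github.com/docker/swarmkit/manager/state/store"
)

// tasksForService lists every task created for a given service by querying
// on service_id, rather than walking a task list stored on the service.
func tasksForService(s *store.MemoryStore, serviceID string) ([]*api.Task, error) {
	var (
		tasks []*api.Task
		err   error
	)
	s.View(func(tx store.ReadTx) {
		// ByServiceID matches tasks whose service_id equals serviceID.
		tasks, err = store.FindTasks(tx, store.ByServiceID(serviceID))
	})
	return tasks, err
}
```
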
### Slot

`slot` is used for replicated tasks to identify which slot the task satisfies.
The slot model is discussed in more detail below.

### Node ID

`node_id` assigns the task to a specific node. This is used by both replicated
tasks and global tasks. For global tasks, the node ID is assigned when the task
is first created. For replicated tasks, it is assigned by the scheduler when
the task gets scheduled.

### Status

`status` contains the *observed* state of the task, as reported by the agent.
The most important field inside `status` is `state`, which indicates where the
task is in its lifecycle (assigned, running, complete, and so on). The status
information in this field may become out of date if the node that the task is
assigned to is unresponsive. In this case, it's up to the orchestrator to
replace the task with a new one.

### Desired state

Desired state is the state that the orchestrator would like the task to progress
to. This field provides a way for the orchestrator to control when the task can
advance in state. For example, the orchestrator may create a task with desired
state set to `READY` during a rolling update, and then advance the desired state
to `RUNNING` once the old task it is replacing has stopped. This gives it a way
to get the new task ready to start (for example, pulling the new image), without
actually starting it.

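The sketch below shows how an updater-style component might advance a
replacement task from `READY` to `RUNNING` once the task it replaces has
stopped. It is illustrative only (the function name is hypothetical), and
assumes the same SwarmKit imports as the earlier sketch plus the standard
`errors` package; any state past `RUNNING` in the `TaskState` ordering is
terminal:

```
// advanceReplacement moves a READY replacement task to a desired state of
// RUNNING, but only after the old task has been observed to stop.
func advanceReplacement(s *store.MemoryStore, oldID, newID string) error {
	return s.Update(func(tx store.Tx) error {
		old, replacement := store.GetTask(tx, oldID), store.GetTask(tx, newID)
		if old == nil || replacement == nil {
			return errors.New("task not found")
		}
		// The agent reports observed state through Status.State. Any state
		// beyond RUNNING is terminal, meaning the old task has stopped.
		if old.Status.State <= api.TaskStateRunning {
			return nil // old task still active; try again later
		}
		replacement.DesiredState = api.TaskStateRunning
		return store.UpdateTask(tx, replacement)
	})
}
```
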
## Properties of tasks

A task is a "one-shot" execution unit. Once a task stops running, it is never
executed again. A new task may be created to replace it.

Task states change in a monotonic progression. Tasks may move to states
beyond the current state, but their states may never move backwards.

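Because the `TaskState` enum is ordered, monotonicity amounts to a numeric
comparison. A minimal sketch of the guard this implies (illustrative, not
SwarmKit's actual code):

```
import (
	"fmt"

	"github.com/docker/swarmkit/api"
)

// validateStateChange rejects any update that would move a task's state
// backwards: "later in the lifecycle" means "numerically greater".
func validateStateChange(current, proposed api.TaskState) error {
	if proposed < current {
		return fmt.Errorf("illegal task state change: %v -> %v", current, proposed)
	}
	return nil
}
```
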
## Task history

Once a task stops running, the task object is not necessarily removed from the
distributed data store. Generally, a few historic tasks for each slot of each
service are retained to provide task history. The task reaper garbage-collects
old tasks when the limit of historic tasks for a given slot is exceeded.
Currently, retention of containers on the workers is tied to the presence of
the old task objects in the distributed data store, but this may change in the
future.

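A sketch of this retention rule, assuming the caller passes one slot's tasks
ordered oldest-first and the same SwarmKit imports as earlier (the real reaper
lives in `manager/orchestrator/taskreaper`; names here are illustrative):

```
// reapSlot deletes the oldest dead tasks in a slot once the history limit is
// exceeded. A task whose desired state is past RUNNING will never run again,
// so it exists only as history.
func reapSlot(tx store.Tx, tasks []*api.Task, retention int) error {
	var dead []*api.Task
	for _, t := range tasks {
		if t.DesiredState > api.TaskStateRunning {
			dead = append(dead, t)
		}
	}
	for len(dead) > retention {
		if err := store.DeleteTask(tx, dead[0].ID); err != nil {
			return err
		}
		dead = dead[1:]
	}
	return nil
}
```
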
## Task lifecycle

Tasks are created by the orchestrator. They may be created for a new service,
to scale up an existing service, or to replace tasks for an existing service
that are no longer running for whatever reason. The orchestrator creates tasks
in the `NEW` state.

Tasks next run through the allocator, which allocates resources, such as
network attachments, that are necessary for the tasks to run. When the
allocator has processed a task, it moves the task to the `PENDING` state.

The scheduler takes `PENDING` tasks and assigns them to nodes (or verifies
that the requested node has the necessary resources, in the case of global
services' tasks). It changes their state to `ASSIGNED`.

From this point, control over the state passes to the agent. A task will
progress through the `ACCEPTED`, `PREPARING`, `READY`, and `STARTING` states on
the way to `RUNNING`. If a task exits successfully (without an error code), it
moves to the `COMPLETE` state. If it fails, it moves to the `FAILED` state
instead.

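Summarized as code, the happy path is an ordered sequence. The constant names
below follow the `TaskState` values in SwarmKit's `api` package:

```
// The happy-path lifecycle, in order. The orchestrator performs the first
// transition, the allocator and scheduler the next two, and the agent owns
// everything from ASSIGNED onward.
var happyPath = []api.TaskState{
	api.TaskStateNew,       // created by the orchestrator
	api.TaskStatePending,   // resources (e.g. network attachments) allocated
	api.TaskStateAssigned,  // placed on a node by the scheduler
	api.TaskStateAccepted,  // accepted by the agent
	api.TaskStatePreparing, // preparing to run, e.g. pulling the image
	api.TaskStateReady,     // prepared and waiting to start
	api.TaskStateStarting,  // container being started
	api.TaskStateRunning,   // running; terminates in COMPLETE or FAILED
}
```
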
A task may alternatively end up in the `SHUTDOWN` state if its shutdown was
requested by the orchestrator (by setting desired state to `SHUTDOWN`), the
`REJECTED` state if the agent rejected the task, or the `ORPHANED` state if
the node on which the task is scheduled is down for too long. When the service
associated with a task is removed or scaled down by the user, the orchestrator
sets the desired state of any task not already in a terminal state to
`REMOVE`, and the agent proceeds to shut the task down. The task is removed
from the store by the task reaper only after the shutdown succeeds; this
ensures that resources associated with the task are not released before the
task has shut down. Tasks that were removed because of service removal or
scale down are not kept around in task history.

The task state can never move backwards - it only increases monotonically.

## Slot model

Replicated tasks have a slot number assigned to them. This allows the system to
track the history of a particular replica over time.

For example, a replicated service with three replicas would lead to three tasks,
with slot numbers 1, 2, and 3. If the task in slot 2 fails, a new task would be
started with `Slot = 2`. Through the slot numbers, the administrator would be
able to see that the new task was a replacement for the previous one in slot 2
that failed.

The orchestrator for replicated services tries to make sure the correct number
of slots have a running task in them. For example, if this 3-replica service
only has running tasks with two distinct slot numbers, it will create a third
task with a different slot number. Conversely, if there are four slot numbers
represented among the tasks in the running state, it will kill one or more
tasks so that only three slot numbers remain among the running tasks.

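A sketch of this reconciliation, grouping one service's tasks by slot and
deciding how many slots to create or which to shut down (illustrative names
and the same `api` import as earlier; the real logic lives in
`manager/orchestrator/replicated`):

```
// reconcile counts the slots that still have a runnable task, then reports
// how many new slots to create, or which surplus slots to shut down, so that
// exactly `replicas` slots end up running.
func reconcile(slots map[uint64][]*api.Task, replicas uint64) (create uint64, shutdown []uint64) {
	var runnable []uint64
	for slot, tasks := range slots {
		for _, t := range tasks {
			// Runnable: still meant to run, and not yet past RUNNING.
			if t.DesiredState <= api.TaskStateRunning &&
				t.Status.State <= api.TaskStateRunning {
				runnable = append(runnable, slot)
				break
			}
		}
	}
	switch n := uint64(len(runnable)); {
	case n < replicas:
		create = replicas - n
	case n > replicas:
		shutdown = runnable[replicas:]
	}
	return create, shutdown
}
```
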
Slot numbers may be noncontiguous. For example, when a service is scaled down,
the task that's removed may not be the one with the highest slot number.

It's normal for a slot to have multiple tasks. Generally, there will be a single
task with the desired state of `RUNNING`, and also some historic tasks with a
desired state of `SHUTDOWN` that are no longer active in the system. However,
there are also cases where a slot may have multiple tasks with the desired state
of `RUNNING`. This can happen during rolling updates when the updates are
configured to start the new task before stopping the old one. The orchestrator
isn't confused by this situation, because it only cares about which slots are
satisfied by at least one running task, not the detailed makeup of those slots.
The updater takes care of making sure that each slot converges to having a
single running task.

Also, in the interest of application availability, multiple tasks may share a
single slot number when a network partition separates a node from the managers.
The tasks that were running on the partitioned node are recreated on other
nodes, but the tasks on the partitioned node itself may continue running, so
the old tasks and the new ones end up with identical slot numbers. After some
time, the manager may consider the unreachable tasks "orphaned"; once the
partition heals, those tasks are killed.

Global tasks do not have slot numbers, but the concept is similar. Each node in
the system should have a single running task associated with it. If this is not
the case, the orchestrator and updater work together to create or destroy tasks
as necessary.