# Topology aware scheduling

## Background

There is often interest in making the scheduler aware of factors such as
availability zones. This document specifies a generic way to customize scheduler
behavior based on labels attached to nodes.

## Approach

The scheduler consults a repeated field named `Preferences` under `Placement`
when it places tasks. These "placement preferences" are listed in decreasing
order of precedence, and have higher precedence than the default scheduler
logic.

These placement preferences are interpreted based on their types, but the
initially supported "spread over" message tells the scheduler to spread tasks
evenly among the nodes that share each distinct value of the referenced node or
engine label.

## Protobuf definitions

In the `Placement` message under `TaskSpec`, we define a repeated field called
`Preferences`.

```
repeated PlacementPreference preferences = 2;
```

`PlacementPreference` is a message that specifies how to act on a label.
The initially supported preference is "spread".

```
message SpreadOver {
    string spread_descriptor = 1; // label descriptor, such as engine.labels.az
    // TODO: support node information beyond engine and node labels

    // TODO: in the future, add a map that provides weights for weighted
    // spreading.
}

message PlacementPreference {
    oneof Preference {
        SpreadOver spread = 1;
    }
}
```
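
For illustration, the sketch below shows how a spread preference over an
`engine.labels.az` label might be built from Go code. It assumes standard
protobuf code generation for the messages above, so the import path and type
names such as `api.PlacementPreference_Spread` are assumptions, not part of
this design.

```
// Sketch only: assumes protobuf-generated Go types for the messages above.
package example

import "github.com/docker/swarmkit/api"

// azSpread returns a Placement that asks the scheduler to spread tasks
// evenly across distinct values of the engine label "az".
func azSpread() *api.Placement {
	return &api.Placement{
		Preferences: []*api.PlacementPreference{
			{
				Preference: &api.PlacementPreference_Spread{
					Spread: &api.SpreadOver{
						SpreadDescriptor: "engine.labels.az",
					},
				},
			},
		},
	}
}
```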

## Behavior

A simple use of this feature would be to spread tasks evenly between multiple
availability zones. The way to do this would be to create an engine label on
each node indicating its availability zone, and then create a
`PlacementPreference` with type `SpreadOver` which references the engine label.
The scheduler would prioritize balance between the availability zones, and if
it ever has a choice between multiple nodes in the preferred availability zone
(or a tie between AZs), it would choose the node based on its built-in logic.
As of Docker 1.13, this logic will prefer to schedule a task on the node which
has the fewest tasks associated with the particular service.
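
The following is not the scheduler's actual implementation, only a simplified
model of the behavior described above: among candidate nodes, prefer the
availability zone running the fewest of the service's tasks, and break ties by
the node-level task count. All names in the sketch are made up for the example.

```
// Simplified model of the described behavior; not the real scheduler code.
package example

// node is a stand-in for a candidate node as seen by the scheduler.
type node struct {
	id       string
	az       string // value of engine.labels.az ("" if missing)
	svcTasks int    // tasks of this service already on the node
}

// pickNode chooses a node in the availability zone that currently runs the
// fewest tasks of the service, and within that zone prefers the node with
// the fewest tasks of the service (the Docker 1.13 tie-break).
// It assumes candidates is non-empty.
func pickNode(candidates []node) node {
	tasksPerAZ := map[string]int{}
	for _, n := range candidates {
		tasksPerAZ[n.az] += n.svcTasks
	}
	best := candidates[0]
	for _, n := range candidates[1:] {
		if tasksPerAZ[n.az] < tasksPerAZ[best.az] ||
			(tasksPerAZ[n.az] == tasksPerAZ[best.az] && n.svcTasks < best.svcTasks) {
			best = n
		}
	}
	return best
}
```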

A slightly more complicated use case involves hierarchical topology. Say there
are two datacenters, each of which has four rows, with 20 racks per row. To
spread tasks evenly at each of these levels, there could be three `SpreadOver`
messages in `Preferences`. The first would spread over datacenters, the second
would spread over rows, and the third would spread over racks. This ensures
that the highest precedence goes to spreading tasks between datacenters, but
after that, tasks are evenly distributed between rows and then racks.
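
Using the same assumed generated types as above, such a three-level preference
list might look roughly like this (the `datacenter`, `row`, and `rack` label
names are only illustrative).

```
// Sketch only: assumed generated types, illustrative label names.
package example

import "github.com/docker/swarmkit/api"

// spreadOver builds a single spread preference for the given descriptor.
func spreadOver(descriptor string) *api.PlacementPreference {
	return &api.PlacementPreference{
		Preference: &api.PlacementPreference_Spread{
			Spread: &api.SpreadOver{SpreadDescriptor: descriptor},
		},
	}
}

// hierarchicalPlacement spreads first over datacenters, then rows, then
// racks; preferences are listed in decreasing order of precedence.
func hierarchicalPlacement() *api.Placement {
	return &api.Placement{
		Preferences: []*api.PlacementPreference{
			spreadOver("engine.labels.datacenter"), // highest precedence
			spreadOver("engine.labels.row"),
			spreadOver("engine.labels.rack"),
		},
	}
}
```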

Nodes that are missing the label used by `SpreadOver` will still receive task
assignments. As a group, they will receive tasks in equal proportion to any of
the other groups identified by a specific label value. In a sense, a missing
label is the same as having the label with a null value attached to it. If the
service should only run on nodes with the label being used for the `SpreadOver`
preference, the preference should be combined with a constraint.
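
As a sketch of what that combination could look like, the example below pairs
the spread preference with a constraint on a hypothetical `zoned` engine label
that operators would attach only to nodes that also carry the `az` label. The
generated types and the label name are assumptions.

```
// Sketch only: assumed generated types; the "zoned" label is hypothetical.
package example

import "github.com/docker/swarmkit/api"

// constrainedAZSpread spreads over engine.labels.az, but also constrains the
// service to nodes carrying a hypothetical engine label zoned=true, so that
// unlabeled nodes never receive tasks for this service.
func constrainedAZSpread() *api.Placement {
	return &api.Placement{
		Constraints: []string{"engine.labels.zoned == true"},
		Preferences: []*api.PlacementPreference{
			{
				Preference: &api.PlacementPreference_Spread{
					Spread: &api.SpreadOver{
						SpreadDescriptor: "engine.labels.az",
					},
				},
			},
		},
	}
}
```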

## Future enhancements

- In addition to `SpreadOver`, we could add a `PackInto` preference with the
  opposite behavior. It would try to locate tasks on nodes that share the same
  label value as other tasks, subject to constraints. By combining multiple
  `SpreadOver` and `PackInto` preferences, it would be possible to do things
  like spread over datacenters and then pack into racks within those
  datacenters.

- Support weighted spreading, for situations where one datacenter has more
  servers than another. This could be done by adding a map to `SpreadOver`
  containing weights for each label value.

- Support acting on items other than node labels and engine labels. For
  example, acting on node IDs to spread or pack over individual nodes, or on
  resource specifications to implement soft resource constraints.