github.com/kaisenlinux/docker.io@v0.0.0-20230510090727-ea55db55fac7/swarmkit/design/tla/SwarmKit.tla

This is a TLA+ model of SwarmKit. Even if you don't know TLA+, you should be able to
get the general idea. This section gives a very brief overview of the syntax.

Declare `x' to be something that changes as the system runs:

  VARIABLE x

Define `Init' to be a state predicate (== means ``is defined to be''):

  Init ==
    x = 0

`Init' is true for states in which `x = 0'. This can be used to define
the possible initial states of the system. For example, the state
[ x |-> 0, y |-> 2, ... ] satisfies this.

Define `Next' to be an action:

  Next ==
    /\ x' \in Nat
    /\ x' > x

An action takes a pair of states, representing an atomic step of the system.
Unprimed expressions (e.g. `x') refer to the old state, and primed ones to
the new state. This example says that a step is a `Next' step iff the new
value of `x' is a natural number and greater than the previous value.
For example, [ x |-> 3, ... ] -> [x |-> 10, ... ] is a `Next' step.

/\ is logical ``and''. This example uses TLA's ``bulleted-list'' syntax, which makes
writing these easier. It is indentation-sensitive. TLA also has \/ lists (``or'').
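
For example (an illustration only, not part of the SwarmKit model), an action
offering two alternatives can be written as a \/ list:

  Move ==
    \/ x' = x + 1
    \/ x' = x - 1

A step is a `Move' step iff it either increments or decrements `x'.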

See `.http://lamport.azurewebsites.net/tla/summary.pdf.' for a more complete summary
of the syntax.

This specification can be read as documentation, but it can also be executed by the TLC
model checker. See the model checking section below for details about that.

The rest of the document is organised as follows:

1. Parameters to the model
2. Types and definitions
3. How to run the model checker
4. Actions performed by the user
5. Actions performed by the components of SwarmKit
6. The complete system
7. Properties of the system

-------------------------------- MODULE SwarmKit --------------------------------

(* Import some libraries we use.
   Common SwarmKit types are defined in Types.tla. You should probably read that before continuing. *)
EXTENDS Integers, TLC, FiniteSets,  \* From the TLA+ standard library
        Types,                      \* SwarmKit types
        Tasks,                      \* The `tasks' variable
        WorkerSpec,                 \* High-level spec for worker nodes
        EventCounter                \* Event limiting, for modelling purposes

(* The maximum number of terminated tasks to keep for each slot. *)
CONSTANT maxTerminated
ASSUME maxTerminated \in Nat

(* In the model, we share taskIDs (see ModelTaskId), which means that
   we can cover most behaviours with only enough task IDs
   for one running task and maxTerminated finished ones. *)
ASSUME Cardinality(TaskId) >= 1 + maxTerminated
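
(* For example, with `maxTerminated = 1' (the value suggested in the model checking
   section below), two task IDs such as {t1, t2} are enough to satisfy this. *)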

-------------------------------------------------------------------------------
\* Services

VARIABLE services       \* A map of currently-allocated services, indexed by ServiceId

(* A replicated service is one that specifies some number of replicas it wants. *)
IsReplicated(sid) ==
  services[sid].replicas \in Nat

(* A global service is one that wants one task running on each node. *)
IsGlobal(sid) ==
  services[sid].replicas = global

(* TasksOf(sid) is the set of tasks for service `sid'. *)
TasksOf(sid) ==
  { t \in tasks : t.service = sid }

(* All tasks of service `sid' in `vslot'. *)
TasksOfVSlot(sid, vslot) ==
  { t \in TasksOf(sid) : VSlot(t) = vslot }

(* All vslots of service `sid'. *)
VSlotsOf(sid) ==
  { VSlot(t) : t \in TasksOf(sid) }

-------------------------------------------------------------------------------
\* Types

(* The expected type of each variable. TLA+ is an untyped language, but the model checker
   can check that TypeOK is true for every reachable state. *)
TypeOK ==
  \* `services' is a mapping from service IDs to ServiceSpecs:
  /\ DOMAIN services \subseteq ServiceId
  /\ services \in [ DOMAIN services -> ServiceSpec ]
  /\ TasksTypeOK    \* Defined in Types.tla
  /\ WorkerTypeOK   \* Defined in WorkerSpec.tla

-------------------------------------------------------------------------------
(*
`^ \textbf{Model checking} ^'

   You can test this specification using the TLC model checker.
   This section describes how to do that. If you don't want to run TLC,
   you can skip this section.

   To use TLC, load this specification file in the TLA+ toolbox (``Open Spec'')
   and create a new model using the menu.

   You will be prompted to enter values for the various CONSTANTS.
   A suitable set of initial values is:

      `.
      Node          <- [ model value ] {n1}
      ServiceId     <- [ model value ] {s1}
      TaskId        <- [ model value ] {t1, t2}
      maxReplicas   <- 1
      maxTerminated <- 1
      .'

   For the [ model value ] ones, select `Set of model values'.

   This says that we have one node, `n1', at most one service, and at most
   two tasks per vslot. TLC can explore all possible behaviours of this system
   in a couple of seconds on my laptop.

   You should also specify some things to check (under ``What to check?''):

   - Add `TypeOK' and `Inv' under ``Invariants''
   - Add `TransitionsOK' and `EventuallyAsDesired' under ``Properties''

   Running the model should report ``No errors''.

   If the model fails, TLC will show you an example sequence of actions that leads to
   the failure, and you can inspect the state at each step. You can try this out by
   commenting out any important-looking condition in the model (e.g. the requirement
   in UpdateService that you can't change the mode of an existing service).

   Although the above model is very small, it should detect most errors that you might
   accidentally introduce when modifying the specification. Increasing the number of nodes,
   services, replicas or terminated tasks will check more behaviours of the system,
   but will be MUCH slower.

   The rest of this section describes techniques to make model checking faster by reducing
   the number of states that must be considered in various ways. Feel free to skip it.

`^ \textbf{Symmetry sets} ^'

   You should configure any model sets (e.g. `TaskId') as `symmetry sets'.
   For example, if you have a model with two nodes {n1, n2}, this tells TLC that
   two states which are the same except that n1 and n2 are swapped are equivalent,
   so it only needs to continue exploring from one of them.
   TLC will warn that checking temporal properties may not work correctly,
   but it's much faster and I haven't had any problems with it.
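
   In the Toolbox, symmetry is the ``Symmetry set'' option in the dialog where you
   declare the set of model values. If you drive TLC from the command line instead,
   a rough sketch (the name `Sym' is arbitrary) is to add a definition such as

   `.
   Sym == Permutations(Node) \union Permutations(ServiceId) \union Permutations(TaskId)
   .'

   (`Permutations' comes from the standard TLC module) and then declare
   `SYMMETRY Sym' in the TLC configuration file.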

`^ \textbf{Limiting the maximum number of setbacks to consider} ^'

   Another way to speed things up is to reduce the number of failures that TLC must consider.
   By default, it checks every possible combination of failures at every point, which
   is very expensive.
   In the `Advanced Options' panel of the model, add a ``Definition Override'' of e.g.
   `maxEvents = 2'. Actions that represent unnecessary extra work (such as the user
   changing the configuration or a worker node going down) are tagged with `CountEvent'.
   No run of the system can then have more than `maxEvents' such events.

   See `EventCounter.tla' for details.

`^ \textbf{Preventing certain failures} ^'

   If you're not interested in some actions then you can block them. For example,
   adding these two constraints in the ``Action Constraint'' box of the
   ``Advanced Options'' tab tells TLC not to consider workers going down or
   workers rejecting tasks as possible actions:

   /\ ~WorkerDown
   /\ ~RejectTask
*)

(*
`^ \textbf{Combining task states} ^'

   A finished task can be either in the `complete' or `failed' state, depending on
   its exit status. If we have 4 finished tasks, that's 16 different states. For
   modelling, we might not care about exit codes and we can treat this as a single
   state with another definition override:

   `.failed <- complete.'

   In a similar way, we can combine { assigned, accepted, preparing, ready } into a single
   state:

   `.accepted <- assigned
     preparing <- assigned
     ready <- assigned.'
*)

---------------------------- MODULE User  --------------------------------------------
\* Actions performed by users

(* Create a new service with any ServiceSpec.

   This says that a single atomic step of the system from an old state
   to a new one is a CreateService step iff `tasks', `nodes' and `nEvents' don't change
   and the new value of `services' is the same as before except that some
   service ID that wasn't used in the old state is now mapped to some
   ServiceSpec.

   Note: A \ B means { x \in A : x \notin B } --
         i.e. the set A with all elements in B removed.
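         For example, {t1, t2} \ {t2} = {t1}.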
   *)
CreateService ==
  /\ UNCHANGED << tasks, nodes, nEvents >>
  /\ \E sid \in ServiceId \ DOMAIN services,     \* `sid' is an unused ServiceId
       spec \in ServiceSpec :                    \* `spec' is any ServiceSpec
          /\ spec.remove = FALSE                 \* (not flagged for removal)
          /\ services' = services @@ sid :> spec \* Add `sid |-> spec' to `services'
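
(* Note on notation: `:>' and `@@' are operators from the TLC module.
   `sid :> spec' is the one-entry function mapping `sid' to `spec', and
   `f @@ g' merges two functions, with `f' taking precedence where their
   domains overlap. *)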

(* Update an existing service's spec. *)
UpdateService ==
  /\ UNCHANGED << tasks, nodes >>
  /\ CountEvent \* Flag as an event for model-checking purposes
  /\ \E sid     \in DOMAIN services,   \* `sid' is an existing ServiceId
        newSpec \in ServiceSpec :      \* `newSpec' is any `ServiceSpec'
       /\ services[sid].remove = FALSE \* We weren't trying to remove sid
       /\ newSpec.remove = FALSE       \* and we still aren't.
       \* You can't change a service's mode:
       /\ (services[sid].replicas = global) <=> (newSpec.replicas = global)
       /\ services' = [ services EXCEPT ![sid] = newSpec ]
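
(* Note on notation: [ f EXCEPT ![k] = v ] is the function that is the same as `f'
   except that `k' now maps to `v'; similarly, [ r EXCEPT !.field = v ] is the
   record `r' with its `field' component replaced by `v'. *)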

(* The user removes a service.

   Note: Currently, SwarmKit deletes the service from its records immediately.
   However, this isn't right because we need to wait for service-level resources
   such as Virtual IPs to be freed.
   Here we model the proposed fix, in which we just flag the service for removal. *)
RemoveService ==
  /\ UNCHANGED << nodes >>
  /\ CountEvent
  /\ \E sid \in DOMAIN services : \* sid is some existing service
       \* Flag service for removal:
       /\ services' = [services EXCEPT ![sid].remove = TRUE]
       \* Flag every task of the service for removal:
       /\ UpdateTasks([ t \in TasksOf(sid) |->
                          [t EXCEPT !.desired_state = remove] ])

(* A user action is one of these. *)
User ==
  \/ CreateService
  \/ UpdateService
  \/ RemoveService

=============================================================================

---------------------------- MODULE Orchestrator ----------------------------

\* Actions performed by the orchestrator

\* Note: This is by far the most complicated component in the model.
\* You might want to read this section last...

(* The set of tasks for service `sid' that should be considered as active.
   This is any task that is running or on its way to running. *)
RunnableTasks(sid) ==
  { t \in TasksOf(sid) : Runnable(t) }

(* Candidates for shutting down when we have too many. We don't want to count tasks that are shutting down
   towards the total count when deciding whether we need to kill anything. *)
RunnableWantedTasks(sid) ==
  { t \in RunnableTasks(sid) : t.desired_state \preceq running }

(* The set of possible new vslots for `sid'. *)
UnusedVSlot(sid) ==
  IF IsReplicated(sid) THEN Slot \ VSlotsOf(sid)
                       ELSE Node \ VSlotsOf(sid)

(* The set of possible IDs for a new task in a vslot.

   The complexity here is just a side-effect of the modelling (where we need to
   share and reuse task IDs for performance).
   In the real system, choosing an unused ID is easy. *)
UnusedId(sid, vslot) ==
  LET swarmTaskIds == { t.id : t \in TasksOfVSlot(sid, vslot) }
  IN  TaskId \ swarmTaskIds

(* Create a new task/slot if the number of runnable tasks is less than the number requested. *)
CreateSlot ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E sid \in DOMAIN services :          \* `sid' is an existing service
     /\ ~services[sid].remove              \* that we're not trying to remove
     (* For replicated tasks, only create as many slots as we need.
        For global tasks, we want all possible vslots (nodes). *)
     /\ IsReplicated(sid) =>
          services[sid].replicas > Cardinality(VSlotsOf(sid))  \* Desired > actual
     /\ \E slot \in UnusedVSlot(sid) :
        \E id   \in UnusedId(sid, slot) :
           tasks' = tasks \union { NewTask(sid, slot, id, running) }

(* Add a task if a slot exists, contains no runnable tasks, and we weren't trying to remove it.
   Note: if we are trying to remove it, the slot will eventually disappear and CreateSlot will
   then make a new one if we later need it again.

   Currently in SwarmKit, slots do not actually exist as objects in the store.
   Instead, we just infer that a slot exists because there exists a task with that slot ID.
   This has the odd effect that if `maxTerminated = 0' then we may create new slots rather than reusing
   existing ones, depending on exactly when the reaper runs.
   *)
ReplaceTask ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E sid  \in DOMAIN services :
     \E slot \in VSlotsOf(sid) :
     /\ \A task \in TasksOfVSlot(sid, slot) :    \* If all tasks in `slot' are
           ~Runnable(task)                       \* dead (not runnable) and
     /\ \E task \in TasksOfVSlot(sid, slot) :    \* there is some task that
           task.desired_state # remove           \* we're not trying to remove,
     /\ \E id \in UnusedId(sid, slot) :          \* then create a replacement task:
        tasks' = tasks \union { NewTask(sid, slot, id, running) }

(* If we have more replicas than the spec asks for, remove one of them. *)
RequestRemoval ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E sid \in DOMAIN services :
       LET current == RunnableWantedTasks(sid)
       IN \* Note: `current' excludes tasks we're already trying to kill
       /\ IsReplicated(sid)
       /\ services[sid].replicas < Cardinality(current)   \* We have too many replicas
       /\ \E slot \in { t.slot : t \in current } :        \* Choose an allocated slot
            \* Mark all tasks for that slot for removal:
            UpdateTasks( [ t \in TasksOfVSlot(sid, slot) |->
                            [t EXCEPT !.desired_state = remove] ] )

(* Mark a terminated task for removal if we have more than `maxTerminated' terminated tasks for this slot. *)
CleanupTerminated ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E sid  \in DOMAIN services :
     \E slot \in VSlotsOf(sid) :
     LET termTasksInSlot == { t \in TasksOfVSlot(sid, slot) :
                              State(t) \in { complete, shutdown, failed, rejected } }
     IN
     /\ Cardinality(termTasksInSlot) > maxTerminated    \* Too many tasks for slot
     /\ \E t \in termTasksInSlot :                      \* Pick a victim to remove
        UpdateTasks(t :> [t EXCEPT !.desired_state = remove])

(* We don't model the updater explicitly, but we allow any task to be restarted (perhaps with
   a different image) at any time, which should cover the behaviours of the restart supervisor.

   TODO: SwarmKit also allows ``start-first'' mode updates where we first get the new task to
   `running' and then mark the old task for shutdown. Add this to the model. *)
RestartTask ==
  /\ UNCHANGED << services, nodes >>
  /\ CountEvent
  /\ \E oldT  \in tasks :
     \E newId \in UnusedId(oldT.service, VSlot(oldT)) :
        /\ Runnable(oldT)                           \* Victim must be runnable
        /\ oldT.desired_state \prec shutdown        \* and we're not trying to kill it
        \* Create the new task in the `ready' state (see ReleaseReady below):
        /\ LET replacement == NewTask(oldT.service, VSlot(oldT), newId, ready)
           IN  tasks' =
                (tasks \ {oldT}) \union {
                  [oldT EXCEPT !.desired_state = shutdown],
                  replacement
                }

(* A task is set to wait at `ready' and the previous task for that slot has now finished.
   Allow it to proceed to `running'. *)
ReleaseReady ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E t \in tasks :
       /\ t.desired_state = ready         \* (and not e.g. `remove')
       /\ State(t) = ready
       /\ \A other \in TasksOfVSlot(t.service, VSlot(t)) \ {t} :
             ~Runnable(other)             \* All other tasks have finished
       /\ UpdateTasks(t :> [t EXCEPT !.desired_state = running])

(* The user asked to remove a service, and now all its tasks have been cleaned up. *)
CleanupService ==
  /\ UNCHANGED << tasks, nodes, nEvents >>
  /\ \E sid \in DOMAIN services :
     /\ services[sid].remove = TRUE
     /\ TasksOf(sid) = {}
     /\ services' = [ i \in DOMAIN services \ {sid} |-> services[i] ]

(* Actions that the orchestrator must always do eventually, if it can: *)
OrchestratorProgress ==
  \/ CreateSlot
  \/ ReplaceTask
  \/ RequestRemoval
  \/ CleanupTerminated
  \/ ReleaseReady
  \/ CleanupService

(* All actions that the orchestrator can perform *)
Orchestrator ==
  \/ OrchestratorProgress
  \/ RestartTask

=============================================================================

---------------------------- MODULE Allocator -------------------------------
\*  Actions performed by the allocator

(* Pick a `new' task and move it to `pending'.

   The spec says the allocator will ``allocate resources such as network attachments
   which are necessary for the tasks to run''. However, we don't model any resources here. *)
AllocateTask ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E t \in tasks :
     /\ State(t) = new
     /\ UpdateTasks(t :> [t EXCEPT !.status.state = pending])

AllocatorProgress ==
  \/ AllocateTask

Allocator ==
  \/ AllocatorProgress

=============================================================================

---------------------------- MODULE Scheduler -------------------------------

\*  Actions performed by the scheduler

(* The scheduler assigns a node to a `pending' task and moves it to `assigned'
   once sufficient resources are available (we don't model resources here). *)
Scheduler ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E t \in tasks :
     /\ State(t) = pending
     /\ LET candidateNodes == IF t.node = unassigned
                                THEN Node  \* (all nodes)
                                ELSE { t.node }
        IN
        \E node \in candidateNodes :
           UpdateTasks(t :> [t EXCEPT !.status.state = assigned,
                                      !.node = node ])

=============================================================================

---------------------------- MODULE Reaper ----------------------------------

\*  Actions performed by the reaper

(* Forget about tasks in remove or orphan states.

   Orphaned tasks belong to nodes that we are assuming are lost forever (or have crashed
   and will come up with nothing running, which is an equally fine outcome). *)
Reaper ==
  /\ UNCHANGED << services, nodes, nEvents >>
  /\ \E t \in tasks :
      /\ \/ /\ t.desired_state = remove
            /\ (State(t) \prec assigned \/ ~Runnable(t)) \* Not owned by agent
         \/ State(t) = orphaned
      /\ tasks' = tasks \ {t}

=============================================================================

\*  The complete system

\* Import definitions from the various modules
INSTANCE User
INSTANCE Orchestrator
INSTANCE Allocator
INSTANCE Scheduler
INSTANCE Reaper

\* All the variables
vars == << tasks, services, nodes, nEvents >>

\* Initially there are no tasks and no services, and all nodes are up.
Init ==
  /\ tasks = {}
  /\ services = << >>
  /\ nodes = [ n \in Node |-> nodeUp ]
  /\ InitEvents

(* WorkerSpec doesn't mention `services'. To combine it with this spec, we need to say
   that every action of the agent leaves `services' unchanged. *)
AgentReal ==
  Agent /\ UNCHANGED services

(* Unfortunately, `AgentReal' causes TLC to report all problems of the agent
   as simply `AgentReal' steps, which isn't very helpful. We can get better
   diagnostics by expanding it, like this: *)
AgentTLC ==
  \/ (ProgressTask     /\ UNCHANGED services)
  \/ (ShutdownComplete /\ UNCHANGED services)
  \/ (OrphanTasks      /\ UNCHANGED services)
  \/ (WorkerUp         /\ UNCHANGED services)
  \/ (RejectTask       /\ UNCHANGED services)
  \/ (ContainerExit    /\ UNCHANGED services)
  \/ (WorkerDown       /\ UNCHANGED services)

(* To avoid the risk of `AgentTLC' getting out of sync,
   TLAPS can check that the definitions are equivalent. *)
THEOREM AgentTLC = AgentReal
BY DEF AgentTLC, AgentReal, Agent, AgentProgress

(* A next step is one in which any of these sub-components takes a step: *)
Next ==
  \/ User
  \/ Orchestrator
  \/ Allocator
  \/ Scheduler
  \/ AgentTLC
  \/ Reaper
  \* For model checking: don't report deadlock if we're limiting events
  \/ (nEvents = maxEvents /\ UNCHANGED vars)

(* This is a ``temporal formula''. It takes a sequence of states representing the
   changing state of the world and evaluates to TRUE if that sequence of states is
   a possible behaviour of SwarmKit. *)
Spec ==
  \* The first state in the behaviour must satisfy Init:
  /\ Init
  \* All consecutive pairs of states must satisfy Next or leave `vars' unchanged:
  /\ [][Next]_vars
  (* Some actions are required to happen eventually. For example, a behaviour in
     which SwarmKit stops doing anything forever, even though it could advance some task
     from the `new' state, isn't a valid behaviour of the system.
     This property is called ``weak fairness''. *)
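  (* WF_vars(A) (``weak fairness'') says that if an `A' step that changes `vars'
     remains possible forever, then such a step must eventually be taken. *)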
  /\ WF_vars(OrchestratorProgress)
  /\ WF_vars(AllocatorProgress)
  /\ WF_vars(Scheduler)
  /\ WF_vars(AgentProgress /\ UNCHANGED services)
  /\ WF_vars(Reaper)
  /\ WF_vars(WorkerUp /\ UNCHANGED services)
     (* We don't require fairness of:
        - User (we don't control them),
        - RestartTask (services aren't required to be updated),
        - RejectTask (tasks aren't required to be rejected),
        - ContainerExit (we don't specify image behaviour) or
        - WorkerDown (workers aren't required to fail). *)

-------------------------------------------------------------------------------
\* Properties to verify

(* These are properties that should follow automatically if the system behaves as
   described by `Spec' in the previous section. *)

\* A state invariant (things that should be true in every state).
Inv ==
  \A t \in tasks :
    (* Every task has a service:

       TODO: The spec says: ``In some cases, there are tasks that exist independent of any service.
             These do not have a value set in service_id.''. Add an example of one. *)
    /\ t.service \in DOMAIN services
    \* Tasks have nodes once they reach `assigned', except maybe if rejected:
    /\ assigned \preceq State(t) => t.node \in Node \/ State(t) = rejected
    \* `remove' is only used as a desired state, not an actual one:
    /\ State(t) # remove
    \* Task IDs are unique
    /\ \A t2 \in tasks : Id(t) = Id(t2) => t = t2

(* The state of task `i' in `S', or `null' if it doesn't exist *)
Get(S, i) ==
  LET cand == { x \in S : Id(x) = i }
  IN  IF cand = {} THEN null
                   ELSE State(CHOOSE x \in cand : TRUE)
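
(* Note: `cand' has at most one element here, since task IDs are unique (see `Inv'),
   so the CHOOSE simply extracts that task's state. *)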

(* An action in which all transitions were valid. *)
StepTransitionsOK ==
  LET permitted == { << x, x >> : x \in TaskState } \union  \* No change is always OK
    CASE Orchestrator -> Transitions.orchestrator
      [] Allocator    -> Transitions.allocator
      [] Scheduler    -> Transitions.scheduler
      [] Agent        -> Transitions.agent
      [] Reaper       -> Transitions.reaper
      [] OTHER        -> {}
    oldIds == IdSet(tasks)
    newIds == IdSet(tasks')
  IN
  \A id \in newIds \union oldIds :
     << Get(tasks, id), Get(tasks', id) >> \in permitted

(* Some of the expressions below are ``temporal formulas''. Unlike state expressions and actions,
   these look at a complete behaviour (sequence of states). Summary of notation:

   [] means ``always''. e.g. []x=3 means that `x = 3' in all states.

   <> means ``eventually''. e.g. <>x=3 means that `x = 3' in some state.

   `x=3' on its own means that `x=3' in the initial state.
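
   [][A]_v means ``every step is an A step or leaves v unchanged''.

   <<A>>_v means ``an A step that also changes v'' (so, for example,
   []<> <<A>>_vars says that such steps happen infinitely often).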
*)

\* A temporal formula that checks every step satisfies StepTransitionsOK (or `vars' is unchanged)
TransitionsOK ==
  [][StepTransitionsOK]_vars

(* Every service has the right number of running tasks (the system is in the desired state). *)
InDesiredState ==
  \A sid \in DOMAIN services :
    \* We're not trying to remove the service:
    /\ ~services[sid].remove
    \* The service has the correct set of running replicas:
    /\ LET runningTasks  == { t \in TasksOf(sid) : State(t) = running }
           nRunning      == Cardinality(runningTasks)
       IN
       CASE IsReplicated(sid) ->
              /\ nRunning = services[sid].replicas
         [] IsGlobal(sid) ->
              \* We have as many tasks as nodes:
              /\ nRunning = Cardinality(Node)
              \* We have a task for every node:
              /\ { t.node : t \in runningTasks } = Node
    \* The service does not have too many terminated tasks
    /\ \A slot \in VSlotsOf(sid) :
       LET terminated == { t \in TasksOfVSlot(sid, slot) : ~Runnable(t) }
       IN  Cardinality(terminated) <= maxTerminated

(* The main property we want to check.

   []<> means ``always eventually'' (``infinitely-often'')

   <>[] means ``eventually always'' (always true after some point)

   This temporal formula says that if we only experience a finite number of
   problems then the system will eventually settle on InDesiredState.
*)
EventuallyAsDesired ==
  \/ []<> <<User>>_vars               \* Either the user keeps changing the configuration,
  \/ []<> <<RestartTask>>_vars        \* or restarting/updating tasks,
  \/ []<> <<WorkerDown>>_vars         \* or workers keep failing,
  \/ []<> <<RejectTask>>_vars         \* or workers keep rejecting tasks,
  \/ []<> <<ContainerExit>>_vars      \* or the containers keep exiting,
  \/ <>[] InDesiredState              \* or we eventually get to the desired state and stay there.

=============================================================================