Lifecycles
==========

In juju, certain fundamental state entities have "lifecycles". These entities
are:

  * Machines
  * Units
  * Applications
  * Relations

...and there are only three possible states for these entities:

  * Alive (An entity is Alive when it is first created.)
  * Dying (An entity becomes Dying when the user indicates that it should be
    destroyed, and remains so while there are impediments to its removal.)
  * Dead (An entity becomes Dead when there are no further impediments to
    its removal; at this point it may be removed from the database at any time.
    Some entities become Dead and are removed in a single operation, and so
    are never directly observed to be "Dead", but should still be considered
    so.)

There are two fundamental truths in this system:

  * All such entities start existence Alive.
  * No such entity can ever change to an earlier state.
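
The two truths above describe a monotonic state machine. The following Go sketch models that rule. The `Life` and `Entity` types and the `Advance` method are hypothetical names chosen for illustration. They are not juju's own API.

```go
package main

import "fmt"

// Life is a hypothetical model of an entity's lifecycle state. The constants
// are ordered so that "never move to an earlier state" is one comparison.
type Life int

const (
	Alive Life = iota // every entity starts here
	Dying             // destruction requested; impediments may remain
	Dead              // no impediments remain; removable at any time
)

func (l Life) String() string {
	switch l {
	case Alive:
		return "alive"
	case Dying:
		return "dying"
	case Dead:
		return "dead"
	}
	return fmt.Sprintf("Life(%d)", int(l))
}

// Entity is a hypothetical stand-in for a machine, unit, application or
// relation. Its zero value is Alive, so every entity starts existence Alive.
type Entity struct {
	Name string
	Life Life
}

// Advance moves e to target. Moving to an earlier state is rejected and
// leaves e unchanged; re-applying the current state is accepted as a no-op.
func (e *Entity) Advance(target Life) error {
	if target < e.Life {
		return fmt.Errorf("%s: cannot go from %v back to %v", e.Name, e.Life, target)
	}
	e.Life = target
	return nil
}
```

Accepting a repeat of the current state is a choice made in this sketch. It means a second destroy request for an already Dying entity is harmless, while any attempt to revive it is an error.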

Beyond the above rules, lifecycle shifts occur at different times for different
kinds of entities.

Machines
--------

  * Like everything else, a machine starts out Alive. Machines can be created
    directly by `juju bootstrap` (the controller machine) and
    `juju add-machine`; `juju deploy` and `juju add-unit` may also create
    machines as a consequence of unit creation.
  * If a machine has the JobManageModel job, it cannot become Dying or Dead.
    Other jobs do not affect the lifecycle directly.
  * If a machine has the JobHostUnits job, principal units can be assigned to it
    while it is Alive.
  * While principal units are assigned to a machine, its lifecycle cannot change
    and `juju remove-machine` will fail.
  * When no principal units are assigned, `juju remove-machine` will set the
    machine to Dying. (Future plans: allow a machine to become Dying when it
    has principal units, so long as they are not Alive. For now it's extra
    complexity with little direct benefit.)
  * When a machine has containers, `juju remove-machine` will fail unless
    `--force` is used. However, `juju destroy-controller` and
    `juju destroy-model` allow a machine to become Dying while it still has
    containers.
  * Once a machine has been set to Dying, the corresponding Machine Agent (MA)
    is responsible for setting it to Dead. A Dying machine cannot become Dead
    while it has containers. (Future plans: when Dying units are assigned,
    wait for them to become Dead and remove them completely before making the
    machine Dead; not an issue now because the machine can't yet become Dying
    with units assigned.)
  * Once a machine has been set to Dead, the agent for some other machine (with
    JobManageModel) will release the underlying instance back to the provider
    and remove the machine entity from state. (Future uncertainty: should the
    provisioner provision an instance for a Dying machine? At the moment, no,
    because a Dying machine can't have any units in the first place; in the
    future, er, maybe, because those Dying units may be attached to persistent
    storage and should thus be allowed to continue to shut down cleanly as they
    would usually do. Maybe.)
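
The machine rules above reduce to two checks. This is a minimal Go sketch under stated assumptions. The `Machine` struct, the error values and both function names are hypothetical. `force` relaxes only the containers rule, exactly as the bullets describe; the `--force` behaviour discussed under Implementation goes further than this.

```go
package main

import "errors"

// Job names as used in this document; the types here are hypothetical.
type Job string

const (
	JobHostUnits   Job = "JobHostUnits"
	JobManageModel Job = "JobManageModel"
)

// Machine is a simplified, hypothetical view of a machine entity.
type Machine struct {
	Jobs           []Job
	PrincipalUnits []string // principal units assigned to the machine
	Containers     []string // containers hosted on the machine
	Dying          bool
}

var (
	ErrManagesModel  = errors.New("machine has JobManageModel")
	ErrHasUnits      = errors.New("machine has principal units assigned")
	ErrHasContainers = errors.New("machine has containers")
)

func hasJob(m *Machine, job Job) bool {
	for _, j := range m.Jobs {
		if j == job {
			return true
		}
	}
	return false
}

// CheckRemoveMachine returns nil if `juju remove-machine` may set m to
// Dying. In this sketch, force relaxes only the containers rule.
func CheckRemoveMachine(m *Machine, force bool) error {
	switch {
	case hasJob(m, JobManageModel):
		return ErrManagesModel
	case len(m.PrincipalUnits) > 0:
		return ErrHasUnits
	case len(m.Containers) > 0 && !force:
		return ErrHasContainers
	}
	return nil
}

// CanSetDead reports whether the machine agent may move its Dying machine
// to Dead: a Dying machine cannot become Dead while it still has containers.
func CanSetDead(m *Machine) bool {
	return m.Dying && len(m.Containers) == 0
}
```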

Units
-----

  * A principal unit can be created directly with `juju deploy` or
    `juju add-unit`.
  * While a principal unit is Alive, it can be assigned to a machine.
  * While a principal unit is Alive, it can enter the scopes of Alive
    relations, which may cause the creation of subordinate units; so,
    indirectly, `juju integrate` can also cause the creation of units.
  * A unit can become Dying at any time, but may not become Dead while any unit
    subordinate to it exists, or while the unit is in scope for any relation.
  * A principal unit can become Dying in one of two ways:
      * `juju remove-unit` (This doesn't work on subordinates; see below.)
      * `juju remove-application` (This does work on subordinates, but happens
        indirectly in either case: the Unit Agents (UAs) for each unit of an
        application set their corresponding units to Dying when they detect
        their application Dying. This is because we design for deployments of
        up to 100k units, and we can't use mgo/txn to do a bulk update of 100k
        units: that makes for a txn with at least 100k operations, and that's
        just crazy.)
  * A subordinate must also become Dying when either:
      * its principal becomes Dying, via `juju remove-unit`; or
      * the last Alive relation between its application and its principal's
        application is no longer Alive. This may come about via
        `juju remove-relation`.
  * When any unit is Dying, its UA is responsible for removing impediments to
    the unit becoming Dead, and then making it so. To do so, the UA must:
      * Depart from all its relations in an orderly fashion.
      * Wait for all its subordinates to become Dead, and remove them from state.
      * Set its unit to Dead.
  * As just noted, when a subordinate unit is Dead, it is removed from state by
    its principal's UA; the relationship is the same as that of a principal unit
    to its assigned machine agent, and of a machine to the JobManageModel
    machine agent.
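
Two unit-level conditions from the list above can be written as predicates. The first decides when a Dying unit may be set to Dead. The second decides when a subordinate must become Dying. This is a minimal Go sketch that assumes a hypothetical, simplified `Unit` record rather than juju's state types.

```go
package main

// Unit is a hypothetical, simplified unit record.
type Unit struct {
	Name           string
	Dying          bool
	Subordinates   []string // subordinate units that still exist in state
	RelationScopes []string // relations whose scope the unit is still in
}

// CanBecomeDead reports whether a Dying unit's agent may set it to Dead:
// every relation scope has been left and every subordinate removed.
func CanBecomeDead(u *Unit) bool {
	return u.Dying && len(u.Subordinates) == 0 && len(u.RelationScopes) == 0
}

// SubordinateMustDie reports whether a subordinate must become Dying.
// relationsAlive holds the Alive flag of each relation linking the
// subordinate's application to its principal's application.
func SubordinateMustDie(principalDying bool, relationsAlive []bool) bool {
	if principalDying {
		return true
	}
	for _, alive := range relationsAlive {
		if alive {
			return false // at least one relation still keeps it Alive
		}
	}
	return true
}
```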

Applications
------------

  * Applications are created with `juju deploy`. Applications with duplicate
    names are not allowed (units and machines with duplicate names are not
    possible: their identifiers are assigned by juju).
  * Unlike units and machines, applications have no corresponding agent.
  * In addition, applications become Dead and are removed from the database in a
    single atomic operation.
  * When an application is Alive, units may be added to it, and relations can be
    added using the application's endpoints.
  * An application can be destroyed at any time, via `juju remove-application`.
    This causes all the units to become Dying, as discussed above, and will also
    cause all relations in which the application is participating to become Dying
    or be removed.
  * If a removed application has no units, and all its relations are eligible
    for immediate removal, then the application will also be removed immediately
    rather than being set to Dying.
  * If no associated relations exist, the application is removed by the MA that
    removes the last unit of that application from state.
  * If no units of the application remain, but its relations still exist, the
    responsibility for removing the application falls to the last UA to leave
    the scope of the last of those relations. (Yes, this is a UA for a unit of
    a totally different application.)
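
The following Go sketch covers two things. The first is the decision `juju remove-application` makes. The second is the condition under which a Dying application may finally be removed. The types and function names are hypothetical. Following the Relations section, the sketch treats a relation as eligible for immediate removal when it has no units in scope.

```go
package main

// Outcome is the hypothetical result of `juju remove-application` here.
type Outcome int

const (
	RemovedNow Outcome = iota // Dead and removed in one operation
	SetDying                  // stays Dying until units and relations are gone
)

// Relation summarizes one relation the application participates in.
type Relation struct {
	UnitsInScope int
}

// DestroyApplication decides between immediate removal and Dying. A
// relation with no units in scope can itself be removed immediately.
func DestroyApplication(units int, relations []Relation) Outcome {
	if units > 0 {
		return SetDying
	}
	for _, r := range relations {
		if r.UnitsInScope > 0 {
			return SetDying
		}
	}
	return RemovedNow
}

// ReadyForRemoval reports whether a Dying application has lost its last
// reference. The agent that removes the last unit (when no relations remain)
// or the last relation (when no units remain) would make this check.
func ReadyForRemoval(dying bool, units, relations int) bool {
	return dying && units == 0 && relations == 0
}
```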

Relations
---------

  * A relation is created with `juju integrate`. No two relations with the
    same canonical name can exist. (The canonical relation name form is
    "<requirer-endpoint> <provider-endpoint>", where each endpoint takes the
    form "<application-name>:<charm-relation-name>".)
      * Owing to a convention, the above is not strictly true: a subordinate
        charm may require a container-scoped "juju-info" relation. The
        restrictions on such relations mean that the name can never cause
        actual ambiguity; nonetheless, support for this should be phased out
        smoothly (see lp:1100076).
  * A relation, like an application, has no corresponding agent, and becomes
    Dead and is removed from the database in a single operation.
  * Similarly to an application, a relation cannot be created while an identical
    relation exists in state (in which identity is determined by equality of
    canonical relation name: a sequence of endpoint pairs sorted by role).
  * While a relation is Alive, units of applications in that relation can enter
    its scope; that is, the UAs for those units can signal to the system that
    they are participating in the relation.
  * A relation can be destroyed with either `juju remove-relation` or
    `juju remove-application`.
  * When a relation is destroyed with no units in scope, it will immediately
    become Dead and be removed from state, rather than being set to Dying.
  * When a relation becomes Dying, the UAs of units that have entered its scope
    are responsible for cleanly departing the relation by running hooks and then
    leaving relation scope (signalling that they are no longer participating).
  * When the last unit leaves the scope of a Dying relation, its UA must remove
    the relation from state.
  * As noted above, the Dying relation may be the only thing keeping a Dying
    application (different from that of the acting UA) from removal; so
    relation removal may also imply application removal.
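
Here is a minimal Go sketch of canonical relation naming and the uniqueness rule. `Endpoint`, `CanonicalName` and `AddRelation` are hypothetical. Peer relations are not spelled out in this document, so this sketch simply orders peer endpoints after requirers and providers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Endpoint is a hypothetical description of one side of a relation.
type Endpoint struct {
	Application string // e.g. "wordpress"
	Relation    string // charm relation name, e.g. "db"
	Role        string // "requirer", "provider" or "peer"
}

// roleOrder puts requirers before providers; placing peers last is an
// assumption of this sketch.
var roleOrder = map[string]int{"requirer": 0, "provider": 1, "peer": 2}

// CanonicalName renders endpoints as "<application-name>:<charm-relation-name>"
// and joins them with spaces, ordered by role.
func CanonicalName(eps ...Endpoint) string {
	sorted := append([]Endpoint(nil), eps...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return roleOrder[sorted[i].Role] < roleOrder[sorted[j].Role]
	})
	parts := make([]string, len(sorted))
	for i, ep := range sorted {
		parts[i] = ep.Application + ":" + ep.Relation
	}
	return strings.Join(parts, " ")
}

// AddRelation refuses to create a relation whose canonical name already
// exists in the (hypothetical) set of existing relation names.
func AddRelation(existing map[string]bool, eps ...Endpoint) (string, error) {
	name := CanonicalName(eps...)
	if existing[name] {
		return "", fmt.Errorf("relation %q already exists", name)
	}
	existing[name] = true
	return name, nil
}
```

For example, a provider `mysql:db` and a requirer `wordpress:db` yield `wordpress:db mysql:db` in whichever order they are passed. A second `AddRelation` call with the arguments swapped is therefore rejected.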

References
----------

OK, that was a bit of a hail of bullets, and the motivations for the above are
perhaps not always clear. To consider it from another angle:

  * Subordinate units reference principal units.
  * Principal units reference machines.
  * All units reference their applications.
  * All units reference the relations whose scopes they have joined.
  * All relations reference the applications they are part of.

In every case above, where X references Y, the life state of an X may be
sufficient to prevent a change in the life state of a Y; and, conversely, a
life change in an X may be sufficient to cause a life change in a Y. (In only
one case does the reverse hold: setting an application or relation to Dying
will cause the appropriate units' agents to individually set their units to
Dying. This is just an implementation detail.)

The following scrawl may help you to visualize the references in play:

        +-----------+       +---------+
    +-->| principal |------>| machine |
    |   +-----------+       +---------+
    |      |     |
    |      |     +--------------+
    |      |                    |
    |      V                    V
    |   +----------+       +-------------+
    |   | relation |------>| application |
    |   +----------+       +-------------+
    |      A                    A
    |      |                    |
    |      |     +--------------+
    |      |     |
    |   +-------------+
    +---| subordinate |
        +-------------+

...but it is important to remember that it's only one view of the relationships
involved, and that the user-centric view is quite different; from a user's
perspective, the influences appear to travel in the opposite direction:

  * (destroying a machine "would" destroy its principals, but that's disallowed)
  * destroying a principal destroys all its subordinates
  * (destroying a subordinate directly is impossible)
  * destroying an application destroys all its units and relations
  * destroying a container relation destroys all subordinates in the relation
  * (destroying a global relation destroys nothing else)

...and it takes a combination of these viewpoints to understand the detailed
interactions laid out above.

Agents
------

It may also be instructive to consider the responsibilities of the unit and
machine agents. The unit agent is responsible for:

  * detecting Alive relations incorporating its application and entering their
    scopes (if a principal, this may involve creating subordinates).
  * detecting Dying relations whose scope it has entered and leaving their
    scope (this involves removing any relations or applications that thereby
    become unreferenced).
  * detecting undeployed Alive subordinates and deploying them.
  * detecting undeployed non-Alive subordinates and removing them (this raises
    similar questions to those alluded to above regarding Dying units on Dying
    machines; but, without persistent storage, there's no point deploying a
    Dying unit just to wait for its agent to set itself to Dead).
  * detecting deployed Dead subordinates, recalling them, and removing them.
  * detecting its application's Dying state, and setting its own Dying state.
  * if a subordinate, detecting that no relations with its principal are Alive,
    and setting its own Dying state.
  * detecting its own Dying state, and:
      * leaving all its relation scopes;
      * waiting for all its subordinates to be removed;
      * setting its own Dead state.
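
When a unit agent leaves the scope of a Dying relation, removals can cascade. The following Go sketch shows that cascade against a hypothetical in-memory model (`App`, `Rel`, `State`, `LeaveScope`). It is not juju's state API. If the leaving unit was the last one in scope, the relation goes. Any Dying application left with no units and no relations goes with it.

```go
package main

// App and Rel are hypothetical, in-memory stand-ins for state documents.
type App struct {
	Dying     bool
	Units     int
	Relations map[string]bool // names of relations the application is in
}

type Rel struct {
	Dying bool
	Scope map[string]bool // units currently in the relation's scope
	Apps  []string        // applications taking part in the relation
}

type State struct {
	Apps map[string]*App
	Rels map[string]*Rel
}

// LeaveScope records that unit has left the scope of relation rel. If rel
// is Dying and unit was the last one in scope, rel is removed, and so is any
// Dying application left with no units and no relations. It returns the
// names of the removed entities in removal order.
func (s *State) LeaveScope(unit, rel string) []string {
	r, ok := s.Rels[rel]
	if !ok {
		return nil
	}
	delete(r.Scope, unit)
	if !r.Dying || len(r.Scope) > 0 {
		return nil // relation still Alive, or other units remain in scope
	}
	delete(s.Rels, rel)
	removed := []string{rel}
	for _, name := range r.Apps {
		app, ok := s.Apps[name]
		if !ok {
			continue
		}
		delete(app.Relations, rel)
		if app.Dying && app.Units == 0 && len(app.Relations) == 0 {
			delete(s.Apps, name)
			removed = append(removed, name)
		}
	}
	return removed
}
```

Suppose a `wordpress` unit is the last to leave a Dying relation with a Dying, unit-less `mysql` application. Under this model, that wordpress unit's agent removes both the relation and `mysql`. This is the "totally different application" case noted under Applications.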

A machine agent's responsibilities are determined by its jobs. There are only
two jobs in existence at the moment; an MA whose machine has JobHostUnits is
responsible for:

  * detecting undeployed Alive principals assigned to it and deploying them.
  * detecting undeployed non-Alive principals assigned to it and removing them
    (recall that unit removal may imply application removal).
  * detecting deployed Dead principals assigned to it, recalling them, and
    removing them.
  * detecting deployed principals not assigned to it, and recalling them.
  * detecting its machine's Dying state, and setting it to Dead.
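
The principal-handling bullets above form a small reconciliation table. The following Go sketch collapses them into one decision per unit. The `Life` and `Action` types and `PrincipalAction` are hypothetical names used only for illustration.

```go
package main

// Life values for this sketch only.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Action is what a hypothetical deployer does with one principal unit.
type Action string

const (
	Nothing         Action = "nothing"
	Deploy          Action = "deploy"
	Remove          Action = "remove"            // remove from state; never deployed
	Recall          Action = "recall"            // undeploy only
	RecallAndRemove Action = "recall-and-remove" // undeploy, then remove from state
)

// PrincipalAction decides what to do with one principal unit, given its
// life, whether it is deployed on this machine, and whether it is assigned
// to this machine.
func PrincipalAction(life Life, deployed, assignedHere bool) Action {
	switch {
	case deployed && !assignedHere:
		return Recall
	case deployed && life == Dead:
		return RecallAndRemove
	case deployed:
		return Nothing // Alive or Dying: the unit's own agent is in charge
	case !assignedHere:
		return Nothing
	case life == Alive:
		return Deploy
	default:
		return Remove // undeployed and not Alive
	}
}
```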

...while one whose machine has JobManageModel is responsible for:

  * detecting Alive machines without instance IDs and provisioning provider
    instances to run their agents.
  * detecting non-Alive machines without instance IDs and removing them.
  * detecting Dead machines with instance IDs, decommissioning the instance, and
    removing the machine.
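
The JobManageModel bullets above can be sketched in Go as a pure function of a machine's life and whether it has an instance ID. The types and `ProvisionerAction` are hypothetical. A provisioned machine that is still Alive or Dying is left alone, which matches the current "no" answer to the future uncertainty noted under Machines.

```go
package main

// Life values for this sketch only.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Action is what a hypothetical provisioner does with one machine.
type Action string

const (
	Nothing               Action = "nothing"
	Provision             Action = "provision"
	RemoveMachine         Action = "remove-machine"
	DecommissionAndRemove Action = "decommission-and-remove"
)

// ProvisionerAction decides what to do with one machine.
func ProvisionerAction(life Life, hasInstance bool) Action {
	switch {
	case !hasInstance && life == Alive:
		return Provision
	case !hasInstance:
		return RemoveMachine // not Alive and never provisioned
	case life == Dead:
		return DecommissionAndRemove
	default:
		return Nothing // provisioned and Alive or Dying: its MA is in charge
	}
}
```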

Machines can in theory have multiple jobs, but in current practice do not.

Implementation
--------------

All state change operations are mediated by the mgo/txn package, which provides
multi-document transactions against MongoDB. This allows us to enforce the many
conditions described above without races, so long as we are careful when
implementing them.
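
Transactional assertions are what keep these conditions free of races. The following Go code is a toy, in-memory analogue of assert-then-apply that illustrates the idea. Everything in it (`Store`, `Op`, `Run`, `Doc`) is hypothetical. The real mgo/txn package expresses assertions as MongoDB query documents and has its own machinery for applying transactions, none of which is modelled here.

```go
package main

import (
	"errors"
	"sync"
)

// Doc is a toy document holding only a life value and a reference count.
type Doc struct {
	Life string
	Refs int
}

// Op is a toy operation: Assert is checked against the document before any
// change is made, and Apply mutates the document. This is NOT the mgo/txn
// API; it only illustrates assert-then-apply.
type Op struct {
	ID     string
	Assert func(Doc) bool
	Apply  func(*Doc)
}

var ErrAborted = errors.New("transaction aborted: assertion failed")

// Store is a toy, in-memory document store.
type Store struct {
	mu   sync.Mutex
	docs map[string]*Doc
}

// Run applies ops all-or-nothing: every document must exist and every
// assertion must hold, otherwise no document is changed.
func (s *Store) Run(ops []Op) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, op := range ops {
		d, ok := s.docs[op.ID]
		if !ok || (op.Assert != nil && !op.Assert(*d)) {
			return ErrAborted
		}
	}
	for _, op := range ops {
		if op.Apply != nil {
			op.Apply(s.docs[op.ID])
		}
	}
	return nil
}
```

Consider two racing transactions that both assert the application is Alive: one destroys the application, the other adds a unit. Whichever commits second fails its assertion and changes nothing. So a unit can never be added to an application that has already become Dying.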

Lifecycle support is not complete: relation lifecycles mostly are, as are
large parts of the unit and machine agents; but substantial parts of the
machine, unit and application entity implementations still lack
sophistication. This situation is being actively addressed.

Beyond the plans detailed above, it is important to note that an agent that is
failing to meet its responsibilities can have a somewhat distressing impact on
the rest of the system. To counteract this, we have implemented a `--force`
flag for `juju remove-unit` and `juju remove-machine` that forcibly sets an
entity to Dead while maintaining consistency and sanity across all references.