Lifecycles
==========

In juju, certain fundamental state entities have "lifecycles". These entities
are:

  * Machines
  * Units
  * Services
  * Relations

...and each of them is always in exactly one of three states:

  * Alive (An entity is Alive when it is first created.)
  * Dying (An entity becomes Dying when the user indicates that it should be
    destroyed, and remains so while there are impediments to its removal.)
  * Dead (An entity becomes Dead when there are no further impediments to
    its removal; at this point it may be removed from the database at any time.
    Some entities become Dead and are removed in a single operation, and
    are hence never directly observed to be "Dead", but should still be so
    considered.)

There are two fundamental truths in this system:

  * All such things start existence Alive.
  * No such thing can ever change to an earlier state.

Beyond the above rules, lifecycle shifts occur at different times for different
kinds of entities.
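
The two truths above can be sketched in a few lines of Go. This is only an
illustration: the `Life` type and the `CanAdvance` helper are names made up
for this example, not a description of the actual state package.

```go
package main

import "fmt"

// Life is a hypothetical type for an entity's lifecycle state. The values are
// ordered so that later states compare greater than earlier ones.
type Life int

const (
	Alive Life = iota // every entity starts here
	Dying
	Dead
)

// String returns the name used for the state in this document.
func (l Life) String() string {
	switch l {
	case Alive:
		return "Alive"
	case Dying:
		return "Dying"
	case Dead:
		return "Dead"
	}
	return "unknown"
}

// CanAdvance reports whether an entity in state from may be set to state to.
// Setting the same state again is a no-op and allowed; moving to an earlier
// state never is. Skipping Dying is allowed, because some entities become
// Dead and are removed in a single operation.
func CanAdvance(from, to Life) bool {
	return to >= from
}

func main() {
	states := []Life{Alive, Dying, Dead}
	for _, from := range states {
		for _, to := range states {
			fmt.Printf("%s -> %s: %v\n", from, to, CanAdvance(from, to))
		}
	}
}
```

Ordering the constants means the "never go backwards" rule reduces to a single
comparison.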

Machines
--------

  * Like everything else, a machine starts out Alive. `juju bootstrap` aside,
    the user interface does not allow for direct creation of machines, but
    `juju deploy` and `juju add-unit` may create machines as a consequence of
    unit creation.
  * If a machine has the JobManageEnviron job, it cannot become Dying or Dead.
    Other jobs do not affect the lifecycle directly.
  * If a machine has the JobHostUnits job, principal units can be assigned to it
    while it is Alive.
  * While principal units are assigned to a machine, its lifecycle cannot change
    and `juju destroy-machine` will fail.
  * When no principal units are assigned, `juju destroy-machine` will set the
    machine to Dying. (Future plans: allow a machine to become Dying when it
    has principal units, so long as they are not Alive. For now it's extra
    complexity with little direct benefit.)
  * Once a machine has been set to Dying, the corresponding Machine Agent (MA)
    is responsible for setting it to Dead. (Future plans: when Dying units are
    assigned, wait for them to become Dead and remove them completely before
    making the machine Dead; not an issue now because the machine can't yet
    become Dying with units assigned.)
  * Once a machine has been set to Dead, the agent for some other machine (with
    JobManageEnviron) will release the underlying instance back to the provider
    and remove the machine entity from state. (Future uncertainty: should the
    provisioner provision an instance for a Dying machine? At the moment, no,
    because a Dying machine can't have any units in the first place; in the
    future, er, maybe, because those Dying units may be attached to persistent
    storage and should thus be allowed to continue to shut down cleanly as they
    would usually do. Maybe.)
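
As a rough sketch of the `juju destroy-machine` preconditions described above,
assuming an in-memory `Machine` struct invented for this example (the job names
come from the text; every other identifier is hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Job names mirror those in the text; the type itself is hypothetical.
type Job string

const (
	JobHostUnits     Job = "JobHostUnits"
	JobManageEnviron Job = "JobManageEnviron"
)

var (
	errManagesEnviron = errors.New("machine has JobManageEnviron")
	errHasPrincipals  = errors.New("machine has principal units assigned")
)

// Machine is an in-memory stand-in for a machine entity.
type Machine struct {
	Life       Life
	Jobs       []Job
	Principals []string // names of the assigned principal units
}

func (m *Machine) hasJob(job Job) bool {
	for _, j := range m.Jobs {
		if j == job {
			return true
		}
	}
	return false
}

// Destroy sets an Alive machine to Dying, unless the machine manages the
// environment or still has principal units assigned.
func (m *Machine) Destroy() error {
	if m.hasJob(JobManageEnviron) {
		return errManagesEnviron
	}
	if len(m.Principals) > 0 {
		return errHasPrincipals
	}
	if m.Life == Alive {
		m.Life = Dying
	}
	return nil
}

func main() {
	busy := &Machine{Jobs: []Job{JobHostUnits}, Principals: []string{"wordpress/0"}}
	fmt.Println(busy.Destroy())

	idle := &Machine{Jobs: []Job{JobHostUnits}}
	fmt.Println(idle.Destroy(), idle.Life == Dying)
}
```

Setting the machine to Dead is deliberately left out: per the text, that step
belongs to the machine's own agent.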

Units
-----

  * A principal unit can be created directly with `juju deploy` or
    `juju add-unit`.
  * While a principal unit is Alive, it can be assigned to a machine.
  * While a principal unit is Alive, it can enter the scopes of Alive
    relations, which may cause the creation of subordinate units; so,
    indirectly, `juju add-relation` can also cause the creation of units.
  * A unit can become Dying at any time, but may not become Dead while any unit
    subordinate to it exists, or while the unit is in scope for any relation.
  * A principal unit can become Dying in one of two ways:
      * `juju destroy-unit` (This doesn't work on subordinates; see below.)
      * `juju destroy-service` (This does work on subordinates, but happens
        indirectly in either case: the Unit Agents (UAs) for each unit of a
        service set their corresponding units to Dying when they detect their
        service Dying; this is because we try to assume 100k-scale and we can't
        use mgo/txn to do a bulk update of 100k units: that makes for a txn
        with at least 100k operations, and that's just crazy.)
  * A subordinate must also become Dying when either:
      * its principal becomes Dying, via `juju destroy-unit`; or
      * the last Alive relation between its service and its principal's service
        is no longer Alive. This may come about via `juju destroy-relation`.
  * When any unit is Dying, its UA is responsible for removing impediments to
    the unit becoming Dead, and then making it so. To do so, the UA must:
      * Depart from all its relations in an orderly fashion.
      * Wait for all its subordinates to become Dead, and remove them from state.
      * Set its unit to Dead.
  * As just noted, when a subordinate unit is Dead, it is removed from state by
    its principal's UA; the relationship is the same as that of a principal unit
    to its assigned machine agent, and of a machine to the JobManageEnviron
    machine agent.
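
The impediments to a unit becoming Dead can be sketched as follows. The `Unit`
struct and its `Destroy` and `EnsureDead` methods are assumptions made for this
example; the point is only that Dying is always reachable, while Dead is gated
on subordinates and relation scopes.

```go
package main

import (
	"errors"
	"fmt"
)

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

var (
	errHasSubordinates = errors.New("unit still has subordinates")
	errInScope         = errors.New("unit is still in scope for a relation")
)

// Unit is an in-memory stand-in for a unit entity.
type Unit struct {
	Life         Life
	Subordinates []string // subordinate units that still exist
	Scopes       []string // relations whose scope the unit has entered
}

// Destroy sets an Alive unit to Dying; this is possible at any time.
func (u *Unit) Destroy() {
	if u.Life == Alive {
		u.Life = Dying
	}
}

// EnsureDead sets the unit to Dead, but only once no subordinate exists and
// the unit has left every relation scope.
func (u *Unit) EnsureDead() error {
	if len(u.Subordinates) > 0 {
		return errHasSubordinates
	}
	if len(u.Scopes) > 0 {
		return errInScope
	}
	u.Life = Dead
	return nil
}

func main() {
	u := &Unit{
		Subordinates: []string{"logging/0"},
		Scopes:       []string{"wordpress:db mysql:server"},
	}
	u.Destroy()
	fmt.Println(u.EnsureDead()) // blocked by the subordinate
	u.Subordinates = nil        // the subordinate became Dead and was removed
	fmt.Println(u.EnsureDead()) // blocked by the relation scope
	u.Scopes = nil              // the unit departed the relation
	fmt.Println(u.EnsureDead(), u.Life == Dead)
}
```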

Services
--------

  * Services are created with `juju deploy`. Services with duplicate names
    are not allowed (units and machines with duplicate names are not possible:
    their identifiers are assigned by juju).
  * Unlike units and machines, services have no corresponding agent.
  * In addition, services become Dead and are removed from the database in a
    single atomic operation.
  * When a service is Alive, units may be added to it, and relations can be
    added using the service's endpoints.
  * A service can be destroyed at any time, via `juju destroy-service`. This
    causes all the units to become Dying, as discussed above, and will also
    cause all relations in which the service is participating to become Dying
    or be removed.
  * If a destroyed service has no units, and all its relations are eligible
    for immediate removal, then the service will also be removed immediately
    rather than being set to Dying.
  * If no associated relations exist, the service is removed by the MA which
    removes the last unit of that service from state.
  * If no units of the service remain, but its relations still exist, the
    responsibility for removing the service falls to the last UA to leave scope
    for that relation. (Yes, this is a UA for a unit of a totally different
    service.)
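
A minimal sketch of the `juju destroy-service` decision, under the assumption
(taken from the Relations section) that a relation is eligible for immediate
removal when no units are in its scope. The `Service` and `Relation` structs
are hypothetical.

```go
package main

import "fmt"

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Relation is a stand-in holding only what this example needs.
type Relation struct {
	UnitsInScope int
}

// Service is an in-memory stand-in for a service entity.
type Service struct {
	Life      Life
	Units     int // number of units that still exist
	Relations []*Relation
	Removed   bool // true once the service is gone from state
}

// relationsRemovable reports whether every relation could be removed at once.
func (s *Service) relationsRemovable() bool {
	for _, r := range s.Relations {
		if r.UnitsInScope > 0 {
			return false
		}
	}
	return true
}

// Destroy removes the service immediately when nothing references it any
// more; otherwise it sets the service to Dying and leaves removal to
// whichever agent clears the last reference.
func (s *Service) Destroy() {
	if s.Units == 0 && s.relationsRemovable() {
		s.Life = Dead
		s.Removed = true
		return
	}
	if s.Life == Alive {
		s.Life = Dying
	}
}

func main() {
	empty := &Service{}
	empty.Destroy()
	fmt.Println(empty.Removed) // removed in one step

	busy := &Service{Units: 2, Relations: []*Relation{{UnitsInScope: 3}}}
	busy.Destroy()
	fmt.Println(busy.Life == Dying, busy.Removed)
}
```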

Relations
---------

  * A relation is created with `juju add-relation`. No two relations with the
    same canonical name can exist. (The canonical relation name form is
    "<requirer-endpoint> <provider-endpoint>", where each endpoint takes the
    form "<service-name>:<charm-relation-name>".)
      * Thanks to convention, the above is not strictly true: it is possible
        for a subordinate charm to require a container-scoped "juju-info"
        relation. These restrictions mean that the name can never cause
        actual ambiguity; nonetheless, support should be phased out smoothly
        (see lp:1100076).
  * A relation, like a service, has no corresponding agent; and becomes Dead
    and is removed from the database in a single operation.
  * Similarly to a service, a relation cannot be created while an identical
    relation exists in state (in which identity is determined by equality of
    canonical relation name -- a sequence of endpoint pairs sorted by role).
  * While a relation is Alive, units of services in that relation can enter its
    scope; that is, the UAs for those units can signal to the system that they
    are participating in the relation.
  * A relation can be destroyed with either `juju destroy-relation` or
    `juju destroy-service`.
  * When a relation is destroyed with no units in scope, it will immediately
    become Dead and be removed from state, rather than being set to Dying.
  * When a relation becomes Dying, the UAs of units that have entered its scope
    are responsible for cleanly departing the relation by running hooks and then
    leaving relation scope (signalling that they are no longer participating).
  * When the last unit leaves the scope of a Dying relation, it must remove the
    relation from state.
  * As noted above, the Dying relation may be the only thing keeping a Dying
    service (different to that of the acting UA) from removal; so, relation
    removal may also imply service removal.
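
To make the canonical name and the scope rules concrete, here is a small
sketch. It covers only requirer/provider pairs, and the `Endpoint`, `Relation`
and `CanonicalName` identifiers, as well as the example service names, are
invented for illustration.

```go
package main

import "fmt"

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

type Role string

const (
	RoleRequirer Role = "requirer"
	RoleProvider Role = "provider"
)

// Endpoint is a hypothetical "<service-name>:<charm-relation-name>" pair.
type Endpoint struct {
	Service  string
	Relation string
	Role     Role
}

func (e Endpoint) String() string { return e.Service + ":" + e.Relation }

// CanonicalName returns "<requirer-endpoint> <provider-endpoint>", whatever
// order the two endpoints are passed in.
func CanonicalName(a, b Endpoint) string {
	if a.Role == RoleProvider && b.Role == RoleRequirer {
		a, b = b, a
	}
	return a.String() + " " + b.String()
}

// Relation is an in-memory stand-in for a relation entity.
type Relation struct {
	Life         Life
	UnitsInScope int
	Removed      bool
}

// Destroy removes a relation with an empty scope at once; otherwise the
// relation becomes Dying and waits for its units to leave.
func (r *Relation) Destroy() {
	if r.UnitsInScope == 0 {
		r.Life, r.Removed = Dead, true
		return
	}
	if r.Life == Alive {
		r.Life = Dying
	}
}

// LeaveScope records one unit leaving; the last unit to leave a Dying
// relation also removes it.
func (r *Relation) LeaveScope() {
	if r.UnitsInScope > 0 {
		r.UnitsInScope--
	}
	if r.Life == Dying && r.UnitsInScope == 0 {
		r.Life, r.Removed = Dead, true
	}
}

func main() {
	provider := Endpoint{Service: "mysql", Relation: "server", Role: RoleProvider}
	requirer := Endpoint{Service: "wordpress", Relation: "db", Role: RoleRequirer}
	fmt.Println(CanonicalName(provider, requirer)) // wordpress:db mysql:server

	rel := &Relation{UnitsInScope: 2}
	rel.Destroy()
	rel.LeaveScope()
	fmt.Println(rel.Removed) // false: one unit is still in scope
	rel.LeaveScope()
	fmt.Println(rel.Removed) // true: the last unit removed the relation
}
```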

References
----------

OK, that was a bit of a hail of bullets, and the motivations for the above are
perhaps not always clear. To consider it from another angle:

  * Subordinate units reference principal units.
  * Principal units reference machines.
  * All units reference their services.
  * All units reference the relations whose scopes they have joined.
  * All relations reference the services they are part of.

In every case above, where X references Y, the life state of an X may be
sufficient to prevent a change in the life state of a Y; and, conversely, a
life change in an X may be sufficient to cause a life change in a Y. (In only
one case does the reverse hold -- that is, setting a service or relation to
Dying will cause appropriate units' agents to individually set their units to
Dying -- and this is just an implementation detail.)

The following scrawl may help you to visualize the references in play:

            +-----------+       +---------+
        +-->| principal |------>| machine |
        |   +-----------+       +---------+
        |      |     |
        |      |     +--------------+
        |      |                    |
        |      V                    V
        |   +----------+       +---------+
        |   | relation |------>| service |
        |   +----------+       +---------+
        |      A                    A
        |      |                    |
        |      |     +--------------+
        |      |     |
        |   +-------------+
        +---| subordinate |
            +-------------+

...but it is important to remember that it's only one view of the relationships
involved, and that the user-centric view is quite different; from a user's
perspective the influences appear to travel in the opposite direction:

  * (destroying a machine "would" destroy its principals but that's disallowed)
  * destroying a principal destroys all its subordinates
  * (destroying a subordinate directly is impossible)
  * destroying a service destroys all its units and relations
  * destroying a container relation destroys all subordinates in the relation
  * (destroying a global relation destroys nothing else)

...and it takes a combination of these viewpoints to understand the detailed
interactions laid out above.

Agents
------

It may also be instructive to consider the responsibilities of the unit and
machine agents. The unit agent is responsible for:

  * detecting Alive relations incorporating its service and entering their
    scopes (if a principal, this may involve creating subordinates).
  * detecting Dying relations whose scope it has entered and leaving their
    scope (this involves removing any relations or services that thereby
    become unreferenced).
  * detecting undeployed Alive subordinates and deploying them.
  * detecting undeployed non-Alive subordinates and removing them (this raises
    similar questions to those alluded to above re Dying units on Dying machines:
    but, without persistent storage, there's no point deploying a Dying unit just
    to wait for its agent to set itself to Dead).
  * detecting deployed Dead subordinates, recalling them, and removing them.
  * detecting its service's Dying state, and setting its own Dying state.
  * if a subordinate, detecting that no relations with its principal are Alive,
    and setting its own Dying state.
  * detecting its own Dying state, and:
      * leaving all its relation scopes;
      * waiting for all its subordinates to be removed;
      * setting its own Dead state.

A machine agent's responsibilities are determined by its jobs. There are only
two jobs in existence at the moment; an MA whose machine has JobHostUnits is
responsible for:

  * detecting undeployed Alive principals assigned to it and deploying them.
  * detecting undeployed non-Alive principals assigned to it and removing them
    (recall that unit removal may imply service removal).
  * detecting deployed Dead principals assigned to it, recalling them, and
    removing them.
  * detecting deployed principals not assigned to it, and recalling them.
  * detecting its machine's Dying state, and setting it to Dead.

...while one whose machine has JobManageEnviron is responsible for:

  * detecting Alive machines without instance IDs and provisioning provider
    instances to run their agents.
  * detecting non-Alive machines without instance IDs and removing them.
  * detecting Dead machines with instance IDs, decommissioning the instance, and
    removing the machine.

Machines can in theory have multiple jobs, but in current practice do not.
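
The JobHostUnits responsibilities for principals amount to a small decision
table over three facts: whether the unit is assigned to this machine, whether
it is currently deployed there, and its life state. A sketch, with
`PrincipalAction` and the `Action` values invented for this example:

```go
package main

import "fmt"

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Action names what a JobHostUnits machine agent should do about a principal.
type Action string

const (
	NoAction        Action = "nothing"
	Deploy          Action = "deploy"
	Remove          Action = "remove"
	Recall          Action = "recall"
	RecallAndRemove Action = "recall and remove"
)

// PrincipalAction maps the state of one principal unit to the matching
// responsibility from the list above.
func PrincipalAction(assigned, deployed bool, life Life) Action {
	switch {
	case !assigned && deployed:
		return Recall // deployed here, but no longer assigned to this machine
	case !assigned:
		return NoAction // not this agent's concern
	case !deployed && life == Alive:
		return Deploy
	case !deployed:
		return Remove // Dying or Dead and never deployed
	case life == Dead:
		return RecallAndRemove
	}
	// Deployed and Alive or Dying: the unit's own agent is in charge.
	return NoAction
}

func main() {
	fmt.Println(PrincipalAction(true, false, Alive))  // deploy
	fmt.Println(PrincipalAction(true, false, Dying))  // remove
	fmt.Println(PrincipalAction(true, true, Dead))    // recall and remove
	fmt.Println(PrincipalAction(false, true, Alive))  // recall
	fmt.Println(PrincipalAction(true, true, Dying))   // nothing
}
```

The same shape would work for the unit agent's handling of subordinates, which
follows the equivalent deploy, remove and recall rules.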

Implementation
--------------

All state change operations are mediated by the mgo/txn package, which provides
multi-document transactions against MongoDB. This allows us to enforce the many
conditions described above without experiencing races, so long as we are mindful
when implementing them.
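
The idea of checking a condition and applying a change as one step can be
sketched without a database. The code below is an in-memory stand-in written
for this document: it does not use or reproduce the mgo/txn API, and `Store`,
`Op`, `Doc` and `destroyMachine` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Doc is a hypothetical stored machine document.
type Doc struct {
	Life       Life
	Principals int
}

// Op pairs a precondition on one document with the change to apply to it.
type Op struct {
	ID     string
	Assert func(Doc) bool
	Update func(*Doc)
}

var errAborted = errors.New("transaction aborted")

// Store is a toy document store guarded by a mutex.
type Store struct {
	mu   sync.Mutex
	docs map[string]*Doc
}

// Run applies every update only if every assertion holds; if any assertion
// fails, nothing is changed. Holding the lock throughout means no other
// caller can invalidate an assertion between the check and the update.
func (s *Store) Run(ops []Op) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, op := range ops {
		doc, ok := s.docs[op.ID]
		if !ok || !op.Assert(*doc) {
			return errAborted
		}
	}
	for _, op := range ops {
		op.Update(s.docs[op.ID])
	}
	return nil
}

// destroyMachine sets a machine to Dying, asserting that it is still Alive
// and has no principal units at the moment the change is applied.
func destroyMachine(s *Store, id string) error {
	return s.Run([]Op{{
		ID:     id,
		Assert: func(d Doc) bool { return d.Life == Alive && d.Principals == 0 },
		Update: func(d *Doc) { d.Life = Dying },
	}})
}

func main() {
	s := &Store{docs: map[string]*Doc{"0": {Principals: 1}}}
	fmt.Println(destroyMachine(s, "0")) // aborted: a principal is assigned
	s.docs["0"].Principals = 0
	fmt.Println(destroyMachine(s, "0"), s.docs["0"].Life == Dying)
}
```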

Lifecycle support is not complete: relation lifecycles mostly are, as are
large parts of the unit and machine agents; but substantial parts of the
machine, unit and service entity implementations still lack sophistication.
This situation is being actively addressed.

Beyond the plans detailed above, it is important to note that an agent that is
failing to meet its responsibilities can have a somewhat distressing impact on
the rest of the system. To counteract this, we intend to implement a --force
flag to destroy-unit (and destroy-machine?) that forcibly sets an entity to
Dead while maintaining consistency and sanity across all references. The best
approach to this problem has yet to be agreed; we're not short of options, but
none are exceptionally compelling.