// Copyright 2015 Canonical Ltd.
// Licensed under the AGPLv3, see LICENCE file for details.

/*

The dependency package exists to address a general problem with shared resources
and the management of their lifetimes. Many kinds of software handle these issues
with more or less felicity, but it's particularly important that juju (a distributed
system that needs to be very fault-tolerant) handle them clearly and sanely.

Background
----------

A cursory examination of the various workers run in juju agents (as of 2015-04-20)
reveals a distressing range of approaches to the shared resource problem. A
sampling of techniques (and their various problems) follows:

  * enforce sharing in code structure, either directly via scoping or implicitly
    via nested runners (state/api conns; agent config)
      * code structure is inflexible, and it enforces strictly nested resource
        lifetimes, which are not always adequate.
  * just create N of them and hope it works out OK (environs)
      * creating N prevents us from, e.g., using a single connection to an environ
        and sanely rate-limiting ourselves.
  * use filesystem locking across processes (machine execution lock)
      * the implementation sometimes flakes out, or is used improperly; and multiple
        agents *are* a problem anyway, but even if we're all in-process we'll need
        some shared machine lock...
  * wrap workers to start up only when some condition is met (post-upgrade
    stability -- itself also a shared resource)
      * the lifetime-nesting comments apply here again; *and* it makes it harder to
        follow the code.
  * implement a singleton (lease manager)
      * singletons make it *even harder* to figure out what's going on -- they're
        basically just fancy globals, and have all the associated problems, e.g.
        deadlocking due to unexpected shutdown order.

...but, of course, they all have their various advantages:

  * Of the approaches, the first is the most reliable by far. Despite the
    inflexibility, there's a clear and comprehensible model in play that has yet
    to cause serious confusion: each worker is created with its resource(s)
    directly available in code scope, and trusts that it will be restarted by an
    independent watchdog if one of its dependencies fails. This characteristic is
    extremely beneficial and must be preserved; we just need it to be more
    generally applicable.

  * The create-N-Environs approach is valuable because it can be simply (if
    inelegantly) integrated with its dependent worker, and a changed Environ
    does not cause the whole dependent to fall over (unless the change is itself
    bad). The former characteristic is a subtle trap (we shouldn't be baking
    dependency-management complexity into the cores of our workers' select loops,
    even if it is "simple" to do so), but the latter is important: in particular,
    the firewaller and provisioner are distressingly heavyweight workers and it
    would be unwise to take an approach that led to them being restarted when not
    necessary.

  * The filesystem locking just should not happen -- and we need to integrate the
    unit and machine agents to eliminate it (and for other reasons too) so we
    should give some thought to the fact that we'll be shuffling these dependencies
    around pretty hard in the future. If the approach can make that task easier,
    then great.

  * The singleton is dangerous specifically because its dependency interactions are
    unclear. Absolute clarity of dependencies, as provided by the nesting approaches,
    is in fact critical.

The various nesting approaches give easy access to directly-available resources,
which is great, but will fail as soon as you have a sufficiently sophisticated
dependent that can operate usefully without all its dependencies being satisfied
(we have a couple of requirements for this in the unit agent right now). Still,
direct resource access *is* tremendously convenient, and we need some way to
access one service from another.

However, all of these resources are very different: for a solution that encompasses
them all, you essentially have to represent them as interface{} at some point, and
that carries a real risk to clarity.


Problem
-------

The package is intended to implement the following developer stories:

  * As a developer, I want to expose a service implemented by some worker to one
    or more client workers.
  * As a developer, I want to write a service that consumes one or more other
    workers' services.
  * As a developer, I want to choose how I respond to missing dependencies.
  * As a developer, I want to be able to inject test doubles for my dependencies.
  * As a developer, I want control over how my service is exposed to others.
  * As a developer, I don't want to have to typecast my dependencies from
    interface{} myself.
  * As a developer, I want my service to be restarted if its dependencies change.

That last one might bear a little explanation: I contend that it's the only
reliable approach to writing resilient services that compose sanely into a
comprehensible system. Consider:

  * Juju agents' lifetimes must be assumed to exceed the MTBR of the systems
    they're deployed on; you might naively think that hard reboots are "rare"...
    but they're not. They really are just a feature of the terrain we have to
    traverse. Therefore every worker *always* has to be capable of picking itself
    back up from scratch and continuing sanely. That is, we're not imposing a new
    expectation: we're just working within the existing constraints.
  * While some workers are simple, some are decidedly not; when a worker has any
    more complexity than "none", it is a Bad Idea to mix dependency-management
    concerns into its core logic: that creates the sort of morass in which subtle
    bugs thrive.

So, we take advantage of the expected bounce-resilience, and excise all
dependency-management concerns from the workers themselves... in favour of a
system that bounces workers slightly more often than before, and thus exercises
those code paths more; so, when there are bugs, we're more likely to shake them
out in automated testing before they hit users.

We'd also like to implement these stories, which go together, and should be
added when their absence becomes inconvenient:

  * As a developer, I want to be prevented from introducing dependency cycles
    into my application. [NOT DONE]
  * As a developer trying to understand the codebase, I want to know what workers
    are running in an agent at any given time. [NOT DONE]
  * As a developer, I want to add and remove groups of workers atomically, e.g.
    when starting the set of state-server workers for a hosted environ; or when
    starting the set of workers used by a single unit. [NOT DONE]


Solution
--------

Run a single dependency.Engine at the top level of each agent; express every
shared resource, and every worker that uses one, as a dependency.Manifold; and
install them all into the top-level engine.

When installed under some name, a dependency.Manifold represents the features of
a node in the engine's dependency graph. It lists:

  * The names of its dependencies (Inputs).
  * How to create the worker representing the resource (Start).
  * How (if at all) to expose the resource as a service to other resources that
    know it by name (Output).

...and allows the developers of each independent service a common mechanism for
declaring and accessing their dependencies, and the ability to assume that they
will be restarted whenever there is a material change to their accessible
dependencies.


Usage
-----

In each worker package, write a `manifold.go` containing the following:

    type ManifoldConfig struct {
        // The names of the various dependencies, e.g.
        APICallerName   string
        MachineLockName string
    }

    func Manifold(config ManifoldConfig) dependency.Manifold {
        // Your code here...
    }

...and take care to construct your manifolds *only* via that function; *all*
your dependencies *must* be declared in your ManifoldConfig, and *must* be
accessed via those names. Don't hardcode anything, please.

If you find yourself using the same manifold configuration in several places,
consider adding helpers to worker/util, which includes mechanisms for simple
definition of manifolds that depend on an API caller; on an agent; or on both.


Concerns and mitigations thereof
--------------------------------

The dependency package will *not* provide the following features:

  * Deterministic worker startup. As above, this is a blessing in disguise: if
    your workers have a problem with this, they're using magical undeclared
    dependencies and we get to see the inevitable bugs sooner.
    TODO(fwereade): we should add fuzz to the bounce and restart durations to
    more vigorously shake out the bugs...
  * Hand-holding for developers writing Output funcs; the onus is on you to
    document what you expose; to produce useful error messages when you are
    supplied with unexpected types via the interface{} param; and NOT to panic.
    The onus on your clients is only to read your docs and handle the errors
    you might emit.

*/
package dependency