What We Run, and Why
====================

Expressed as compactly as possible, the Provisioner is responsible for making
sure that non-Dead machine entities in state have agents running on live
instances; and for making sure that Dead machines, and stray instances, are
removed and cleaned up.

However, the choice of exactly what we deploy involves some subtleties. At the
Provisioner level, it's simple: the series and the constraints we pass to
`Environ.StartInstance` come from the machine entity. But how did they get
there?

Series
------

Individual charms are released for different possible target series; juju
should guarantee that charms for series X are only ever run on series X.
Every service, unit, and machine has a series that's set at creation time and
subsequently immutable. Units take their series from their service, and can
only be assigned to machines with matching series.

Subordinate units cannot be assigned directly to machines; they are created
by their principals, on the same machine, in response to the creation of
subordinate relations. We therefore restrict subordinate relations such that
they can only be created between services with matching series.
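
The series rules above can be sketched in Go as follows. This is an illustration only, not Juju's state code: the `entity` type and the `newUnit`, `canAssign`, and `canRelateSubordinate` functions are hypothetical names introduced here, and the series values are example data.

```go
package main

import "fmt"

// entity is a hypothetical stand-in for a service, unit, or machine.
// Its series is set once at creation and is never changed afterwards.
type entity struct {
	name   string
	series string
}

// newUnit creates a unit that takes its series from its service.
func newUnit(svc entity, n int) entity {
	return entity{name: fmt.Sprintf("%s/%d", svc.name, n), series: svc.series}
}

// canAssign reports whether a unit may be assigned to a machine:
// only when their series match.
func canAssign(unit, machine entity) bool {
	return unit.series == machine.series
}

// canRelateSubordinate reports whether a subordinate relation may be
// created between two services. Because the subordinate unit must run on
// its principal's machine, the two services must share a series.
func canRelateSubordinate(principal, subordinate entity) bool {
	return principal.series == subordinate.series
}

func main() {
	wordpress := entity{name: "wordpress", series: "trusty"}
	logging := entity{name: "logging", series: "precise"}
	unit := newUnit(wordpress, 0)

	fmt.Println(canAssign(unit, entity{name: "1", series: "trusty"}))
	fmt.Println(canAssign(unit, entity{name: "2", series: "precise"}))
	fmt.Println(canRelateSubordinate(wordpress, logging))
}
```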

Constraints
-----------

Constraints are stored for environments, services, units, and machines, but
unit constraints are not currently exposed because they're not needed outside
state, and are likely to just cause trouble and confusion if we expose them.

From the point of view of a user, there are environment constraints and
service constraints, and sensible manipulations of them lead to predictable
unit deployment decisions. The mechanism is as follows:

  * when a unit is added, the current environment and service constraints
    are collapsed into a single value and stored for the unit. (To be clear:
    at the moment the unit is created, the current service and environment
    constraints will be combined such that every constraint not set on the
    service is taken from the environment, or left unset if not specified
    at all.)
  * when a machine is being added in order to host a given unit, it copies
    its constraints directly from the unit.
  * when a machine is being added without a unit associated -- for example,
    when adding additional state servers -- it copies its constraints directly
    from the environment.

In this way the following sequence of operations becomes predictable:

```
    $ juju deploy --constraints mem=2G wordpress
    $ juju set-constraints --service wordpress mem=3G
    $ juju add-unit wordpress -n 2
```

...in that exactly one machine will be provisioned with the first set of
constraints, and exactly two of them will be provisioned using the second
set. This is much friendlier to the users than delaying the unit constraint
capture and potentially suffering subtle and annoying races.
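
The capture-at-creation behaviour can be sketched in Go as below. This is a minimal model under stated assumptions, not Juju's implementation: the `constraints` and `service` types, the `withFallbacks` and `addUnit` methods, and the `u64` helper are hypothetical, and only two constraint fields are modelled. The `cpu-power=100` environment constraint is example data.

```go
package main

import "fmt"

// constraints is a hypothetical, simplified constraints value.
// A nil pointer means "not set".
type constraints struct {
	Mem      *uint64 // megabytes
	CPUPower *uint64
}

// withFallbacks returns a copy of c in which every unset field is taken
// from fallback. Fields unset in both remain unset.
func (c constraints) withFallbacks(fallback constraints) constraints {
	out := c
	if out.Mem == nil {
		out.Mem = fallback.Mem
	}
	if out.CPUPower == nil {
		out.CPUPower = fallback.CPUPower
	}
	return out
}

// service holds the current service constraints and the constraints
// captured for each unit at the moment that unit was added.
type service struct {
	cons  constraints
	units []constraints
}

// addUnit collapses the current service and environment constraints into
// a single value and stores it for the new unit.
func (s *service) addUnit(env constraints) {
	s.units = append(s.units, s.cons.withFallbacks(env))
}

func u64(v uint64) *uint64 { return &v }

func main() {
	env := constraints{CPUPower: u64(100)}

	// juju deploy --constraints mem=2G wordpress
	wordpress := &service{cons: constraints{Mem: u64(2048)}}
	wordpress.addUnit(env)

	// juju set-constraints --service wordpress mem=3G
	wordpress.cons = constraints{Mem: u64(3072)}

	// juju add-unit wordpress -n 2
	wordpress.addUnit(env)
	wordpress.addUnit(env)

	for i, c := range wordpress.units {
		fmt.Printf("unit %d: mem=%dM cpu-power=%d\n", i, *c.Mem, *c.CPUPower)
	}
}
```

In this model the first unit keeps its 2G snapshot even after the service constraints change, because the value was copied when the unit was added rather than looked up at provisioning time.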

Subordinate units cannot have constraints, because their deployment is
controlled by their principal units. There's only ever one machine to which
that subordinate could (and must) be deployed, and to restrict that further
by means of constraints will only confuse people.

Placement
---------

Placement is the term given to allocating a unit to a specific machine.
This is achieved with the `--to` option in the `deploy` and `add-unit`
commands.

In addition, it is possible to specify directives to `add-machine` to
allocate machines to specific instances:

  - in a new container, possibly on an existing machine (e.g. `add-machine lxc:1`)
  - by using an existing host (e.g. `add-machine ssh:user@host`)
  - using provider-specific features (e.g. `add-machine zone=us-east-1a`)

At the time of writing, the implemented provider-specific placement directives are:

  - Availability Zone: both the AWS and OpenStack providers support `zone=<zone>`, directing the provisioner to start an instance in the specified availability zone.
  - MAAS: `<hostname>` directs the MAAS provider to acquire the node with the specified hostname.

Availability Zone Spread
------------------------

For Juju providers that know about Availability Zones, instances will be automatically spread across the healthy availability zones to maximise service availability. This is achieved by having Juju:

  - be able to enumerate each of the availability zones and their current status,
  - calculate the "distribution group" for each instance at provisioning time.

The distribution group of a nascent instance is the set of instances for which the availability zone spread will be computed. The new instance will be allocated to the zone with the fewest members of its group.
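
The zone choice described above can be sketched in Go. This is a hedged illustration rather than the provider code: `leastPopulatedZone` and its parameters are hypothetical names, the instance ids and zone names are example data, and breaking ties alphabetically is an assumption made here so that the result is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// leastPopulatedZone returns the healthy zone that currently holds the
// fewest instances from the distribution group. zoneOf maps an instance
// id to the zone it runs in. Ties are broken by zone name (an assumption
// of this sketch). The boolean is false when there are no healthy zones.
func leastPopulatedZone(healthyZones []string, zoneOf map[string]string, group []string) (string, bool) {
	if len(healthyZones) == 0 {
		return "", false
	}
	// Start every healthy zone at zero so that empty zones are candidates.
	counts := make(map[string]int, len(healthyZones))
	for _, z := range healthyZones {
		counts[z] = 0
	}
	// Count only group members that live in a healthy zone.
	for _, id := range group {
		if z, ok := zoneOf[id]; ok {
			if _, healthy := counts[z]; healthy {
				counts[z]++
			}
		}
	}
	sorted := append([]string(nil), healthyZones...)
	sort.Strings(sorted)
	best := sorted[0]
	for _, z := range sorted[1:] {
		if counts[z] < counts[best] {
			best = z
		}
	}
	return best, true
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	zoneOf := map[string]string{
		"i-1": "us-east-1a",
		"i-2": "us-east-1a",
		"i-3": "us-east-1b",
		"i-4": "us-east-1c", // not in the group below, so not counted
	}
	group := []string{"i-1", "i-2", "i-3"}

	zone, ok := leastPopulatedZone(zones, zoneOf, group)
	fmt.Println(zone, ok)
}
```

Instances outside the distribution group do not influence the choice, which is why `i-4` leaves `us-east-1c` counted as empty in this example.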

Distribution groups are intentionally opaque to the providers. There are currently two types of groups: state servers and everything else. State servers are always allocated to the same distribution group; other instances are grouped according to the units assigned at provisioning time. A non-state server instance's group consists of all instances with units of the same services.

At the time of writing, three providers support automatic availability zone spread: Microsoft Azure, AWS, and OpenStack. Azure's implementation differs significantly from the others, as it contains various restrictions relating to the imposed conflation of high availability and load balancing.

The AWS and OpenStack implementations are both based on the `provider/common.ZonedEnviron` interface; additional implementations should make use of this if possible. There are two components:

  - unless a placement directive is specified, the provider's `StartInstance` must allocate an instance to one of the healthy availability zones. Some providers may restrict availability zones in ways that cannot be detected ahead of time, so it may be necessary to attempt each zone in turn (in order of least-to-most populous);
  - the provider must implement `state.InstanceDistributor` so that units are assigned to machines based on their availability zone allocations.
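
The "attempt each zone in turn" behaviour from the first component can be sketched in Go. This is not the `ZonedEnviron` interface or any provider's `StartInstance`; `zoneCount`, `startFunc`, `startInLeastPopulatedZone`, and `errZoneConstrained` are hypothetical names used only for illustration, and the zone names are example data.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// zoneCount pairs a healthy zone with the number of instances from the
// distribution group that it already holds.
type zoneCount struct {
	zone  string
	count int
}

// startFunc is a hypothetical callback that tries to start an instance in
// the given zone and returns its id.
type startFunc func(zone string) (instanceID string, err error)

// errZoneConstrained stands in for a restriction that the provider only
// reports when the start is actually attempted.
var errZoneConstrained = errors.New("zone is constrained")

// startInLeastPopulatedZone tries each zone in order of least-to-most
// populous and returns the first successful instance id and its zone.
// If every attempt fails, the last error is returned.
func startInLeastPopulatedZone(counts []zoneCount, start startFunc) (string, string, error) {
	ordered := append([]zoneCount(nil), counts...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].count < ordered[j].count
	})
	lastErr := errors.New("no availability zones to try")
	for _, zc := range ordered {
		id, err := start(zc.zone)
		if err == nil {
			return id, zc.zone, nil
		}
		lastErr = fmt.Errorf("zone %q: %w", zc.zone, err)
	}
	return "", "", lastErr
}

func main() {
	counts := []zoneCount{{"us-east-1a", 2}, {"us-east-1b", 0}, {"us-east-1c", 1}}
	start := func(zone string) (string, error) {
		if zone == "us-east-1b" {
			return "", errZoneConstrained
		}
		return "i-" + zone, nil
	}
	id, zone, err := startInLeastPopulatedZone(counts, start)
	fmt.Println(id, zone, err)
}
```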

Machine Status and Provisioning Errors (current)
------------------------------------------------

In the light of time pressure, a unit assigned to a machine that has not been
provisioned can be removed directly by calling `juju destroy-unit`. Any
provisioning error can thus be "resolved" in an unsophisticated but moderately
effective way:

```
    $ juju destroy-unit borken/0
```

...in that at least broken units don't clutter up the service and prevent its
removal. However:

```
    $ juju destroy-machine 1
```

...does not yet cause an unprovisioned machine to be removed from state (whether
directly, or indirectly via the provisioner; the best place to implement this
functionality is not clear).

Machine Status and Provisioning Errors (WIP)
--------------------------------------------

[TODO: figure this out; not yet implemented, somewhat speculative... in
particular, use of "resolved" may be inappropriate. Consider adding a
"retry" CLI tool...]

When the provisioner fails to start a machine, it should ensure that (1) the
machine has no instance id set and (2) the machine has an error status set
that communicates the nature of the problem. This must be visible in the
output of `juju status`; and we must supply suitable tools to the user so
as to allow her to respond appropriately.

If the user believes a machine's provisioning error to be transient, she can
do a simple `juju resolved 14` which will set some state to make machine 14
eligible for the provisioner's attention again.

It may otherwise be that the unit ended up snapshotting a service/environment
constraints pair that really isn't satisfiable. In that case, the user can try
(say) `juju resolved 14 --constraints "mem=2G cpu-power=400"`, which allows
her to completely replace the machine's constraints as well as marking the
machine for reprovisioning attention.