title: Juju 14.10 Plans

[TOC]

# Core

## Multi-environment State Server

Multi-environment, multi-customer.

### Use Cases

- Embedding in Azure: people can spin up an environment without paying for a state server, which is a significant up-front cost.
- Embedding in Horizon (the OpenStack dashboard).

### How do we start?

- Create multiple client users (some sort of API), aka User Management.
- create-environment (need environments), list-environments.
- SelectEnvironment is called after Login; Login itself exposes the multi-environment API root. To avoid an extra round trip, Login can optionally pass in the EnvironmentUUID.
- Credentials need to move out of the environment.
- Machine/unit/etc. (everything) documents gain an environment id (except users).
- The API filters by the environment id inherited from SelectEnvironment.
- rsyslog needs to split based on tenant.
- The provisioner/firewaller/other workers get an environment id, one task per environment.
- Consider adopting the accounts/environments separation from `environments.yaml` (Juju 2.0 conf).
  - This changes the DB representation so that Environments reference Accounts and point to Providers.
  - EnvironConfig may still collapse this into one big bag of config, but it should be possible to easily change the Provider Credentials for a given Account and have that cascade to all of its environments.

### Work Items

- State object gains an EnvironmentUUID attribute; all methods on that State object implicitly use that environment (see the sketch after this list).
- Update state document objects (machine, unit, relation scopes, etc.) to include the EnvironmentUUID.
- MultiState object:
  - Includes the Users and Environments collections.
  - Used for the initial Login to the API and subsequent listing/selecting of environments.
  - SelectEnvironment returns an API root like we have today, backed by a State object (like today) that includes the environment UUID.
  - **Unclear**: how to preserve compatibility with clients that don't pass the environment UUID.
  - Desirable: avoiding the extra Login+SelectEnvironment round trip for commands that know the environment ahead of time (`status`, `add-unit`, etc.).
  - Admin on the state server gives you global rights across all Environments.
  - Environments collection.
- MultiState APIs:
  - `ListEnvironments`
    - Needs to filter based on the roles available to the user in the various environments. Should not return environments that you don't have access to.
  - `SelectEnvironment`
  - `CreateEnvironment`
  - `DestroyEnvironment`
- Logging
  - TBD; regardless of the mechanism, we need the environment UUID recorded per log message so we can filter on it again.
  - In rsyslog it could be put into the prefix, or sharded into separate log files.
- Include the GUI for the environment on the state server, per environment.
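A minimal sketch of what the first work item could look like: a `State` handle scoped to one environment, implicitly filtering every mongo query by the environment's UUID. The struct shape, collection name, and `env-uuid` field are illustrative assumptions, not the actual juju-core types.

```
package state

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// State is scoped to a single environment; every query it issues
// is implicitly filtered by that environment's UUID.
type State struct {
	db          *mgo.Database
	environUUID string
}

// machines returns only the machine documents that belong to
// this State's environment.
func (st *State) machines() ([]bson.M, error) {
	var docs []bson.M
	err := st.db.C("machines").
		Find(bson.M{"env-uuid": st.environUUID}).
		All(&docs)
	return docs, err
}
```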
## HA

- Current Issues
  - `debug-log` retrieves the log from one API server, so in an HA environment not all logs are retrieved.
    - https://bugs.launchpad.net/juju-core/+bug/1310268
- What is missing?
  - HA on local.
- Next steps
  - Decrease count (3 -> 1, 5 -> 3).
  - Scaling the API separately from mongo.

### Notes

Work on rsyslog logging to multiple rsyslogd instances is ready to be reviewed.

The rsyslog conf still needs to be updated when machines are added or removed. This needs to be done.

Possible problem: logs being very out of order (hours off).

**Bug**: Peergrouper log spam on local.

HA on local can't work 100% because VMs can't start new VMs, so only machine 0 can be a useful master state server. However, there are other tests that can be done with HA that would be useful on local HA.

It would be useful to be able to have the master state server be beefy and higher priority for master, and the non-masters be non-beefy, because the master has far more load than the non-masters. Right now, ensure availability is very broad and vague; it's not tweakable. However, you can approximate it by bootstrapping with a big machine, changing the constraints to smaller machines, then running ensure availability. The only thing we would need to add is a way to give a state server a higher priority for becoming master.

Need better introspection in status so that the GUI can better reflect what's going on. The GUI needs to be able to call ensure availability, and to show state servers.

The restore process for HA is just: restore one machine, then call ensure availability.

### GUI needs

- allwatcher needs to add fields for HA status changes.
- The GUI needs to know what API address to talk to, handle fallback when one goes away, and stay up to date on who else to talk to.
- ensure-availability needs to return more status (actions triggered).
- How is HA enabled/displayed in the GUI? What does machine view show?
- Can you deploy multiple juju-gui charms for HA of the GUI itself?

### CI

1. Shut down the master node, or temporarily cripple the network, to verify that HA recovers and elects a new master.
2. Test on local, because local will be used in demonstrations.
3. If backup-restore is also being done, then a restore of the master is a new master; ensure-availability must be rerun.

### Work Items

- **Bug**: Agent conf needs to store all addresses (hostports), not just private addresses. Needed for the manual provider.
- **Bug**: Peergrouper log spam on local.
- Change mongo to write majority; this is a per-session change (see the sketch after this list).
- Change mongo to write WAL logs synchronously.
- Need docs about how to use ensure availability to remove a machine that died (try to improve the actual user story for how this works).
- `juju bootstrap` && `juju ensure-availability` (should not try to create a replacement for machine-0).
- Set up all status on the bootstrap machine during bootstrap so it is created in a known good state and doesn't start up looking like it's down.
- A machine that was down, for which ensure-availability was run to replace it: when the machine comes back, it should not have a vote and should not try to be another API server.
- `juju upgrade-juju` should coordinate between the API servers to enable DB schema updates (before rewriting the schema, make sure all API servers are upgraded, and then only the master API server performs the schema change).
- The APIWorker on nodes with JujuManageEnvironment should only connect to the API server on localhost.
- Determine how backup works when in HA.
- Changes for the GUI to expose HA status.
- Changes for the GUI to monitor what the current API servers are (needs the watcher that other agents use exposed on the Client facade).
- `ensure-availability` needs to return more status (the EnsureAvailability API call should return the actions triggered).
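A sketch of the per-session mongo change the majority/journal work items describe, using the mgo driver's session safety settings. The dialing helper is an illustrative assumption; the `Safe` fields are the driver's.

```
package main

import (
	"gopkg.in/mgo.v2"
)

func dialStateServer(addr string) (*mgo.Session, error) {
	session, err := mgo.Dial(addr)
	if err != nil {
		return nil, err
	}
	// Require acknowledgement from a majority of the replica set
	// (WMode) and a sync to mongo's write-ahead journal (J) before
	// a write is considered successful. This is a per-session setting.
	session.SetSafe(&mgo.Safe{WMode: "majority", J: true})
	return session, nil
}
```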
### Work items (stretch goals)

- Ability to reduce the number of state servers.
- Handle the problem of ensure availability being called twice in a row (since the new servers aren't up yet, we start yet more new state servers).
- Ability to set priority on a state server.
- Autorecovery: bringing back machines that die (or just calling ensure availability again).

## State, status, charm reporting

Statuses like 'started' don't have enough detail. We can't tell the true state of the system or of a charm from a status like started.

- s/ready/healthy and s/unready/unhealthy
- Add jujuc tools ready and unready (healthy, unhealthy); a sketch follows at the end of this section.
  - Ready takes no positional arguments.
  - Unready takes a single positional argument: a message that explains why.
  - Charm authors choose the message they want to use.
  - Both ready/unready, when called without other flags, apply to the unit that is running.
  - Both also accept a relation flag, `-r <relation id>`, which applies the status to the specified relation.
- The status data for a unit keeps track of the ready status; expose it in status.
- The implementation needs to be shared with the allwatcher so the GUI gets to see the info.
- Implement a ready-check hook that will be called periodically if it exists; units are expected to update their ready status, which is reported when the hook is called.
- The detailed states are sub-statuses of 'started'.
- Possible granular statuses for units:
  - provisioned
  - installing (sub-status of pending)
- Juju will poll the ready-check hook for the current state. Charms need to respond ready or unready.
- We might want both a concise and a summary form of status. The GUI might want to show the concise form first and the summary later.
  - Status is already bloated.
  - Can status be intelligent enough to only include the data needed?
  - Can you subscribe to get updates for just the information you think is changing, i.e. subscribe to the allwatcher?
  - `juju status --all` would be the current behavior.
  - We would start with `--all` being implicit, but deprecated.
  - We will switch to a more terse format.
- The status "started" is not really ready.
  - There may be other hooks that still need to run.
  - Only the charm knows when the service is ready.
- When install completes, the status is implicitly "started".
  - The charm author can have install set a message to mean it is unready.
- Authors want to know when a charm is blocked because it is waiting on a hook.
  - We can solve 80% of the problem with some effort, but a proper solution is a lot of work.
  - It isn't clear when one unit is still being debugged.

### Work Items

1. Introduce granular statuses.
1. Implement filters/subscribers to retrieve granular status.
1. Unify status and the all-watcher.
1. Switch status from --all to the concise form.
   - (?) know when the charm is stable, i.e. when there are no hooks queued
   - (?) know when all services are stable
1. When deploying and then adding debug-hooks, the latter could set up a pinger for the service being deployed, which puts the service into debug as it comes up.
1. `juju retry` to restart the hooks, because resolved is abused.
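A rough sketch of the unit-side state the proposed `ready`/`unready` hook tools would manipulate. These are entirely hypothetical shapes, not existing juju-core types; they just mirror the flag behaviour described above.

```
package status

// UnitStatus is the proposed sub-status of "started".
type UnitStatus struct {
	Ready   bool
	Message string // set by `unready <message>`, cleared by `ready`
	// Per-relation ready status, set via the -r <relation id> flag.
	Relations map[int]RelationStatus
}

type RelationStatus struct {
	Ready   bool
	Message string
}

// Unready records why the unit (or one of its relations) is unhealthy.
func (s *UnitStatus) Unready(relationId int, message string) {
	if relationId < 0 { // no -r flag: applies to the unit itself
		s.Ready, s.Message = false, message
		return
	}
	if s.Relations == nil {
		s.Relations = make(map[int]RelationStatus)
	}
	s.Relations[relationId] = RelationStatus{Ready: false, Message: message}
}
```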
## Error Handling

- JDFI. We have a package. Use it.
- We need to annotate errors with a line number and a stack trace.
- We have type preservation.
- There is some agreement to change the names of some of the API.
- Add this as we need it. Switching all code over at once would stall the production line.
- Reviewers will push back on code that doesn't use the new error handling.

### Work Items

1. Extend the juju errors package to annotate errors with file and line number.
1. Log the annotated stack trace.
1. Change the backend to use `errgo`.
1. We need a template (Dimiter's example) of how to use error logging (a short example follows).
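A short example of the intended style, along the lines of the juju errors package (its annotation calls record the file and line of each call site while preserving the underlying error's type). The helper and path are illustrative.

```
package main

import (
	"fmt"
	"io/ioutil"

	"github.com/juju/errors"
)

func loadFile(path string) ([]byte, error) {
	return ioutil.ReadFile(path)
}

func readConfig(path string) error {
	if _, err := loadFile(path); err != nil {
		// Annotate adds context plus the file:line of this call site,
		// while keeping the original error available via Cause.
		return errors.Annotate(err, "cannot read config")
	}
	return nil
}

func main() {
	if err := readConfig("/etc/juju/agent.conf"); err != nil {
		// ErrorStack prints the annotated trace accumulated on the way up.
		fmt.Println(errors.ErrorStack(err))
	}
}
```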
## Image Based Workflows

Charms would be able to specify an image (maybe docker); with the addition of storage, storage dirs are passed into docker as it is launched.

The unit agent may run either inside or outside the docker container (not yet determined).

The machine agent would mount the storage and the charm directory into the docker container when it starts. The hooks are executed in the docker container.

Looking to make docker support a first-class citizen in Juju.

*"Juju incorporates docker for image based workflows"*

Maybe limited to images based on the ubuntu-cloud image (full OS container).

May well have a registry per CPC to make downloading images faster on that cloud.

Perhaps put a docker file (instructions to build the image) into the charm. The registry that we look up needs to be configurable.

Offline install will require pulling images into a local registry.

### Work Items

1. Unit agent inside the container.
1. Image registry.
1. Charm metadata to describe the image and registry.
1. Deployer to understand docker; the deployer inspects charm metadata to determine the deployment method, traditional vs. docker.
1. A docker deployer needs to be written that can download the image from a registry and start the container, mounting the agent config, storage, charm dir, and the upstart script for the unit agent (if the unit agent runs inside).
1. Docker work is needed to execute hooks inside the container from the outside.

**Depends on storage 0.1 being done first.**

## Scalability

### Items that need discussion in Vegas

- How do we scale to an environment with 15k active units?
- How do admin operations scale?
- How do we handle failing units?
  - dump and re-create
  - Interaction with the storage definition.
- How do we make a `juju status` that can provide a summary without getting bogged down in repeated information?
- How does relation get/set change propagation scale?
- Where are the current bottlenecks when deploying, say, hadoop?
- Where are the current bottlenecks when deploying OpenStack?
- Pub/Sub
  - What do we need to do here?
- Notes:
  - We need a pub/sub for the watchers to help scale.
  - Each watcher pub/subs on its own; move up one level?
  - Need to respond to events that occur, in a non-coupled way (indirect sub to goroutine).
  - Logging particular events?
  - Only one thing looking at the transaction log; whoops, not as bad as we thought.
  - 100k units leads to millions of goroutines; blocking is an issue.
  - If we do a pub/sub system, let's use it everywhere? Replace watchers?
  - Related to the idea of pub/sub on output variables and the like, it sounds like.
  - Watching at subfield granularity of a document, perhaps?
  - 0mq has this; we should reuse that and not invent our own pub/sub.
  - 0mq has Go bindings; wonder if it works in gccgo.
  - Does this replace the API? No, Javascript can't speak 0mq directly, so clients still need some API-ness.
  - Are there alternatives to the watcher design?
  - Watchers are really good for testing: they decouple parts and make it easy/fast to test whether an event fired.
  - Shared watcher for all things (on the service object?).
  - Have a big copy of the world in memory; that helps with a lot of this.
  - Charm output variable watching: charm outputs hit state, the megawatcher catches the update and tells everyone it changed.
  - An in-memory model helps with the ABA problem.
  - Use a 3rd-party pub/sub rather than writing our own.

### Work Items

1. Boot generic machines which then ask Juju for identity info.
1. Bulk machine provisioning.
1. Fix the uniter event storm caused by "number of units changed" events.
1. Implement a proper pub/sub system to replace watchers (a minimal sketch follows this list).
1. The state server machine agent (APIServer) should not listen for outside connection requests until it itself (APIWorker) has started.
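A minimal in-process pub/sub sketch of the kind of hub that could sit above the individual watchers. Purely illustrative; the notes above lean toward reusing something like 0mq rather than hand-rolling this.

```
package pubsub

import "sync"

// Hub fans events out to subscribers by topic, decoupling the
// goroutine that observes a change from the ones that react to it.
type Hub struct {
	mu   sync.Mutex
	subs map[string][]chan interface{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string][]chan interface{})}
}

// Subscribe returns a buffered channel of events for a topic.
func (h *Hub) Subscribe(topic string) <-chan interface{} {
	ch := make(chan interface{}, 16)
	h.mu.Lock()
	h.subs[topic] = append(h.subs[topic], ch)
	h.mu.Unlock()
	return ch
}

// Publish delivers an event to all current subscribers, dropping it
// for any subscriber whose buffer is full instead of blocking; with
// 100k units, blocked publishers are exactly the problem noted above.
func (h *Hub) Publish(topic string, event interface{}) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs[topic] {
		select {
		case ch <- event:
		default:
		}
	}
}
```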
## Determinism

### First Issue: Install repeatability

There are two approaches to giving us better isolation from the network and other externalities at deploy time.

1. Fix charms so they don't have to rely on external resources.
   - Perhaps by improvements around fat charms.
     - REQUIRED: Remove internal Git usage (DONE).
   - Perhaps by making it easy to manage those resources in Juju itself?
     - Either create a TOSCA-like "resources" catalog per environment: upload or fetch resources to the environment at deploy time (or as a pre-deploy step),
     - or create a single central resource catalog with forwarding, aka a "gem store for the world".
1. Snapshot-based workflows for scale up/down, so external resources aren't hit on every new deploy.
   - We could add the necessary hooks to core, but the actual orchestration of images seems a bit more tricky and could depend on a better storage story.

### Second Issue

From Kapil: "Runtime modification of any change results in non-deterministic propagation across the topology that can lead to service interruption. This needs change barriers around many things, but that's not implemented or available. E.g. config-changed and upgrade executed at the same time by all units."

### Upgrade Juju

`juju upgrade-juju` goes to a magic revision (simple bug fix) that an operator can't determine ahead of time.

Juju internally lacks composable transactions; many actions violate semantic transaction boundaries, and thus partial failure states leave inconsistencies.

Kapil notes:

> One of the issues with complex application topologies is how runtime changes ripple through the system. e.g. a config change on service a propagates via relations to service b and then service c. It's eventually consistent and convergent, but during the convergence what's the status of the services within the topology? Is it operational? Is it temporarily broken?

> **This is a hard problem** to solve and it's one I've encountered in both our OpenStack and Cloud Foundry charms.

> In discussions with Ben during the Cloud Foundry sprint, the only mitigation we could think of on Juju's part was some form of barrier coordination around changes, e.g. so that the ripple proceeds evenly through the system. It's not a panacea but it can help, especially looking at the simpler cases of just doing barriers around `config-change` and `charm-upgrade`. What makes this a bit worse for Juju than other systems is that we're purposefully encapsulating behavior in arbitrary languages and promoting blind/trust-based reuse, so a charm user doesn't really know what effect setting any config value will have. e.g. the cases I encountered before were setting a 'shared-secret' value and an 'ssl' enumeration value on respective service config... for the ssl I was able to audit that it was okay at runtime... but that's a really subtle thing to test or detect or maintain.

> Any change can ripple through the topology. We have an eventually-consistent system, but while it is rippling, we have no idea. Lack of determinism means someone who uses Juju cannot make uptime guarantees.

**Bug**: downgrading charms is not well supported.

### Questions

- Do we need barriers? e.g. config-changed affects all units of a service simultaneously.
- Do we need pools of units within a service?

### Work Items

- Unit ids must be unique (even after you've destroyed and re-created a service).
- Address changes must propagate to relations.
- `--dry-run` for `juju upgrade-juju`.
- `--dry-run` for deploy (what charm version and series am I going to get?).

## Health Checks

Juju "status" reporting in charms needs to be clearly defined and expressive enough to cover a few critical use cases. It is worth noting that BOSH has such a system.

- Canaries and rolling unit upgrades (health check as a prerequisite).
- Is a service actually running?
- Coordination of database schema upgrades with webserver unit upgrades (as an example of the general problem of coordinated upgrades).
- Determining when HA quorum has been reached or a server has been degraded.

### Questions

- We discussed Error and Ready as states, but do we need a third: Pending, Error, and Ready?
- Do we need any more than three states?
- Suggestion: three states, plus an error description JSON map (a sketch follows).
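A sketch of the suggested three-state model with the error description map; the types are hypothetical.

```
package health

// State is one of the three proposed health states.
type State string

const (
	Pending State = "pending"
	Error   State = "error"
	Ready   State = "ready"
)

// Check is what a unit would report: a state plus, on error, a
// free-form description map (serialized as JSON).
type Check struct {
	State State             `json:"state"`
	Info  map[string]string `json:"info,omitempty"`
}
```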
## Storage management

### Allow charms to declare storage needs (block and mount)

- [Discussion from Capetown](https://docs.google.com/a/canonical.com/document/d/1akh53dDTROnd0wTjGjOrsEp-7CGorxVp2ErzMC_G-zg/edit)
- [Proposal post Capetown (MS) (lacks storage-sets)](https://docs.google.com/a/canonical.com/document/d/1OhaLiHMoGNFEmDJTiNGMluIlkFtzcYjezC8Yq4nAX3Q/edit#heading=h.wjxtdqqbl1fg)

Entity to be managed:

- CRUD, snapshot

Charms declare it:

- Path, type (ephemeral/persistent), block.

Storage 0.1:

- Storage set in state: track the information in some way.
- Disks (placement, storage).
- Provider APIs (to create, delete, and attach storage; expand for later).
- Provider to be able to attach storage to a machine.
- Charms need to be able to declare storage in metadata.
- `jujud` commands so charms can resolve where the storage is on the machine.
- Degradation: on the manual provider, or another provider that doesn't provide storage (DO), do not fail to deploy, but we need to communicate a warning of some form. Should the CLI fail while the API does not?

Storage sets need to talk to services; they need to be exposed as management processes.

Multitenant storage? Probably not for the initial implementation, but ***do not design it out***.

Need to consider being able to map our existing storage policy onto the new design (e.g. an AWS EBS volume for how Juju works with Amazon).

NOTE: Storage is tied to a zone; ops can take a long time to run.

Consider upgrades of charms, and how we can move from the existing state, where a charm may have its own storage that it has handled itself, to the new world where we model the storage in state.

- (2) Add a state storage document to the charm document.
  - Upgrading Juju should detect services that have charms with storage requirements and fulfill them for new units.
- (6) Add state for storage entities attached to units.
  - Lifecycle management for storage entities.
- (6) When deploying units, need to find out what storage is needed.
  - Make the provisioner aware of workloads and include storage details when needed.
  - Change unit assignment to machines based on storage restrictions.
- (4) Define provider APIs for handling storage (see the sketch after this list).
  - Create new volume.
  - Delete volume.
  - Attach volume to instance.
- (12) Implement the provider APIs for storage on the different providers.
  - OpenStack
  - EC2
  - MaaS
  - Azure?
- (0) Consider storage provider APIs for compute providers that have storage as a service.
- (2) Define new `metadata.yaml` fields for dealing with storage.
- (0) Consider the mapping between charm requirements and service-level restrictions on what storage should actually be provided.
- (4) Add storage to status.
  - Top-level storage entity keys.
  - Units showing their associated storage entities.
  - Services showing storage details.
- (4) CLI/API operations on storage entities.
  - Add storage.
  - Remove storage.
  - Further operations? Resize? Not now.
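A sketch of the provider-facing storage API described above (create/delete/attach). The interface and type names are placeholders, not a settled design.

```
package storage

// VolumeParams describes a volume to be created.
type VolumeParams struct {
	SizeMB     int
	Zone       string // storage is tied to a zone
	Persistent bool
}

// Volume identifies a created volume.
type Volume struct {
	Id   string
	Zone string
}

// Provider is the minimal per-cloud storage API for storage 0.1.
// Operations can take a long time to run, so implementations would
// likely be asynchronous/polled in practice.
type Provider interface {
	CreateVolume(params VolumeParams) (Volume, error)
	DeleteVolume(volumeId string) error
	// AttachVolume attaches an existing volume to an instance and
	// returns the device path that charms can resolve via jujud
	// commands on the machine.
	AttachVolume(volumeId, instanceId string) (devicePath string, err error)
}
```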
## Juju as a good auto-scaling toolkit

*Not a goal: doing autoscaling in core.*

Goal: providing the APIs and features needed to easily write auto-scaling systems for specific workloads.

Outside stakeholders: Cloud Installer team.

We need to be able to clean up after ourselves automatically.
Where "clean up" actions are required, they need to take bulk operation commands.

- Destroy-service/destroy-unit should cascade to destroy dirty machines.

## IAAS scalability

- Security group re-think:
  - The security group approach needs to switch to per-service groups.
  - We need to support individual on-machine/container firewall rules.
- Support for instance type and AZ locality.

## Idempotency

Is this a Juju issue, or a charm issue? Config management tools always promise this, but rarely deliver -- though many deliver **more** than Juju. What are the specific issues in question with Cloud Foundry?

## Charm "resources" (fat bundles, local storage, external resource caching)

### Problem Statements

- Writing and maintaining "fat" charms is difficult.
- Forking charm executables to support multiple upstream release artifacts is sub-optimal.
- Fat charms are problematic.
- Non-fat charms are dependent on quite a few external resources for deployment.
- Non-fat charms are not *necessarily* deterministic as to which version of the software will be installed (even to the point of sometimes deploying different versions in the same service).

### Proposed Agenda

- Discuss making "fat charms" better.
- Switch to a "resources" model, where a charm can declare the 'external' content that it depends on, and the store manages caching and replication of it.
- Consider building on the work IS has done.
- Choose a path, and enumerate all the work that needs to be done to fully solve this problem.

### Proposal

- ~`resource-get NAME` within a charm to pull down a published blob~
  - Instead, use a model where, rather than charms requesting names, the charm overall declares the resources it uses, and the Uniter ensures that the data is available before firing the upgrade/install hooks.
- `resources.yaml` declares a list of streams that contain resources:

```
default-stream: stable
streams:
  stable:
  devel:
common:
  common.zip
amd64:
  foobar.zip
```
- The resources directory structure for charms should match that of the charm author, so bind-mounting the directory for development still works; in the deployed version of the directory structure, you will only have common and arch-specific files. Should there be a symlink to the specific arch? Either:
  - publish charm errors if there are name collisions across the common and arch-specific directories. This way all the files are in one resources directory for hook execution. This does mean that the charm developer needs a way to create symlinks in the top directory to the current set of resources they want to use (charm-tool resources-link amd64). Windows? (They have symlinks, right?)
  - the charm has resources/common and resources/arch, where "arch" is still a link, but just one;
  - the charm has resources/common and resources/amd64
    - this requires the hook knowing the arch.
- Charm identifiers become qualified with the stream name and resource version (precise/mysql-25.devel.325).
  - juju status will show a new version as available if the entire version string (including resources) changes.
  - If mysql-25.devel.325 is installed, and a different version of the resources becomes current, this will be shown in `juju status`.
  - We currently ask for mysql-latest; this should perhaps be changed to mysql-current, as we don't necessarily want the latest version.
- Each named stream has an independent version, which is independent of both the other streams and the explicit charm version.
- upgrade-charm upgrades to the latest full version of the charm, including its resources.
- upgrade-charm reports to the user what old version it was at and what new version it upgraded to.
- Blobs are stored in the charm store; your environment always has a charm store, which can be synced for offline deployments.
- Today, deploy ensures that the charm is available and copies it to environment storage; it will now need to do the same for the charm's resources.
- Deploy should also confirm that the charm version and resource version are compatible.
  - `juju deploy mysql-25.dev.326` may fail because resources version 326 has a different manifest than declared in charm 25's `resources.yaml`.
- `juju deploy mysql`
  - finds the current version of the charm and resources in the default stream;
  - the charm store has already validated that they match.
- `juju deploy mysql-25`
  - uses the default stream;
  - how do we determine the resources for this version?
    - Does current match? If yes, use it.
    - If 25 < current, then look back from the current resources and grab the first that has a matching manifest.
    - Could just fail.
    - The charm store could track the best current resources for any given charm version, as identified by moving the current resources pointer while keeping the charm pointer the same. For charm versions that are current, remember the current resources version.
    - If we take this approach, there will be charm versions that have never been "current", so deploying them without explicitly specifying the resources version will fail.
- `juju deploy mysql.nightly` (syntax TBD)
  - `juju deploy mysql --option=stream=nightly` (hand wave; we don't like this one, as getting the full version partly from config feels weird)
  - Find the current version of mysql and the current version of the paid stream resources.
  - So, the charm store needs to remember the current resources for each stream, for each charm version, for the current values.
- The charm store has pointers for the "current" version of charms and the "current" version of resources.
- The charm store requires that the resources defined in the current pointers have the same shape (same list of files).
- `charm-publish` requires a local copy of all resources (for all architectures), and validates that `resources.yaml` matches the resources tree.
- `charm-publish` computes the local hash of the resources and the manifest of what is currently in the charm store, to publish both the charm metadata and all resources in a single request.
- Publishing does not immediately move the 'current' pointer. This allows someone to explicitly deploy the version and test that the charm works with that version of the resources.
- Supported architectures is an emergent property tracked by the charm store (known bad/unknown/known good) based on testing - hand wave.
- The charm store will be expected to de-dupe based on content hash (possibly a pair of different long hashes, just to be sure).
- Don't let the manifest just be the SHA hash without a challenge (a sketch of the salted variant follows this list):
  - either a random set of bytes from the content,
  - or a salted hash: the charm store gives the salt, and publish charm computes the salted hash to confirm that it actually has the content.
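A sketch of the salted-hash challenge: the store hands the publisher a salt, and the publisher proves possession of the blob by hashing salt plus content. The choice of sha256 and the names here are assumptions for illustration.

```
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// proveContent answers the charm store's challenge: hash the
// store-provided salt together with the blob's bytes. Knowing only
// the blob's plain hash is not enough to forge this answer.
func proveContent(salt, blob []byte) string {
	h := sha256.New()
	h.Write(salt)
	h.Write(blob)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	salt := []byte("store-issued-salt") // would come from the store
	blob := []byte("...resource bytes...")
	fmt.Println(proveContent(salt, blob))
}
```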
### Spec

- Be clear about what is in the charm store, what is defined by the charm in `resources.yaml`, and what is deployed on disk.
- Use cases should show both the charm developer workflow and the user upgrade flow (which files get uploaded/downloaded, etc.):
  - developing a new charm with resources
    - with common resources
    - with different resources for different architectures
    - with some architectures needing specific files
  - upgrade a charm by just modifying a few files
  - upgrade a charm by only modifying a charm hook
  - upgrade both hooks and resources
  - adding new files
  - docker workflow with a base image and one overlay
  - updating the overlay of a docker charm
  - adding a new docker overlay will cause a rev bump on the charm as well as the resources, because the resources.yaml file has to change to include the new layer
  - illustrate explicitly the workflow if they forget to add the new layer to the resources.yaml file: publish fails because resources.yaml doesn't match the on-disk resources directory tree

### Discussion

- Canarying will have to be across charm revision, blob set, and charm config.
  - The charm version now includes the charm revision and the resources revision.
  - Further discussion needed around health status for canaries later.
- Access control needs to sit on top of the content addressing; just knowing the hash does not imply permission to fetch.
- Saving network transfer by doing binary diffs on blobs of the same name with different hashes/versions would be nice for upgrades.
  *sabdfl* says we have this behaviour already with the phone images, and we should break it out into a common library somewhere, somehow.

### Charms define their resources

- `resources.yaml` (next to `metadata.yaml`) declares (a sketch of a matching struct follows this section):
  - Stream
    - Has a description (free vs paid, beta/stable vs proposed, etc.).
    - If you want to change the logic of a charm based on the blob stream, that is actually a different charm (or an if statement in your hooks).
    - Streams will gain ACLs later (can you use the paid stream?).
    - Charms must declare a default stream.
  - Filenames
    - The name of the blob.
  - Version
    - Just a monotonically increasing number; the version is stream-dependent.
    - The store has a pointer to the "current" version (which may not be the latest).
  - Architecture
- The charm declares the shape of the resources it consumes (what files must be available). The store maintains the invariant that when a resource is updated, it contains the shape that the charm declared.
- `charm-publish` uploads both the version of the charm and the version of the resources.
- We add a "current" pointer to charms like the one for resources, so that you have an opportunity to upload the charm and its resources and test them before they become the default that people get (instead of getting the 'latest' charm, you always get the 'current' one unless you explicitly specify otherwise).
- mysql-13.paid.62
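A guess at the Go shape a charm package might parse `resources.yaml` into. The field names just follow the list above; this is not a settled format.

```
package charm

// ResourcesDecl mirrors a charm's hypothetical resources.yaml.
type ResourcesDecl struct {
	// DefaultStream names the stream used when the deploy command
	// does not specify one, e.g. "stable".
	DefaultStream string `yaml:"default-stream"`
	// Streams maps stream name (stable, devel, paid, ...) to the
	// resources published in it.
	Streams map[string]Stream `yaml:"streams"`
}

type Stream struct {
	Description string `yaml:"description,omitempty"`
	// Version is a monotonically increasing number, independent per
	// stream and independent of the charm revision.
	Version int `yaml:"version,omitempty"`
	// Files lists blob names, keyed by "common" or an architecture
	// such as "amd64".
	Files map[string][]string `yaml:"files,omitempty"`
}
```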
### Notes

We need to cache fat charms on the bootstrap node. We need to "auto Kapil" fat charms. Sometimes we don't even have access to the outside network. We need one hop from the unit to the bootstrap node.

The important thing, however, is that customers will probably fat-charm everything, e.g. a huge IBM Websphere Java payload.

- Can Juju handle gigs of payload? Nate: Yes, moving away from the git storage.
- Is there anything core can do to make charms smaller?
  - Marco: Yes.
  - Ben: We need a mechanism to specify common deps so that we can share them instead of having a copy in every charm. A bundle could have deps included, or maybe a common blob store?
- juju-deployer is moving to core.

If we move to image-based workloads we can have a set image that includes all the deps.

Nate: We could do it so that if we're on a certain cloud we can install the deps as part of the cloud, e.g. if I am on softlayer, make sure IBM Java is installed via cloud-init. So we can do things like an optimized image for big data.

### Work Items

1. Add an optional format version to charm metadata (default 0) - 2
   - Get Juju to reject charms with formats it doesn't know about ASAP.
1. Charm store needs to grow blob storage, with multiple streams, current resource pointers, and links from the resources to the charm itself - 4
1. Charm store needs to gain current charm revision pointers to charms - 2
   - Juju should ask for current, not latest.
1. The charm store needs to know which revisions of each resource stream each charm revision works with - 2
1. Charm gains an optional `resources.yaml` - 2
   - Bump the format version for charms using `resources.yaml`.
1. Need to write a proper charm publish - 12
   - Resource manifest matching.
   - Salted hashes.
   - Partial diff up/down is not in rev 1.
1. State server needs an HA charm/resources store - 8
   - Should use the same code as the actual charm store (shared lib/package).
   - Replaces the current charm storage in provider storage.
1. A charm does not exist in state until we have copied all authorized resources into the local charm store - 2
1. Uniter/charm deployer needs to know about the resources file, parse its content, know which stream to use, and request resources from the local charm store, probably authenticated - 4
   - Puts the resources into the resources directory as part of Deploy.
1. Bind mounting: ensure the links for the files flatten into the resources dir - 2

## Make writing providers easier

### Problems

- Writing providers is hard.
- Writing providers takes a long time.
- Writing providers requires knowledge of the internals of juju-core.
- Providers suffer bitrot quite quickly.

### Agenda

- Can we externalize providers from core? (plugins/other languages?)
- A pre-made stub project with pluggable functions?
- How to keep in sync with core changes and avoid bitrot?
- How to insulate providers from changes to core?
- Can we simplify the interface?
  - A complicating factor is config; can some be shared?
- Need to design for reuse: factor out common logic.
### Notes

- Keep `EnvironProvider`.
- Split the `Environ` interface into smaller chunks,
  - e.g. `InstanceManagement`, `Firewall` (see the sketch after the work items).
- Smaller structs with common logic, e.g. port management, which use provider-specific call-outs.
- Extract the Juju-specific logic which is "duplicated" across providers and refactor it into a shared struct.
- The above will allow the necessary provider-specific call-outs to be identified.

### Work Items

1. Methods on the provider operate on instance ids.
1. Introduce bulk API calls.
1. Move instance addresses into environs/network.
1. Split the `Environ` interface into smaller chunks; introduce `InstanceManager`, `Firewaller`.
1. Smaller structs with common logic, e.g. port management, which use provider-specific call-outs.
1. Extract the Juju-specific logic which is "duplicated" across providers and refactor it into a shared struct.
1. Stop using many security groups; use the default group with iptables.
1. Use a `LoadBalancer`? interface (needed by Azure); it will provide open/close ports; most providers will not need this and/or will return no-ops.
1. Make the `Firewaller` worker the sole process responsible for opening/closing ports on individual nodes.
1. Refactor the providers' use of `MachineConfig` as the means to pass in params for cloud-init; consider ssh'ing into a pristine image to do the work, as per the manual provider?????
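A sketch of the proposed split. The interface shapes are guessed from the work items above; the real `Environ` methods differ in detail, and the placeholder types exist only to make the sketch self-contained.

```
package environs

// InstanceManager covers the lifecycle calls every provider must
// implement; slice arguments reflect the "introduce bulk API calls"
// work item.
type InstanceManager interface {
	StartInstances(params []StartInstanceParams) ([]Instance, error)
	StopInstances(ids []string) error
	Instances(ids []string) ([]Instance, error)
}

// Firewaller is only implemented by providers with a firewalling
// concept; others can return no-ops.
type Firewaller interface {
	OpenPorts(instanceId string, ports []PortRange) error
	ClosePorts(instanceId string, ports []PortRange) error
}

// Environ then becomes a composition of the smaller chunks, and the
// shared juju-side logic is written against the small interfaces.
type Environ interface {
	InstanceManager
	Firewaller
}

// Placeholder types for the sketch.
type StartInstanceParams struct{ Series string }
type Instance struct{ Id string }
type PortRange struct {
	From, To int
	Protocol string
}
```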
## Availability Zones

- Users want to be able to place units in availability zones explicitly (provider-specific placement directives). The core framework is nearing completion; providers need to implement provider-specific placement directives on top.
- Users want highly-available services (Juju infrastructure and charms). On some clouds (Azure), spreading across zones is critical; on others it is just highly desirable.
- Optional: one nice feature of the Azure Availability Set implementation is automatic IP load balancing (no need for HAProxy, which itself becomes a SPoF). Should we support this in other providers (AWS ELB, OpenStack LBaaS, ...)?

### Agenda

- Prioritise implementation across providers (e.g. OpenStack > MaaS > EC2?).
- Discuss the overall HA story, IP load balancing.

Azure supports implicit load balancing, but we don't care about other clouds for now.

### Work Items

1. Determine which providers support zones: EC2, OpenStack, Azure?
1. Implement distribution groups in all providers; either they do it or they return an error.
1. New policy in state which handles units on existing machines.
1. New method on state which accepts distribution groups and a list of candidate instance ids and returns a list of equal best candidates (sketched below).
1. Add an API call to AMZ to find availability zones.
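A sketch of the candidate-selection step from work item 4: given how many members of a distribution group already sit in each zone, keep the candidates in the least-populated zones so new units spread out. Names and types are illustrative.

```
package state

// bestZoneCandidates returns the candidate instance ids whose
// availability zone currently hosts the fewest members of the
// distribution group; all ties are returned as equal best.
func bestZoneCandidates(zoneOf map[string]string, groupCount map[string]int, candidates []string) []string {
	best := []string{}
	bestCount := -1
	for _, id := range candidates {
		n := groupCount[zoneOf[id]]
		switch {
		case bestCount == -1 || n < bestCount:
			best, bestCount = []string{id}, n
		case n == bestCount:
			best = append(best, id)
		}
	}
	return best
}
```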
## Networks

- Juju needs to be aware of existing cloud-specific networks, so it can make them available to the user (e.g. to specify placement and connectivity requirements for services and machines, provide network capabilities for charms/relations, fine-tune relation connectivity, etc.).
- Juju needs to treat containers and machines in a uniform way with regard to networks and connectivity (e.g. providing and updating addresses for machines and containers, including when nesting).
- Knowing the network topology and infrastructure in the cloud, Juju can have a better model of how services/machines interact, and can provide user-facing tools to manage that model (CLI/API, constraints/placement directives, charm metadata) at a high level, so that the user doesn't need to know or care how the lower-level networking is configured.

### Agenda

- Discuss and outline the high-level architecture integrating the existing MaaS VLAN MVP work and instance addresses, so that we have a unified networking/addressability model.
- Prioritize implementation across providers.
- Discuss and define required features and deadlines?

### Meeting Notes

- We need networks per service -> then configure them on machines.
- Default networks get created (public/private)?
- Networks per relation -> routing between netdb (mysql) / netapp (wp), e.g.
- Network relations to define routing? add-net-relation netdb netapp; then add-relation mysql wordpress [--using=netrel1] (if more than one).
- Container addressability.

## Networking - Connections vs Relations

Discussion of the specifics of network routing.

- Relations do not always imply connections (although usually they do, except when they don't, as with proxy charms).
- Juju wants to model the physical connections, to open ports/iptables/security groups/firewalls appropriately to allow the relation's actual traffic.
- We need to be able to specify the endpoints for communication within charm hooks if it's not the default model. Possible hook commands for that:
  - `enable-traffic endpoint_ip_address port_range`
    - For example: `enable-traffic 194.123.45.6 1770-2000`
  - `disable-traffic ep port_range`
- Also talk to the OpenStack charmers about non-relation TCP traffic.
- Should Juju model routing rules and tables for networks? (Directly via API/CLI, or implicitly as part of other commands, like add-relation between services on different networks.)

## Deployer into Juju Core

- To embed the GUI we need a solid path for making bundles work.
- You can't `juju deploy` a bundle.
- Moving towards stacks, Core should support bundles like charms: provide APIs to the files inside, etc.
- Can the GUI use the ECS to replace the functionality of the deployer for GUI needs?

The goal of the meeting is to verify that this is a logical path forward and to create a plan to migrate to it. Stakeholders should agree on the needs in Core and make sure that it works with, not against, future plans to expand on the idea of bundles into fat bundles and stacks.

## Bundles to Stacks

What's needed to turn bundles into stacks?

Bundles have no identity at run time; we want this for stacks. A namespace identifies the group of services that are under a bundle.

Drag a bundle to the GUI and you get a bunch of services; with stacks, drag and drop a stack and you get one identifiable stack icon that is itself a composable entity and logical unit.

- Namespaces
  - The collection of deployed entities belongs to a stack.
  - Bundles today 'disappear' once deployed (the services are available, but there is no visible difference from just doing the steps manually).
- Exposed endpoints
  - Interface "http" on the stack is actually "http" on the internal Wordpress.
- Hierarchy (nesting)
  - Default "status" output shows the collapsed stack; explicitly describing the stack shows the internal details.

### GUI concerns/thoughts

- An expanded stack takes over the canvas; other items are not shown.
- Drag on an "empty stack" which you can explode to edit, adding new services inside.

### Notes

- The GUI can't support bundles with local charms.
- Bundles should become a core entity supported by juju-core.
- Deployer into juju-core should come after the work for supporting uncommitted changes.
  - (dry run option?)

### Stacks 2.0

Further items about what a stack becomes:

- Incorporating Actions.
- Describing the behavior of add-unit for a stack.

### Work Items

Spend time to make a concrete spec for the next steps.
For "namespacing", an initial implementation could just tag each deployed item with a name/UUID.
## Charm Store 2.0

- Access Control
- Replacing Charmworld
- Ingesting Charms (for example from GitHub)
- Ingesting Bundles
- Search

Kapil's aim: simplify the current model of charm handling. Break the three-way link between Launchpad, charmworld (deals with bundles, used via API by the GUI), and the charmstore (deals in charms, used by the juju-core state server). Question: is breaking the link between Launchpad and charmworld the first step?

Lots of discussion over first steps. Migrate the charmworld API into the store? Does the state server also need to implement it? Currently the API is small but specific: search, pulling a specific file out of charms (maybe with some magic for icons), and some other things.

**First step**: Add a feed from the store that advertises charms. Change charmworld ingest to read from the store feed rather than from Launchpad directly.

**Second step**: Bundles are only in charmworld currently. They are pulled from Launchpad and are a branch with a bundles.yaml and a readme, similar to a charm. The store needs to ingest bundles as well, and also publish them as a separate bundle feed. Change charmworld ingest to read the store bundle feed.

**Third step**: Add a v4 API, implemented in the store, that supersedes the current charmworld v3 API, cleaning up direct file access and other odd things at the same time. Remember that charm-tools is currently a consumer of the v3 API.

We may want to split the charm store out of the juju-core codebase, along with packages such as charm in core, into separate libraries.

After charmworld no longer talks to Launchpad, it will be easier to provide ingestion from other sources, e.g. GitHub. Publishing directly to the store will be possible as well.

Work item - bac - document the existing charmworld API v3 (see [Charmworld API 3 Docs](http://charmworld.readthedocs.org/en/latest/api.html)).

We'll need to be able to serve individual files out of charms:

- `metadata.yaml`
- `icon.svg`
- `README`

Search capability could be provided by Mongo 2.6 fulltext search?

### Questions

- How does ingestion of charm store charms work for personal namespaces?
  - `juju deploy cs:gh`
- Charm store 2.0 should be able to ingest not only from GitHub but from a specific branch in a GitHub repo (e.g. https://GitHub.com/charms/haproxy/tree/precise && https://GitHub.com/charms/haproxy/tree/trusty, or a better example, https://GitHub.com/charms/haproxy/tree/centos7). This is needed when there need to be two different versions of a charm.
- As a best practice, charms should endeavour to have one charm per OS. When the divergence for a given charm is great enough (e.g. Ubuntu to CentOS) we should look at creating a new branch in git.

## ACLs for Charms and Blobs

### Work Items

1. The namespace that holds revisions for a charm needs to store ACLs.
1. The charm store needs to check them against API requests.
1. The API to get a resource needs to have a reference to the top-level charm (TBD), so we can check the read permission.

We need to decide how we want to deal with access to metadata and content.
Should we always allow full access to all blobs and content if you can deploy? (A small decoding sketch follows the tables.)

### Option #1

- r = metadata
- w = publish
- x = deploy

#### Public charm (0755)

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X | | X |
| everybody | - | X | | X |

#### Charm under test (0750)

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X | | X |
| everybody | - | | | |

#### Gated charm (0754)

You can see it, but you have to get approval (be added to installers) to deploy.

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | X | | X |
| everybody | - | X | | |

### Option #2

- r = read content of charm
- w = publish
- x = deploy and read metadata

#### Public charm (0755)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X | | X |
| everybody | - | X | | X |

#### Charm under test (0750)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X | | X |
| everybody | - | | | |

#### Gated charm (0710)

You can see it, but you have to get approval (be added to installers).

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | | | X |
| everybody | - | | | |

#### Commercial charm with installer-inaccessible content (0711)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | | | X |
| everybody | - | | | X |
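A tiny sketch of how the unix-style mode shorthand above could decode into charm permissions, using Option #1's r/w/x assignment; purely illustrative.

```
package acl

// Perm mirrors Option #1: r=metadata, w=publish, x=deploy.
type Perm struct {
	Metadata, Publish, Deploy bool
}

// fromBits decodes one octal digit of a mode such as 0754: the 7
// applies to the maintainer, the 5 to installers, the 4 to everybody.
func fromBits(bits uint) Perm {
	return Perm{
		Metadata: bits&4 != 0, // r
		Publish:  bits&2 != 0, // w
		Deploy:   bits&1 != 0, // x
	}
}
```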
## Upgrades

Prior to 1.18, Juju did not really support upgrades. Each agent process listened to the agent-version global config value and restarted itself with a later version of its binary if required.

1.18 introduced the concept of upgrade steps, which allows ordered execution of business logic to perform the changes associated with upgrading from X to Y to Z. 1.18 also made the machine agents on each node solely responsible for initiating an upgrade on that node, rather than all agents (machine, unit) acting independently. However, several pieces are still missing....

### Agenda items

- Coordination of node upgrades - lockstep upgrades.
- Schema updates to the database.
- HA - what needs to be done to support upgrades in an HA environment?
- Read-only mode to prevent model or other changes during upgrades.
- How to validate an upgrade prior to committing to it, e.g. bring up a shadow Juju environment on the upgraded model and validate first, before either committing or switching back?
  - Perhaps a `--dry-run` to show what would be done?
- Authentication/authorization - restrict upgrades to privileged users?
- How to deal with failed upgrades / rollbacks? Do we need application-level transactions?
- Testing of upgrades using a dev release - faking the reported version to allow upgrade steps to be run, etc.

### Work items for schema upgrade

Key assumption: database upgrades complete quickly.

1. Implement schema upgrade code (probably as an upgrade step; a sketch follows this list).
   - mgo supports loading documents into maps, so we do not have to maintain legacy structs.
   - Record the "schema" version.
1. Implement state/mongo locking, with an explicit upgrading/locked error.
   - One form of locking is to just not allow external API connections until the upgrade steps have completed, since we know we just restarted and dropped all connections.
1. Introduce retry attempts in the API server around state calls.
1. Take a copy of the db prior to the schema upgrade, and copy it back if the upgrade fails.
1. Upgrade steps for the master state server only.
1. Coordination between master/slave state servers to allow the master to finish first.

### Work items for upgrade story

- Allow users to find out what version upgrade-juju will pick before upgrading.
- Commands should report that an upgrade is in progress if run during an upgrade.
- The peergrouper worker should only start after an upgrade has completed.
- Update machine status during an upgrade; set error status on failure.
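A sketch of an upgrade step in the style the first work item suggests: documents are read as plain maps (no legacy structs to maintain) and a missing field is backfilled. The collection and field names are invented for illustration.

```
package upgrades

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// addEnvUUIDToMachines is an example schema upgrade step: it finds
// machine documents missing a field and backfills it, reading each
// document into a generic bson.M map rather than a legacy struct.
func addEnvUUIDToMachines(db *mgo.Database, envUUID string) error {
	machines := db.C("machines")
	iter := machines.Find(bson.M{"env-uuid": bson.M{"$exists": false}}).
		Select(bson.M{"_id": 1}).Iter()
	for {
		doc := bson.M{}
		if !iter.Next(&doc) {
			break
		}
		err := machines.UpdateId(doc["_id"],
			bson.M{"$set": bson.M{"env-uuid": envUUID}})
		if err != nil {
			return err
		}
	}
	return iter.Close()
}
```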
## Juju Integration with Oasis TOSCA standards (IBM)

[TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) is a standard aimed at "enhancing the portability and management of cloud applications and services across their lifecycle." In discussions with IBM, we need to integrate Juju into the TOSCA standards as part of our agreement. Thus we need to define the following:

- [TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) - simple profile yaml doc, updated approximately weekly.
- Discuss who will lead this effort and engage with IBM.
- Define the correct integration points.
- Define the design and architecture of the TOSCA integration.
- Define what squad will deliver the work, and timelines.

### Goal

- Drag a TOSCA spec onto the juju-gui and have the deployment happen.

## Other OS Workloads

Juju has been Ubuntu-only so far, but was never intended to be Ubuntu-only; we were waiting for user demand. It seems some of that demand has now arrived. From earlier discussions, the following areas have been identified for work:

1. Remove assumptions about the presence of apt from core.
1. Deal with the init system differences between upstart, SysV init, and Windows services for the agents.
1. Deal with rsyslog configuration.
1. Define initial charms (the bare minimum would be equivalents of the ubuntu charm).
1. Update the cloud-init handling for alternate OSes.
1. SSH configuration.
1. Define and handle non-Ubuntu images.

The key questions are:

1. Which is going to be first?
   - We expect Windows workloads, as that has been implemented already and we just need to integrate it.
1. How important is this compared to the other priorities?

I don't think there are any questions around "should we do it", just "when should we do it".

### CentOS / SLES

Hopefully we can handle both CentOS and SLES in one go, as they are based on very similar systems. We may need to abstract out some parts, but on the whole they *should* be very similar. Again, there should be a lot of overlap between Ubuntu and both CentOS and SLES, with the obvious differences in agent startup management and software installation. Writing the actual charms is outside the scope of this work, although we should probably make CentOS and SLES charms that mirror the ubuntu charm and just bring up an appropriate machine.

### Windows

We have work that has already been done by a third party to get Juju deploying Windows workloads. It is expected that this work will neither cleanly merge with current trunk, nor necessarily meet our normal demands of tests, robustness, or code quality. We won't really know until we see the code. However, what it does give us is something that works, that clearly identifies all of the Ubuntu-specific parts of the codebase, and that will give us a good foundation to work from to get the workload-platform-agnostic nature we desire.

### Notes

- Need to get the code drop from the MS guys.
- Use the above to identify the non-Ubuntu-specific parts of the code.
- We do the interface design and the CentOS implementation.
- We hand the above back to the MS guys and they use it as a template to re-do the Windows version.
- Excludes the state server running on Windows.
- Manual provisioning of Windows instances.
- Local provider (VirtualBox) on Windows.

## 3rd Party Provider Implementations

- Improving our documentation around what it takes to implement a Provider.
- We still call them Environ internally.
## Container Addressability (Network Worker)

- [Earlier notes on Networking](https://docs.google.com/a/canonical.com/document/d/1Gu422BMAJDohIXqm6Vq4WTrtBV8hoFTTdXvXDQCs0Gs/edit)
- Link to [Juju Networking Part 1](https://docs.google.com/a/canonical.com/document/d/1UzJosV7M3hjRaro3ot7iPXFF9jGe2Rym4lJkeO90-Uo/edit#heading=h.a92u8jdqcrto) early notes.
- What are the concrete steps towards getting containers addressable on clouds?
  - Common
    - Allocate an IP address for the container (provider-specific).
    - Change the NIC that is being used to be bridged.
    - Bring up the container on that bridged network and assign the local address.
  - EC2
    - **ACTION (spike)**: How do we get IP addresses allocated in a VPC?
    - Anything left to be done in goamz?
  - OpenStack
    - Neutron support in lp:goose.
      - Add a neutron package.
      - Sane fallback when endpoints are not available in keystone (detect whether Neutron endpoints are supported, and if not, report the error).
      - New mock implementation (testservers).
    - Specify ports/subnets at StartInstance time (possibly a spike as well).
    - Add/remove subnets.
    - Add/remove/associate ports (a Neutron concept, similar to a NIC).
    - Add/remove/relate bridges? Probably not needed for now.
    - Maybe security groups via Neutron rather than Nova.
    - Potential custom setup once a port is attached on a machine.

We need a Networker worker at the machine level to manage networks. What about public addresses? We want `juju expose` to grow some ability to manage public addresses. Need to be aware that there's a limit of 5 elastic IPs per region per account. We can instead get a public address assigned on machine startup that cannot be freely reassociated. Need to make a choice about default VPC vs creating a VPC; using only the default VPC is simpler.

### Potentially out of scope for now

- Using a non-default VPC - requires several additional setup steps for routes and suchlike.
- Networking on providers other than EC2/OpenStack, beyond making sure we don't bork on interesting setups like Azure.
- Networking on cloud deployments that do not support Neutron (e.g. HP).

Separate discussion: update the ports model to include ranges and similar.

Switching to the new networking model also enables much more restrictive firewalling, but does require some charm changes. If charms start declaring ports exposed on private networks, it would be possible to skip address-per-machine for non-clashing ports. It also allows more restrictive internal network rules.

### Rough Work Items

1. When adding a container to an existing machine, the Environment Provisioner requests a new IP address for the machine, and records that address as belonging to the container.
1. `InstancePoller` needs to be updated so that when it lists the addresses available for a machine, it is able to preserve the allocation of some addresses to the hosted containers.
1. The `Networker` worker needs to be able to set up bridging on the primary instance network interface, and apply the necessary ebtables/iptables rules to use the same bridge for LXC containers (e.g. any container can use one of the host instance's allocated secondary IP addresses so it appears like another instance on the same subnet).
1. The existing MaaS cloud-init setup for VLANs will be moved inside the networker worker.
1. The Networker watches machine network interfaces and brings them up/down as needed (e.g. doing dynamically what the MaaS VLAN cloud-init scripts do now, and more).

## Leader Elections

Some charms need to elect a "master" unit that coordinates activity on the service. Also, Actions will at times need to be run only on the master unit of a service.

- How do we choose a leader?
- How do we read/write who the leader is?
- How do we recover if a leader fails?
- The current leader can relinquish leadership (e.g. for round-robin use cases).

Lease on leader status. A lease allows caching, and prevents an isolated leader from performing bad actions. If the leader is running an action and can't renew its lease, it must kill the action. The same applies to hooks that require the leader. The agent controls leader status and does the killing.
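A sketch of the lease behaviour in the paragraph above: leadership is only trusted while the lease is fresh, and an agent that cannot renew in time must stop leader-only work. All names are illustrative.

```
package leadership

import "time"

// Lease is leader status with an expiry; holding a stale lease is
// as good as not holding one at all.
type Lease struct {
	Holder  string
	Expires time.Time
}

// StillLeader reports whether the unit may keep doing leader-only
// work (actions, leader-only hooks) at time now. A unit partitioned
// away from state will fail to renew and must kill any in-flight
// leader action once this turns false.
func (l Lease) StillLeader(unit string, now time.Time) bool {
	return l.Holder == unit && now.Before(l.Expires)
}

// Renew extends the lease. Only the current holder may renew, and
// only before expiry, so an isolated leader cannot sneak back in.
func (l *Lease) Renew(unit string, now time.Time, period time.Duration) bool {
	if !l.StillLeader(unit, now) {
		return false
	}
	l.Expires = now.Add(period)
	return true
}
```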
## Improving charm developer experience

Charms are the most important part of Juju. Without charms people want to use, Juju is useless. We need to make it as easy as possible for developers outside Canonical to write charms.

Areas for improvement:

- Make charm writing easier.
- Make testing easier.
- Make charm submission painless.
- Make charm maintenance easier.
- What are the current biggest pain points?

## Juju needs a distributed log file

We are currently working on replicating rsyslog to all state servers when in HA. Per Mark Ramm, this is good enough for now. We may want to discuss a real distributed logging framework to help with observability, maintenance, etc.

### Notes

- Kapil says Logstash or Heka. Heka is bigger and more complicated, so he suggests Logstash is more likely to be suitable.
- Wayne has used Apache Scribe in the past.
- Requirements:
  - Replicated (consistently) across all state servers.
  - Newly added state servers must have old log messages available.
  - Must be tolerant of state server failures.
  - Store and forward.
  - Nice to have: efficient querying.
  - Nice to have: surrounding tooling for visualization, post-hoc analysis, …
  - Encrypted log traffic.

### Actions

Juju actions are charm-defined functionality that is user-initiated, takes parameters, and is executed on units, such as backing up mysql.

### Open Questions

- How do we handle history and results?
- How do we handle actions that require leaders on services with no leaders?
- Is there anything else controversial in the spec?
- Do we have a piece of configuration on the action defining what states it’s valid to run in?
- Users should be made aware of the lifecycle of an action. For example, which unit is currently backing up, the progress of the backup, and whether the backup succeeded or not.

Actions have:

1. State
1. Lifecycle
1. Reporting

Actions accept parameters. There is an actions directory at the top level of the charm, containing a bunch of named executables. `actions.yaml` has a key for each action, e.g. whether it is a service or unit action, and a schema for the parameters (JSON Schema expressed in YAML).

There are both unit-level and service-level actions. Unit-level will be done first.

There are collections of requests and results. Each unit watches the actions collection for actions targeted at itself, and is not notified of things it doesn’t care about. When you create an action, you get a token, and you watch for that token in the results table. A non-zero exit code means failure, but an error return from an action doesn’t put the unit into an error state.

Actions need to work in more places than hooks. We don’t want to run them before start or after stop; we do want to run them while in an error state.

```
$ juju do action-name [unit-or-service-name] --config path/to/yaml.yml
```

By specifying a service name for a unit action, it runs against all units by default.

Results are YAML. stdout goes to the log. Hook and action queues are distinct.
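From the client side, the token-and-watch flow described above could look roughly like this; `EnqueueAction` and `WatchResult` are invented names for illustration, not a real API:

```
package main

import (
	"fmt"
	"time"
)

// Result is one document from the hypothetical results collection.
type Result struct {
	Token  string
	Code   int    // non-zero means the action failed
	Output string // YAML results from the charm's executable
}

// EnqueueAction writes a request document; the targeted unit's
// watcher picks it up. The returned token identifies the result.
func EnqueueAction(unit, action string, params map[string]string) string {
	return "token-123" // stand-in for a state-server API call
}

// WatchResult blocks until a result with the given token lands in
// the results collection (a real implementation would use a watcher).
func WatchResult(token string, timeout time.Duration) (Result, error) {
	return Result{Token: token, Code: 0, Output: "status: ok"}, nil
}

func main() {
	token := EnqueueAction("mysql/0", "backup", map[string]string{"dest": "/tmp"})
	res, err := WatchResult(token, 5*time.Minute)
	if err != nil {
		panic(err)
	}
	// A failed action reports through Code; it does not put the
	// unit into an error state.
	fmt.Printf("action %s exited %d:\n%s\n", res.Token, res.Code, res.Output)
}
```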
### Work Items

1. Charm changes:
   - Actions directory (like hooks, named executables).
   - Top-level `actions.yaml` (top-level key is actions; sub-keys include parameters, description).
1. State / API server:
   - Add an action request collection.
   - Add an action result collection.
   - APIs for putting to the action/result collections.
   - APIs for watching which requests are relevant for a given unit.
   - APIs for watching results coming in (probably filtered by which unit/units we’re interested in).
   - APIs for listing and getting individual results by token.
   - APIs for getting the next queued action.
1. Unit agent work:
   - The unit agent’s “filter” must be extended to watch for relevant actions and deliver them to the uniter.
   - Various modes of the uniter need to watch that channel and invoke the actions.
   - Handwavy work around the hook context to make it capable of running actions and persisting results.
   - Hook tools:
     - Extract parameters from the request.
     - Dump results back to the database.
     - Error reporting.
     - Determine unit state?
1. CLI work:
   - The CLI needs a way to watch for results.
   - `juju do` sync mode.
   - `juju do` async mode.
   - `juju run` becomes trivially implementable as an action.
1. API for listing action history.
1. The leader should be able to run actions on its peers (use case: rolling upgrades).
1. Later: fix up the schema for charm config to match actions.

## Actions, Triggers and Status

What are triggers? (Related to Actions, IIRC.)

### Potential applications

- Less polling for the UI, deployer, etc.

### Topics to discuss

- Authentication
- Filtering & other features
- API
- Implementation

## Combine Unit agent and Machine agent into a single process

- What is the expected benefit?
  - Fewer moving parts; machine and unit agents upgrade at the same time.
  - Avoids N unit agents for N charms + subordinates (when hulk-smashing, for example).
  - Smaller deployment footprint (one less jujud binary).
  - Fewer workers to run, fewer API connections.
- What is the expected cost?
  - rsyslog tagging (logs from the unit agent arrive with the agent’s tag; we need to keep that for observability).
- Concrete steps to make the changes.

Issues with image based deployments?

- No issues expected.
- Even if we need a juju component inside the container, no issue.

### Work Items

1. Move relevant unit agent jobs into the machine agent (drop duplicates).
1. Remove redundant upgrade code.
1. Change the deployer to start a new uniter worker inside the single agent.
1. Change logging (loggo/rsyslog worker) to allow tags to be specified when logging, so that each unit still logs with its own tag.
1. (Eventually) consolidate the previously separate unit/machine agent directories into a single dir.
1. Ensure `juju run` works as before.

## Backup/Restore

- Making the current state work:
  - We need to have the mongo client for restore.
  - We need to ignore the replicaset.
- What will it take to implement a “proper” backup, instead of just having some scripts that mostly seemed to work one time?
  - Backup is an API call.
  - Restore should grow in `jujud`.
  - Add a restore at the level of bootstrap?
  - Turning our existing juju-backup plugin from being a plugin into being integrated core functionality.
- Can we snapshot the database without stopping it?
  - How will this interact with HA? We should be able to ask a secondary to save the data.
  - It is possible to mongodump a running process; did we consider that rather than shutting mongo down each time?
  - Since we now always use --replicaSet even when we have only 1, what if we just always created a “for-backup” replica that exists on machine-0? Potentially brought up on demand, brought up to date, and then used for the backup sync.
- juju-restore
  - What are the assumptions we can reliably make about the system under restore?
  - E.g., in theory we can assume all members of the replica are dead; otherwise you wouldn’t be using restore, you would just be calling ensure-availability again.
  - Can we spec out what could be done if the backup is “old” relative to the current environment? Likely most of this is “restore 3.0”, but we could at least consider how to get agents to register their information with a new master.

### Concrete Work Items

1. Backup as a new Facade for client operations.
1. `Backup.Backup` as an API call which does the backup and stages the backup content on the server disk. The API returns a URL that can be used to fetch the actual content.
1. `Backup.ListBackups` to get the list of tarballs on disk.
1. `Backup.DeleteBackups` to clean out a list of tarballs.
1. HTTP mux for fetching backup content.
1. Juju CLI for:
   - `juju backup` (request a backup, fetch the backup locally).
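Client-side, the proposed facade could be exercised roughly as follows. That `Backup.Backup` returns a fetch URL comes from the work items above; the Go names here are illustrative only:

```
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// requestBackup stands in for the Backup.Backup API call: the server
// creates the archive, stages it on disk, and returns a fetch URL.
func requestBackup() (url string, err error) {
	return "https://state-server:17070/backups/xyz.tar.gz", nil
}

func main() {
	url, err := requestBackup()
	if err != nil {
		panic(err)
	}
	// Fetch the staged tarball from the returned URL, as `juju backup`
	// would, and write it locally.
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, err := os.Create("juju-backup.tar.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
	fmt.Println("backup saved")
}
```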
## Consumer relation hooks run before provider relation hooks

[Bug 1300187](https://bugs.launchpad.net/juju-core/+bug/1300187)

- IIRC, William had a patch which made the code prefer to run the provider side of hooks first, but did not actually enforce it strictly. Does that help, or are charms still going to need to do all the same work?
- Does it at least raise the frequency with which charms “Just Work”, or does it make it hard to diagnose when they “Just Fail”?

## Using Cloud Metadata to describe Instance Types

We currently hard-code EC2 instance types in big maps inside of juju-core. When EC2 changes prices, or introduces a new type, we have to recompile juju-core to support it. Instead, we should be able to read the information from some other source (such as published on streams.canonical.com, since AMZ doesn’t seem to publish easily consumable data).

- The OpenStack provider already reads the data out of keystone; are we sure AMZ doesn’t provide this somewhere?
- Define a URL that we could read, and a process for keeping it updated.

### Work Items

1. Investigate the instance type information each cloud type has available - both programmatically and elsewhere.
1. Define an abstraction for retrieving this information. Some clouds will offer this information directly, others will need to get it from simplestreams. Some cloud types may involve getting the information from mixed sources.
1. Support a search path for locating instance information across mixed sources.
1. Ensure a process for updating Canonical-hosted information is in place.
1. Document how to update instance type information for all cloud types.
1. API for listing instance types (for the GUI).

## API Versioning

We’ve wanted to add this for a long time.

- Possible [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit#heading=h.avfqvqaaprn0) for refactoring the API into many Facades
- [14.04 Spec](https://docs.google.com/a/canonical.com/document/d/12SFO23hkx4sTD8he61Y47_kBJ3H5bF2KOwrFFU_Os9M/edit)
- Can we do it and remain 2.x compatible for the lifetime of Trusty?
- Concrete design around what it will look like:
  - From an API server perspective (how do we expose multiple versions).
  - From an API client perspective.
  - From the Juju code itself (how does it notice it wants version X but can only get Y, so it needs to go into compatibility mode; is this fine-grained on a single API call, coarse-grained around the whole API, or the middle ground of a Facade?).

### Discussion

- We can use the string we pass in now ("") to each Facade, and start passing in a version number.
- Login can return the list of known Facades and what version ranges are supported for each Facade.
- Login could also start returning the environment UUID that you are currently connected to.
- With that information, each client-side Facade tracks the best version it can use, which it then passes into all `Call()` methods.
- Compatibility code uses `Facade.CurrentVersion()` to do an if/then/switch based on the active version and do whatever compatibility code is necessary.

### Alternatives

- Login doesn’t return the versions; instead, when you do a `Call(Facade, VX)` it can return an error that indicates what actual versions are available.
  - Avoids changing Login.
  - Adds a round trip whenever you are actually in compatibility mode.
  - Creates clumsy code along the lines of: `if Facade.Version < X { doCompat() } else { err := tryLatest(); if IsTooOld(err) { doCompat() } }`
- Login sets a global version for all facades.
  - Seems a bit too coarse-grained, in that any change to any API requires a global version bump (version number churn).
- Each actual API is individually versioned.
  - Seems too fine-grained, and makes it difficult to figure out what version needs to be passed when (and then deciding when you need to go into compat mode).
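A sketch of the preferred scheme: Login reports per-facade version ranges, the client negotiates the best version it supports, and compatibility code branches on `Facade.CurrentVersion()`. All names here are illustrative:

```
package main

import "fmt"

// Versions maps facade name -> best negotiated version, built from
// the ranges returned by Login.
type Versions map[string]int

func negotiate(serverMax, clientMax int) int {
	if serverMax < clientMax {
		return serverMax
	}
	return clientMax
}

// Facade carries the negotiated version into every Call().
type Facade struct {
	name    string
	version int
}

func (f Facade) CurrentVersion() int { return f.version }

func main() {
	// Suppose Login said the server supports Client facade versions
	// up to 2, and this client supports up to 3.
	v := Versions{"Client": negotiate(2, 3)}
	client := Facade{name: "Client", version: v["Client"]}

	// Compatibility code switches on the negotiated version rather
	// than probing with calls and handling "too old" errors.
	if client.CurrentVersion() < 2 {
		fmt.Println("falling back to v1 behaviour")
	} else {
		fmt.Println("using v2 call")
	}
}
```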
## Tech-debt around creating new api clients from Facades

[Bug 1300637](https://bugs.launchpad.net/juju-core/+bug/1300637)

- Server side [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit).
- We talked about wanting to split up Client into multiple Facades. How do we get there, and what does the client-side code look like?
- We originally had just `NewAPIClientFromName`, and Client was a giant Facade with all functions available.
- We tried to break up the one-big-facade into a few smaller ones that would let us cluster functionality and make it clearer what things belonged together (`NewKeyManagerClient`).
- There was pushback on the proliferation of lots of New*Client functions. One option is that everything starts from `NewAPIClientFromName()`, which then feeds a `NewKeyManager(apiclient)`.
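The compromise suggested in the last bullet - one connection entry point plus cheap per-facade constructors - might read like this. A sketch only, not the actual juju/api surface:

```
package main

import "fmt"

// Connection is the single authenticated API connection obtained
// from NewAPIClientFromName (sketched as a plain struct here).
type Connection struct{ env string }

func NewAPIClientFromName(env string) (*Connection, error) {
	return &Connection{env: env}, nil
}

// KeyManager wraps the shared connection; constructing it does no
// I/O, so a proliferation of New*Client dials is avoided.
type KeyManager struct{ conn *Connection }

func NewKeyManager(conn *Connection) *KeyManager { return &KeyManager{conn: conn} }

func (km *KeyManager) ListKeys(user string) ([]string, error) {
	return []string{"ssh-rsa AAAA... user@host"}, nil // stand-in for a facade call
}

func main() {
	conn, _ := NewAPIClientFromName("local")
	keys, _ := NewKeyManager(conn).ListKeys("admin")
	fmt.Println(keys)
}
```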
## Cross Environment Relations

We’ve talked a few times about the desirability of being able to reason about a service that is “over there”, managed in some other environment.

- Last [spec](https://docs.google.com/a/canonical.com/document/d/1PpaYWvVwdF55-pvamGwGP23_vHrmFwCW8Bi-4VUg-u4/edit)
- Describes the use cases; confirm that they are still valid.
- We should update it to include the actual user-level commands that would be executed and what artifacts we would expect (e.g., `juju expose-service-relation` creates a `.jenv/.dat/.???` that can be used with `juju add-relation --from XXX.dat`).

### Notes

Expose an endpoint in env1; this generates a jenv (authentication info for env1) that you can import-endpoint into another environment. Env2 then connects to env1 and asks for information about the service in env1. This creates a ghost service in env2 that exposes a single endpoint, which is only available for connecting relations (no config editing etc.). There is a continuous connection between the two environments to watch whether the service goes down, etc.

IP changes must be propagated to the other environment; note that this is currently broken for relations even in a single environment. Cross environment relations always use public addresses (at least to start). Note also that the ghost service name may be the same as an existing service name, and we have to ensure that’s OK.

## Identity & Role-Based Access Controls

- [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm)
- [Establishing User Identity](https://docs.google.com/a/canonical.com/document/d/150GEG_mDnWf6QTMc1kBvw_x_Y_whGVN19mr3Ocv6ELg/edit#heading=h.aza0s6fmxfs9)

### Current Status

- Concept of service ownership in core.
- Add/remove user and the add-environment framework are done, but not exposed in the CLI.

What does a minimum viable multi-user Juju look like? (Just in terms of ownership, not ACLs.)

- `add-user`
- `remove-user`
- `add-environment`
- `whoami`

### 14.07 (3mo)

- Beginnings of role-based access controls on users (implementation of RBAC in core is another topic).
  - [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm).
- Non-superusers: read-only access at a minimum.

### 14.10 (6mo)

- Command-line & GUI identity provider integrations.

### 15.01 (9mo)

- IaaS, mutually-trusted identities across enterprises.
- Need a way to securely broker B2B IaaS-like transactions.

## Iron Clad Test Suite

The Juju unit test suite is beset by intermittent failures, caused by a number of issues:

- Mongo and/or replica set related races.
- Access to external URLs, e.g. the charm store.
- Isolation issues, such that one failure cascades to cause other tests to fail.

There are also other systemic implementation issues which cause fragility, code duplication, and maintainability problems:

- Lack of fixtures to set up tools and metadata (possibly charms?).
- Code duplication due to lack of fixtures.
- Issues with defining tools/version series, such that tests and/or Juju itself can fail when run on Ubuntu with a different series.

Related, though not a reliability issue, is the speed at which the tests run; e.g. the Joyent tests take up to 10 minutes. We also have tests which were set up to run against live cloud deployments but which in practice are never run - we now rely on CI.

Over the last cycle things have improved, and there are certain issues external to Juju (like mongo) which contribute to the problems. But we are not there yet, and must absolutely get to the stage where tests pass first time, every time, on the bot and when run locally. We need to consider/discuss/agree on:

- Identifying current failure modes.
- Hardening the test suite to deal with external failures; fixing juju-core issues.
- Introducing fixtures for things like tools and metadata setup, and refactoring duplicated code and setup.
- Documenting fixtures and other test best practices.
### Work Items - Core - Refactoring and Hardening

Juju does what it is supposed to do, but has a number of rough edges when it comes to various non-functional requirements, which contribute to the fact that often Juju doesn’t Just Work, and many times requires an unacceptably high level of user expertise to get things right. These non-functional issues can very broadly be classified as:

- **Robustness** - Juju needs to get better at dealing with underlying issues, whether transient network related, provider/cloud related, or user input.
- **Observability** - Juju needs to be less of a black box, and expose more of what’s going on under the covers, so that humans and machines alike can make informed decisions in response to errors and system status.
- **Usability** - Juju needs to provide a UI and workflow that makes it difficult to make mistakes in the first place, and to catch and report errors early, as close to the source as possible.

As well as changes to the code itself, we should consider process changes to guide how new features are implemented and rolled out. There is currently a disconnect between developers and users (the real world). A developer will often test a new feature in isolation, on a single cloud, where it works first time, deployed on an environment with a few nodes at best. They won’t be exposed to the pain associated with, and needed for, diagnosing and rectifying faults, since it’s often easier to destroy-environment and start again, or a new revision will have landed and CI will start all over again. More often than not, it’s the QA team who has to diagnose CI failures, which are raised as bugs, with developers being spared the pain of the root cause analysis, and any fixes often addressing a specific bug rather than a systemic, underlying issue.

- Rename `LoggingSuite` to something else; make it the default base suite, with mocked-out `$HOME`, etc.
- Identify independent fixtures (e.g. fake home, fake networking, …), and compose the base suite from them.
- Create a fake networking fixture that replaces the default HTTP client with something that rejects attempts to connect to non-localhost addresses.
- Update the tools fixture and related tests.
- Introduce an in-memory mock mgo for testing independent of a real mongo server.
- Continue the separation of api/apiserver in unit tests to enable better error checking.
- Document current testing practices to avoid cargo-culting of old practices; ensure the document is kept up to date at code review time.
- Update and speed up the Joyent tests (and all tests in general). The Joyent tests currently take ~10 minutes, which is far too long.
- Suppress detailed simplestreams logging by default in the (new) ToolsSuite by setting the streams package logging level to INFO in suite setup.
- Delete live tests from juju-core.
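For the fake networking fixture, the core trick is swapping the default HTTP transport for one that refuses non-local dials. A minimal version, independent of Juju’s actual testing packages:

```
package fixtures

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// InstallFakeNetworking replaces http.DefaultTransport with one that
// rejects any attempt to reach a non-localhost address, so a unit
// test touching the real network fails loudly instead of hanging.
// It returns a function that restores the previous transport.
func InstallFakeNetworking() (restore func()) {
	old := http.DefaultTransport
	http.DefaultTransport = &http.Transport{
		Dial: func(network, addr string) (net.Conn, error) {
			host, _, err := net.SplitHostPort(addr)
			if err != nil {
				host = addr
			}
			if host != "localhost" && host != "::1" && !strings.HasPrefix(host, "127.") {
				return nil, fmt.Errorf("unit test attempted external connection to %s", addr)
			}
			return net.Dial(network, addr)
		},
	}
	return func() { http.DefaultTransport = old }
}
```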
### Items to consider

- Architectural layers - what class of error should each layer handle, and how should errors be propagated/handled upwards?
- How to expose/wrap provider-specific knowledge to the core infrastructure so that such knowledge can be used to advantage?
- Where’s the line between Juju responding to issues it encounters vs. informing the user with immediate feedback of problems? (CI issues currently lack immediate visibility.)
- Close the loop between real world deployment and developers.
- How to ensure teams take ownership of non-functional issues?
- Tooling - targeted inspection of errors and decisions made by Juju; e.g. utilities exist to print where tools/image metadata comes from. Is that sufficient, and what else is needed?
- A roadmap would be awesome, to know what features to look for in upcoming releases (and when waiting for user input).
- Feature development - involve stakeholders/users (CTS?) more, at the prototype stage and during functional testing?
- How best to expose developers to the real world, so that necessary hardening work becomes as much of an itch scratch as it does a development chore?
- Close the loop between CI and development - unit tests / the landing bot could flag specific features for additional functional testing.

### Notes

- Mock up the workflow in a spec/doc - a quick few paragraphs about what a change or feature will look like from a user-facing standpoint.
- Not all features require functional / UAT testing, because of time constraints, but we still want to give CTS etc. input into dev.
- Wishlist: send more developers out to customer sites to get real world experience.
- Much more involvement with IS as a customer.
- More core devs need to write charms.
- Debug log is too spammy - but the new include/exclude filters may help.
- Debug hooks are used a lot - considered a powerful tool.
- Debug hooks should be able to drop a user into a hook context when not in an error state, e.g. `juju debug-hooks unit/0 config-changed`.
- Need more output in status to expose internals (Is my environment idle or busy?).
- More immediate reporting to the user of charm output as a deploy happens; we don’t want to wait 15 minutes to see final status.
- Juju diagnose - post-mortem tools <- already done via juju ready/unready, output vars, etc.

### Work Items

[Juju Fixes](https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoQnpJ43nBkJdHhnV05NcmQ3Tm5yRnIwcTlYMTZEaEE&usp=sharing)

1. Design an error propagation mechanism to be used across providers.
1. Destroy service --force.
1. Dry run to tell the user what version upgrade-juju will use.
1. Inspect relation data.
1. Address changes must propagate to relations.
1. Use a security group per service.
1. Use instance names/tags for machines.
1. Make safe-provisioning-mode the default.
1. Bulk machine creation.
1. Unit ids must be unique.

## Retry on API Failures

Really part of hardening. There are transient provider failures due to issues like exceeding allowable API invocation rate limits. Currently Juju fails when such errors are encountered and considers them permanent, when it could retry and be successful next time. The OpenStack provider does this to a limited extent. A large part of the problem is that Juju is chatty and makes many individual API calls to the cloud. We currently have a facility to allow provisioning to be manually retried, but we need something more universal and automated.

### Discussion Points

- Understanding what types of operation can produce transient errors. Is it the same for all providers? What extra information is available to help with the retry decision?
- A common error class to encapsulate transient errors.
- An algorithm to back off and retry.
- To what extent can the Juju design / implementation change to mitigate the most common cause, which is exceeding rate limits?
- How to report / display retry status.
- Is manual intervention still required?

### Work Items

1. Identify for each provider which errors can be retried.
1. Juju should handle retries.
1. The above discussion points constitute the other work items.
1. Audit juju to identify API optimisation opportunities.
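The back-off algorithm itself is small; the harder part is classifying errors first. A sketch, with `transientError`/`IsTransient` standing in for the common error class discussed above:

```
package main

import (
	"errors"
	"fmt"
	"time"
)

// transientError marks provider failures (e.g. rate-limit responses)
// that are worth retrying.
type transientError struct{ error }

func IsTransient(err error) bool {
	_, ok := err.(transientError)
	return ok
}

// withRetry retries op with exponential back-off, but only for
// errors classified as transient; permanent errors fail fast.
func withRetry(attempts int, initial time.Duration, op func() error) error {
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !IsTransient(err) {
			return err
		}
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %v", attempts, err)
}

func main() {
	calls := 0
	err := withRetry(5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return transientError{errors.New("rate limited")}
		}
		return nil
	})
	fmt.Println(calls, err) // succeeds on the third call
}
```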
## Audit logs in Juju-core

The GUI needs to be able to query *something* for a persistent log of changes in the environment.

- What events are auditable? Hatch: only events that cause changes in the environment.
- Tim: who changed something, what was changed, when was it changed, what was it changed from and to, and why they were allowed to do it (Will).
- Hatch: it needs to be structured events - user, event, description, etc. - NOT just a blob of text.
- Voidspace: do we need a query API on top of this? Filter by machine, by user, by operation, etc.?
- Audit log entries are not protected at a per-row level. Viewing the audit log will require a specific permission.
  - Not all users of the GUI may be able to access the audit log.
- Audit log entries may be truncated; truncation will require a high level of permissions.
- ACTION: determine auditable events.
- ACTION: determine where to store this data, and what events to audit.
- Hatch: it doesn’t need to be streaming from the start, but it should be possible.

### Work Items

1. Create a state API for writing to the audit log (in mongodb).
1. Record an attempt before the API request is run.
1. Record success/error after the API request is run.
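The write-ahead pattern in these work items - record the attempt, run the call, record the outcome - is a simple wrapper. In this sketch the `auditLog` slice stands in for the mongodb-backed collection:

```
package main

import (
	"fmt"
	"time"
)

// Entry is a structured audit record: user, event, outcome -
// never just a blob of text.
type Entry struct {
	When    time.Time
	User    string
	Event   string
	Outcome string // "attempt", "success", or an error message
}

var auditLog []Entry // stand-in for the mongodb-backed collection

func record(user, event, outcome string) {
	auditLog = append(auditLog, Entry{time.Now(), user, event, outcome})
}

// audited brackets an API request with attempt and result records,
// so even a crashed request leaves a trace of what was attempted.
func audited(user, event string, apiCall func() error) error {
	record(user, event, "attempt")
	err := apiCall()
	if err != nil {
		record(user, event, err.Error())
		return err
	}
	record(user, event, "success")
	return nil
}

func main() {
	_ = audited("admin", "destroy-service wordpress", func() error { return nil })
	fmt.Println(auditLog)
}
```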
## Staging uncommitted changes

Hatch doesn’t want to do this in Javascript, because it is not web scale. He wants the API server to handle this staging.

Thumper says that SABDFL says they want to be able to do this from the CLI as well.

- Nate: if we need to allow this to work across the GUI and CLI, then we have to store this data in state.
- Nate: do we need N staging areas per environment? Nate: no, that is crazy talk, just one per environment.
- Thumper: then we’ll need a watcher.
- ACTION: uncommitted changes are stored in state as a single document, a big JSON blob.
- ACTION: we need a watcher on this document.
- Voidspace: entries are appended to this document; this could lead to confusion if people are concurrently requesting unstaged changes.
- Hazmat doesn’t think we should store this in state.
- ACTION: Mark Ramm/hazmat to talk to SABDFL about the difficulty of implementing this.
- All: do we have to have a lock or mode to enable/disable staging mode?
- Hatch: now the GUI and the CLI have different stories; the former works in staging mode by default, and the latter always commits changes immediately.
- ACTION: a change via the CLI would error if there are pending changes; you can then push changes into the log of work with a --stage flag. Ramm: alternatively, we tell the customer that the change has been staged, and they will need to ‘commit’ changes.
- ACTION: the CLI needs a ‘commit’ subcommand.
- Undo is out of scope, but permissible in the future; tread carefully.

### Discussion Thurs May 1

- Moved towards the idea of having an ApplyDelta API that lets you build up a bunch of actions to be changed.
  - These actions can then all be in a pending state, and you do a final call to apply them.
- The actual internal record of the actions to apply is a graph based on dependencies.
  - This lets you “pick one” to apply without applying the rest of the delta.
- Internally, we would change the current API to act via “create delta, apply delta” operations.
- When a delta is pending, calling the current API could act on the fact that there are pending operations.
- The spelling is undefined, e.g.:
  - `named := CreateDelta()`
  - `AddToDelta(named, operation)`
  - `ApplyDelta(named)`
  - `ApplyDelta(operations)`
- If it is just the ability to apply a listed set of operations, we haven’t actually exposed a way to collaborate on defining those operations.
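Sketching the spelling floated above: build up a named delta of pending operations, then apply it. The dependency graph is elided, and all of these functions are illustrative:

```
package main

import "fmt"

// Operation is one staged change, e.g. "deploy mysql" or
// "add-relation mysql wordpress".
type Operation struct{ Desc string }

// Delta holds a named set of pending operations.
type Delta struct {
	name    string
	pending []Operation
}

var deltas = map[string]*Delta{} // stand-in for the state document

func CreateDelta(name string) *Delta {
	d := &Delta{name: name}
	deltas[name] = d
	return d
}

func AddToDelta(d *Delta, op Operation) { d.pending = append(d.pending, op) }

// ApplyDelta commits every pending operation; a fuller version would
// walk the dependency graph so a single operation can be picked out.
func ApplyDelta(d *Delta) error {
	for _, op := range d.pending {
		fmt.Println("applying:", op.Desc)
	}
	delete(deltas, d.name)
	return nil
}

func main() {
	d := CreateDelta("gui-session-1")
	AddToDelta(d, Operation{"deploy mysql"})
	AddToDelta(d, Operation{"add-relation mysql wordpress"})
	_ = ApplyDelta(d)
}
```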
## Observability

How to expose more of what Juju is doing to allow users to make informed decisions. The key interface point is `juju status`. Consider instance/unit observability and transparency, e.g. what does pending really mean? Is it still provisioning at the provider layer? Is the machine agent running? Is the install hook running? Is the start hook running? We collapse all of that down to a single state; ideally we should just push the currently executing hook into status.

### To discuss

- How to display error conditions concisely, while allowing for more information if required.
- Insight into logs - is debug log enough? (It now has filtering etc.)
- Feedback when running commands via the CLI - often warnings are logged server side; how do we expose them to users? Use of a separate back channel?
- Interactive commands? Get input to continue, or to try again on error/warning?
- Consistency in logging - guidelines for verbosity levels, logging API calls, etc.
- How to discover valid vocabularies for machine names, instance types, etc.?
- How to inspect relation data?
- Should output variables be recorded/logged?
- Provide a --dry-run option to see what Juju would do on upgrades.
- Better insight into hook firing.
- Ability to probe charms for health? (Including e.g. low disk space etc.)
- Event driven feedback.
- Integration with SNMP systems? How to alert when issues arise?

### Work Items

- `juju status <entity>` reveals more about that entity - get all output in the context that is specified.
- Add a new unit state - healthy/unhealthy.
- Instance names/tags for machines (the workload that caused it to be deployed).
  - Specifically, when deploying a service or adding a unit that requires a machine to be added, the provisioner should be passed a tag of the service name or similar to annotate the machine with on creation.
- Inspect relation data.
- Implement output variables (needs a spec).
- `add-machine`, `add-unit` etc. need to report what was added, etc.
- API for vocabularies (instance types).

## Usability

### Covers a number of key points

- Discoverability - features should be easily discoverable via `juju help` etc.
- Validate inputs - Juju should not accept input that causes breakage, and should fail early.
- Error responses - Juju should report errors with enough information to allow the user to determine the cause, and ideally should suggest a solution.
- Key workflows should be coherent and concise.
- Tooling / API support for key workflows.

### Agenda

- Identify key points of interaction - bootstrap, service deployment, etc.
- Current pain points, e.g.:
  - Tools packaging at bootstrap for dev versions or private clouds?
  - Opening/closing port ranges?
  - Security groups!
  - What else?
- What’s missing? Tooling? The right APIs? Documentation? Training?
- Frequency of pain points vs. impact.

### Concrete Work Items

1. Improve `juju help` to provide pointers to extra commands.
1. Transactional config changes.
1. Fix the destroy bug (destroy must be run several times to work).
   - Find or file a bug on LP.
1. When a machine fails, the machine state in juju status displays an error status with the error reason.
1. Document the rationale in a code comment.
1. `juju destroy-service --force`
1. Range syntax for open/close ports.
1. Safe mode provisioning becomes the default.
1. Garbage collect security groups.
## Separation of business objects from persistence model

A widely accepted architectural model for service oriented applications has layers for:

- services
- domain model
- persistence

The domain model has entities which encapsulate the state of the key business abstractions, e.g. service, unit, machine, charm, etc. This is runtime state. The persistence layer models how entities from the domain model are saved/retrieved to/from non-volatile storage - mongo, postgres, etc. The persistence layer translates business concepts like queries and state representation into storage-specific concepts. This separation is important in order to provide database independence, but more importantly to stop layering violations and promote correct design and separation of concerns.

### To discuss

- Break-up of the state package.
- How to define and model business queries.
- How to implement translation between the domain and persistence models.

### Goals

- No mongo in business objects - database agnosticism.
- Remove layering violations which lead to suboptimal model design.
- Scalability, via the ability to implement pub/sub infrastructure on top of the business model rather than the persistence model; no more sucking on the mongo firehose.

### Work Items

1. Spike to refactor a subset of the domain model (e.g. machines).
1. Define and use patterns (e.g. “named query”) to abstract out database access further (in the spike).
1. Define and use patterns for mapping/transforming domain objects to the persistence model.
1. If possible, define and implement integration with pub/sub for change notification.
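One way to read the “named query” idea: the domain layer asks business-level questions through an interface and never sees mongo. A sketch under that assumption; these types are not Juju’s actual state package:

```
package main

import "fmt"

// Machine is a pure domain object: no mgo/bson tags, no session.
type Machine struct {
	ID     string
	Series string
}

// MachineRepo is the persistence boundary; the domain layer depends
// only on this interface, never on the mongo driver.
type MachineRepo interface {
	// Named queries express business questions, not collections.
	MachinesNeedingUpgrade(targetVersion string) ([]Machine, error)
}

// mongoMachines would translate the named query into mgo find()
// calls; here it is faked in memory.
type mongoMachines struct{ rows []Machine }

func (m mongoMachines) MachinesNeedingUpgrade(v string) ([]Machine, error) {
	return m.rows, nil // real impl: filter on agent-version < v
}

func main() {
	var repo MachineRepo = mongoMachines{rows: []Machine{{"0", "trusty"}}}
	ms, _ := repo.MachinesNeedingUpgrade("1.21.0")
	fmt.Println(ms)
}
```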
## Juju Adoption Blockers

[Slides with talking points](https://docs.google.com/a/canonical.com/presentation/d/1jcJ93Npuo60Iyy0BGSNap1kekQNxiZ7rDBJfuxAv_Go/edit#slide=id.ge4adadaf_1_645)

## Partnerships and Customer Engagement

- The Juju GUI has been a tremendous help.
  - A sales team enabler, to quickly and easily show Juju.
- Every customer/partner asks:
  - Where can I get a list of all charms?
  - Where can I get a list of all available relations?
  - Where can I get a list of all available bundles?
  - Where can I get a list of all supported cloud providers?
  - What about HA? What happens if the bootstrap node goes away?
    - We need to start demonstrating this, ASAP!
  - What if one of the connected services goes away? What does Juju do?
    - So, great, I can use Juju to relate Nagios and monitor my service. But what does Juju do with that information? Can’t Juju tell if a service disappears?
  - Auto-scaling? Built-in scalability is great, but manually increasing units is only so valuable.
  - What do you mean, there aren’t charms available for 14.04 LTS yet?
  - *Yada yada yada* Docker *yada yada yada*?
- Our attempts to shift the burden of writing charms onto partners/customers have yielded minimal results.
  - Pivotal/Altoros around CloudFoundry:
    - CloudFoundry is so complicated that Pivotal developed their own custom Juju-like tool (BOSH) to deploy it, and their own “artifact” based alternative to traditional Debian/Ubuntu packaging.
    - CloudFoundry charms (and bundles) have proven a bit too complex for newbie/novice charmers at Altoros to develop, at the pace and quality we require.

## Juju 2.0 Config

- Define providers and accounts as first class citizens.
- Eventually remove environments.yaml in favor of the above account configuration and .jenv files.
- Change `juju bootstrap` to take an account and --config=file / --option="foo=var" for additional options.
- `juju.conf` needs:
  - A simplestreams source for provider definitions, defaulting to https://streams.canonical.com/juju/providers.
    - A new stream type “providers” containing the environment descriptions for known clouds (e.g. hpcloud has auth_url:xyz, type:OpenStack, regions-available: a,b,c, default-region:a).
    - Juju itself no longer includes this information inside the ‘juju’ binary, but depends on that information from elsewhere.
  - A providers section.
    - Locally defines the data that would otherwise come from above.
  - An accounts section.
    - Each account references a single provider.
    - Local overrides for environment details (overriding defaults set in the provider).

## Distributing juju-core in Ubuntu

Landscape has a stable release exception for their client, not a micro release exception. We fulfil the rules for this even better than Landscape does, as we have basically no dependencies at all.

We can split juju the client from jujud the server, though this isn’t terribly useful for us outside of making distro people happy.

The Landscape process has two reviews before code lands; we used to do this but changed, and it didn’t seem to drop quality on our end.

We could raise an item at a tech board meeting to sort out stable release matters.

Having to maintain separate source packages for client and server would be annoying and painful; could we have different policies for binary packages generated from the same source package?

Dynamic linking gripes are not imminently going to be solved by anyone.

Have a meeting with Foundations to resolve some unhappiness.

## Developer Documentation

- https://juju.ubuntu.com/dev/ - Developer Documentation.
- There exists an automated process that pulls the files from the doc directory in the juju-core source tree, processes the markdown into HTML, and uploads it to the WordPress site.
- Minimal topics needed:
  - Architecture overview
  - API overview
  - Writing new API calls
  - What is in state (our persistent store - horrible name, I know)?
  - How the mgo transactions work
  - How to write tests:
    - Base suites
    - Environment isolation
    - Patch variables and environment
    - Using gocheck (filter and verbose)
    - Table based tests vs. simple tests
    - Tests should be small and obviously correct
  - Developer environment setup
  - How to run the tests:
    - `juju test <filter> --no-log` (plugin)
- https://juju.ubuntu.com/install/ should say to install juju-local.

## Tools, where are they stored, sync-tools vs bootstrap --source

- FindTools is called whenever tools are required, which searches all tools sources again.
- When tools are located in the search path, they are copied to env storage and accessed from there when needed.
- Find is only to be called once, at well defined points: bootstrap and upgrade. The tools are fetched into env storage so that e.g. during upgrade, tools are sourced from there.
- Need a tools catalog, separate from simplestreams, for locating tools in env storage.
- Bootstrap, upgrade, and sync-tools need --source.

As is the case now, if --source is not specified, an implicit upload-tools will be done.

## Status - Summary vs Detailed

Status is spammy even on smallish environments, and completely unusable on mid-sized and larger ones. Can we make it easier to read, or make another status that is more of a summary view?

### Work Items

1. Identify items in the status output that may break people’s scripts if changed or removed.
1. Add flags:
   - `--verbose/-v`: total status, current output + HA + networking junk.
   - `--summary`: human readable summary - not YAML (this is dependent on the mini-plugin below).
   - "`--interesting`": items that aren’t “normal” (e.g. agent state != “started”).
1. Write a mini-plugin that takes the human readable YAML and generates human readable output, e.g. HTML.
1. Use a watcher to monitor status instead of polling the juju status cmd.
1. Extend filtering.

## Relation Config

When adding a relation, we want to be able to specify configuration specific to that relation. In settings terms, this will be “service-relation-settings”. We need to set config for either end of the relation. Settings data is stored for the relation as a whole.

The relation config schema is defined in the charm’s `metadata.yaml`, with separate config for each end of the relation.

The config is specified when using add-relation, via a `--config config.yaml` option.

A new Juju command, `relation-get-config [-r foo]`, gets config from the local side of the relation. Inside a hook we don’t need -r.

A new `juju set-relation config.yaml` will cause the relation-config-changed hook to run.

### Work Items

1. New add-relation `metadata.yaml` schema.
1. Ability to store relation settings in mongo.
1. Support for processing relation config in `add-relation`.
1. `relation-get-config` command.
1. `set-relation-config` command.
1. `relation-config-changed` hook.

## Bulk Cloud API

The APIs we use to talk to cloud providers are too chatty, e.g. individual calls to start machines or open individual ports.

When starting many instances, partition them into batches with the same series/constraints/distribution group, and ask the provider to start each batch (see the sketch after the work items).

### Work Items

1. Unfuck instance broker interfaces to allow bulk invocation.
1. Rework the provisioner.
1. Change instance data so that it is fully populated and not just a wrapper around an instance id that causes more API calls to be required.
1. Audit the providers to identify where bulk API calls are not used.
1. Start instances to return ids only; get extra info in bulk as required.
1. Single shared instance state between environs (updated by a worker).
1. Refactor prechecker etc. to use cached environ state - reduce `New()` environ calls.
1. Stop using open/close ports and use iptables instead.
1. Use a single security group.
1. Use a firewaller interface in providers to allow Azure to be handled.
1. Drop firewall modes in the ec2 provider.
1. Support specifying port ranges, not individual ports (e.g. in charm metadata).
1. For hook tools - open ports on a network for a machine, not a unit.
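The partitioning step is mechanical: group pending machines by a (series, constraints, distribution group) key and make one provider call per batch. A sketch with invented types:

```
package main

import "fmt"

// Machine is a pending machine the provisioner wants started.
type Machine struct {
	ID          string
	Series      string
	Constraints string // canonicalised constraint string
	DistGroup   string
}

type batchKey struct{ series, constraints, distGroup string }

// partition groups pending machines so the provider receives one
// bulk start call per identical batch instead of one per machine.
func partition(machines []Machine) map[batchKey][]Machine {
	batches := make(map[batchKey][]Machine)
	for _, m := range machines {
		k := batchKey{m.Series, m.Constraints, m.DistGroup}
		batches[k] = append(batches[k], m)
	}
	return batches
}

func main() {
	pending := []Machine{
		{"1", "trusty", "mem=4G", "wordpress"},
		{"2", "trusty", "mem=4G", "wordpress"},
		{"3", "precise", "mem=2G", "mysql"},
	}
	for key, batch := range partition(pending) {
		fmt.Printf("StartInstances(%v) x%d\n", key, len(batch)) // one bulk call per batch
	}
}
```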
## Tools Placement

- Allow storage of tools in the local environment.
- Provide a catalog of the tools in the local environment.
- Refactor the current tools lookup to use the catalog.
- Provide a tools import utility to get new tools into the environment.
- Upgrades check the tools catalog to ensure tools are available for all required series, arches, etc.
- Same model as for charms in state.

## Juju Documentation

**William**: Write documentation while designing the feature, and give it to Nick etc. before writing code. This is the word of god.

**Nate**: Use a changelog file in the juju-core repo to log features and bugfixes with merge proposals.

**Nick & Jorge**: we’re just a couple of people; juju core is 20 people now.

**Ian**: we can’t require a changelog per merge, since a single feature may be many many merges, which might have no user facing features.

This must actually happen or Jorge has permission to kill Nate.

Nate to get buy-in from team leads.

# Charm Config Schema

Users find our limited set of types in config (String, Bool, Int, Float) limiting, and have to do things like pickle lists as base64. See [bug](https://bugs.launchpad.net/juju-core/+bug/1231526), which largely covers this.

- Map existing YAML charm config descriptions into a JSON schema.
- Extend existing YAML config to something that can be mapped well to JSON schema.
- We currently have a config field in the charm document.
  - Create a schema document that the charm links to.
  - Upgrade step that takes the existing config field and creates a new document linked to the charm.
- Add support in `juju set` for the new format.
- Add a flag to `juju get` to output the new format.

New types we want: enums, lists, maps (keys as strings, values as whatever).

Open questions: how do charms upgrade their own schema types? There is existing pain here; for instance, the OpenStack charms are stuck using “String” for a boolean value because they cannot safely upgrade the type.

Pyjuju had magic handling for slurping files; there’s a bug/feature request for a ‘File’ type.

Note this work does not include constraint vocabularies. See Ian Booth for that work.

# Juju Solutions & QA

This is very dependent on which charm you are looking at. I assume there were particular things that came up in the Cloud Foundry work that need attention. We have been building up test infrastructure quite quickly, which is one part of helping improve quality -- but the biggest thing is growing communities around particular charms.

# Juju QA

## CABS Reporting

The feature has stalled as goals and APIs churned.

1. What are the goals of reporting?
1. What is the data format that CABS will provide for reporting?
1. How do we display the reports?
## Scorecard

The scorecard is a progress report to measure our activity and correlate it to our successes and failures. Most of the work is done by hand. Though most of the information gathering can be automated, it was the lowest priority for the Juju QA team. How much time will we save if we automate some or all of the information gathering?

Juju QA has scripted most of what it gathers for the scorecard. The data is entered by hand instead of being added to tables and charts by an automated process. These are the kinds of data the team knows how to gather:

1. Bugs reported, changed, or fixed.
1. Branch commits.
1. Time from report, to start, to release of bugs and commits.
1. Releases of milestones.
1. Downloads of installers and release tarballs (packagers and homebrew).
1. Installs of clients from PPAs.
1. Downloads of tools from public streams.

### Work Items

1. GUI
   1. Bundles deployed
   1. Charms deployed
   1. Visits to jujucharms.com and juju.ubuntu.com
   1. Quickstart downloads
   1. Number of releases
   1. Number of bugs
   1. Number of bugs closed
1. Core
   1. Number of external contributors
   1. Number of fixes committed
   1. Number of running envs (the charmstore is queried every 90 min for new charms)
      - Do we know which env the charm query was for?
   1. Client installs (from the PPA, cloud archive, trusty)
   1. Number of tools downloaded (from containers and streams.c.c)
   1. Add anonymous stat collection to juju to learn more
1. Eco
   1. Number of Canonical and non-Canonical charm committers
   1. Number of people in #juju (and #juju-dev)
   1. Number of subscribers to the juju and juju-dev mailing lists
   1. Number of charms audited
   1. AskUbuntu conversion (questions asked & answered)
   1. Number of tests in charms
1. QA
   1. Metrics
   1. Days to bug triage
   1. CI tests run per week
   1. Number of solutions tested
   1. Number of clouds solutions are tested on
   1. Number of juju core releases

## Charm Testing Reporting

Charm test reporting has faced obstructions from several causes. There are two central issues: one, reliable delivery of data to report, and two, completion of the reporting views.

1. Charm testing data formats change without notice.
1. Charm testing uses unstable code that can break several times a day, preventing gathering and publication of data.
1. Charm testing leaves machines behind.
1. Charm testing can exceed resource limits in a cloud.
1. Charm testing doesn’t support multiple series.
1. Charm reports don’t show me a simple table of the clouds a charm runs on.
1. Most charms don’t have tests -- can we have a simple test to get every charm listed?
1. I don’t know the version of the charm.
1. I don’t know the last version that passed all tests.
1. Charm detail reports don’t show me the individual tests.
1. I don’t know the series.
1. I don’t know the version that last passed the individual test.
### Work Items

1. Create a new Jenkins job that uses the last known good version of substrate dispatcher (lp:charmtester).
1. Staging charmworld, or something similar, will trigger a test of a branch and revision.
1. Provide charmers with a script to test MPs/pull requests.
1. Provide a way to poll LP and GH to automatically run the tests for an MP/PR.
1. Provide a way to test the tip of each promulgated charm.
1. Reporting needs to pick up the data from the new test runner/Jenkins job.
1. The overview should list every charm tested:
   1. Does the charm have tests?
   1. A link to the specific charm results.
   1. Which clouds were tested, and did the suite pass?
   1. What version was tested?
   1. What is the last known-good version to pass the tests for a substrate?
   1. What version passed all substrates?
1. For any charm, I need to see specific charm results:
   1. Which substrates were tested?
   1. The individual tests run on a substrate, showing the name of the test and pass/fail.
   1. A link to the failure log, located somewhere.
   1. What was the last version of the charm to pass the test?
1. Update substrate dispatcher, or switch to bundletester, to gather richer data.
   1. Ensure `destroy-environment` is run.
   1. Capture and store JSON data instead of logs.
1. We will get use cases for the charm test reports that will verify the report meets expectations.
1. Tests could state their needed resources, and the test runner can look to see if they are available. Tests can be deferred until resources are available.

## Charm testing with juju Core

1. We test with stable juju and charms.
1. We could test with unstable:
   1. Only test the popular charms for each revision.
   1. Or only test charms with tests.
   1. Or test bundles which have valid combinations.
   1. Test all the charms occasionally.
1. Historically, when charms break with a new juju, it is the charm’s fault.

## Charm MP/Pull Gate on Charm Testing

Charm merges could be gated on a successful test run against the supported clouds.

- Allow charmers to manually request a test for a branch and revision.
- Maybe extend the script to poll for pull requests/merge proposals.
- Charm testing doesn’t support series testing yet.

### Testing

1. Test the MP or pull request.
1. Merge and commit on pass.
1. Charm testing runs and is actually testing that juju or ubuntu still works for the charm.
## CI Charm and Bundle Testing

Testing popular bundles with Juju unstable to ensure the charms and bundles continue to work.

1. Notify the charm maintainer or the juju developers when a break will happen.
1. Can testing be automated to pick up newly popular charms and bundles?
1. There are resource limits per cloud.

### Notes

- Charm testing could be simplified to proof and unit tests.
  - Bundle tests would test relations.
- Current tests don’t exercise failures or show error recovery.
- Ben suggests that the amulet tests in charms could be moved to bundles.
  - Charms are like libraries; bundles are like applications.
  - Bundles are known topologies that we can support and recommend.
- Charm tests could pass but break other apps; the bundle level is where we want to test.
- Workloads are more like bundles, though some charms might not need to be in a relation, so a bundle of one.
- Config testing is valuable at both the charm level and the bundle level.
- Integration suites might work on a charm or a bundle.
- Cloud Foundry tests only work with the bundle; running the suite for each charm means we construct the bundle multiple times and rerun tests.
- A charm author might write weak tests. Reviewers need to see this and respond. Bundles represent how users will use the charm, and that is what needs testing to verify utility and robustness.
- Bundletester has a test pyramid:
  - Proofing each charm.
  - Discovering unit tests in each charm.
  - Discovering integration tests and running them.
- Bundle testing has a known set of resources, which is needed when testing in a cloud.
- Bundle tests provide the requirements for any software’s own stress and function tests.
- Charm reports would use the rich JSON data.

### Work Items

1. Review BenS’s bundle testing for integration into the QA Jenkins workflow.
   1. Get back to BenS with any questions.
1. Use cases to drive what the reports need to show:
   1. What do the different stakeholders need to discover when reading the reports?
   1. What actions will stakeholders take when reading the reports?
1. Do bundle tests poll for changes to bundles or the charms they use?
   1. The alternative would be to test on demand.
   1. Gated merges of MPs/PRs mean there is little value in testing on push.

## CI Ecosystem Tests

We want to extend Juju devel testing to verify that crucial ecosystem tools operate with it. When there is an error, the Juju-QA team will investigate and inform one or both owners of the issue that needs resolution.

The juju under test will be used with the other project’s test suite. A failure indicates Juju probably broke something, but maybe the other project was using juju in an unsupported way.

Juju CI will provide a simple functional test to demonstrate an example case works.

We want a prioritised list of tests to deliver:

1. Juju GUI
1. Juju Quickstart
1. Azure juju GUI dashboard
1. jass.io
1. Juju Deployer
1. mojo
1. amulet
1. charm tools
1. charm helpers
1. charmworld

### Work Items

1. Quickstart
   1. Quickstart relies on the CLI, the API, and config files. It waits for the GUI to come up in the env, then deploys bundles.
   1. Quickstart opens a browser to show the GUI.
   1. Testing:
      1. Install the proposed juju.
      1. Run juju-quickstart with a bundle against a bootstrapped env.
      1. It tries to colocate the bootstrap node and the GUI when not on the local provider and the node and charm have the same series.
      1. Otherwise the GUI is in a different container.
      1. `juju status` will list the charms from the bundle.
      1. Rerun juju-quickstart with the bundle.
      1. Verify the same env is running with the same services.
   1. The GUI team need to write:
      1. Functional tests.
      1. Allow the tests to be run on lxc.
1. Juju GUI charm
   1. “make test” will deploy the charm about 8 times.
   1. The GUI is deployed on the bootstrap node to make the test faster.
   1. If the provider is local, the GUI should be in a different container.
   1. The charm has tests that are run by juju test.
   1. The functional tests run the default juju.
   1. We can use the juju under test with the charm.
   1. An env variable is used to select the series for the charm.
   1. Testing with a bundle implicitly tests the deployer.
## CI Cloud and Provider Testing

Juju CI tests deployments and upgrades from stable to release candidate. We might want additional tests.

1. Canonistack tests are disabled.
   1. Swift fails; IS suspect misconfiguration or a bad name (rt 69317).
   1. Canonistack has bad days where no one can deploy.
1. Restricted and closed networks?
   1. CI has a restricted network test that shows the documented sites and ports are correct, but it doesn’t verify tools retrieval.
   1. A closed network test would have proxies providing every documented requirement of Juju.
1. Constraints?
1. Placement?
1. `add-machine`, `add-unit`?
1. Health checks by series?

### Work Items

1. Placement tests are required for AWS and OpenStack.
1. `add-machine` and `add-unit` can be functional tests.
1. Need the nova console log for when we cannot ssh in.
1. Constraints are mostly:
   1. Unique
   1. Azure availability sets (together relationship)
   1. AWS/OpenStack availability zones (apart relationship)
   1. Security groups
   1. MaaS networks

## CI Compatibility Function Testing

Juju CI needs functional tests that exercise a function across multiple versions of juju, and juju working with multiple versions of itself.

1. Unstable to stable command line compatibility.
   1. Verify deprecation, not obsolescence.
   1. Verify scripted arguments do not break after an upgrade.
1. 100% major.minor compatibility. Do stable micro releases work with every combination?
   1. This means keeping a pool of stable packages for CI.
   1. Encourages creating new minor stables instead of adding test combinations; but SRU discourages minor releases.
1. CI is **blocked** because Juju doesn’t allow anyone to specify the juju version to bootstrap the env with, nor can agent-metadata-url be set more than once to control the version found.

### Work Items

1. Juju bootstraps with the same version as the client.
   1. Then juju upgrades/downgrades the other agents to the current version.
1. Ubuntu wants 100% compatibility between the client in trusty and all the servers that trusty has ever had.
   1. If trusty had juju 1.18.0, 1.18.1, and 1.20.0, we need to show that clients work with all the servers.
1. We could parse the help and report bugs when commands or options disappear. We need to see that commands and options are deprecated.
   1. We want to remove deprecated features from the help to keep the docs clean, but that makes deprecations look like obsolescence.
1. Client to server is command line to API server.
   1. Stand up each server, then for each client check that they talk.
   1. We don’t need to repeat historic combinations.
   1. Test the new client with the old servers.
   1. Test the old clients with the new servers.
   1. The tests could be status, upgrade, and destroy, but if we had an API compatibility check, we could quickly say the client and server are happy together.
1. Maybe split the juju package into juju-server and juju-client packages. Trusty gets the new juju-client package; the servers are in the clouds.

## CI Feature Function Testing

Juju command testing:

1. Backup and restore (in progress).
1. HA.
1. Charm hooks, relations, expose, and upgrade-charm.
   1. Is the env set up for the hook?
   1. Do relations exchange info?
   1. Do expose/unexpose update ports?
   1. `upgrade-charm` downloads a charm and calls the upgrade hook.
1. ssh, scp, and run.
   1. We claim `juju run` gets the same env as a charm; we can test that the charm and run have the same env.
1. set/get config and environment.
   1. Which options are not mutable?
## CI Feature Function Testing

Juju command testing:

1. Backup and restore (in progress).
1. HA.
1. Charm hooks, relations, expose, and upgrade-charm.
    1. Is the env set up for the hook?
    1. Do relations exchange info?
    1. Do expose/unexpose update ports?
    1. `upgrade-charm` downloads a charm and calls the upgrade hook.
1. ssh, scp, and run.
    1. We claim run gets the same env as a charm...we can test that the charm and run have the same env.
1. set/get config and environment.
    1. Which options are not mutable?

### Work Items

1. For every new feature we want to prepare a test that exercises it.
    1. Developers are interested in writing the tests with QA.
    1. Some tests may need to be run in several environments.
    1. Revise the docs about writing tests and send them to developers.
1. Add coverage for historic features (see the sketch below for the run/hook env check).
    1. `add-machine` / `add-unit`
    1. set/unset/get of config and env
    1. ssh, scp, and run
    1. charm hooks, relations, expose, unexpose, and upgrade-charm
    1. init
    1. `get-constraints`, `generate-config`
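The run/hook env check could start as something like this minimal sketch, assuming a deployed unit (the `ubuntu/0` name is hypothetical) and that `juju run --unit` executes in a hook context, as the notes claim:

```python
# Sketch: verify `juju run` sees the same context a charm hook would.
# Hypothetical unit name; assumes a deployed unit is available.
import subprocess

UNIT = "ubuntu/0"  # hypothetical deployed unit

def run_on_unit(command):
    """Execute a shell command on the unit via juju run."""
    return subprocess.check_output(
        ["juju", "run", "--unit", UNIT, command]).decode().strip()

# A charm hook relies on these; juju run should expose the same values.
assert run_on_unit("printenv JUJU_UNIT_NAME") == UNIT
# Hook tools should resolve exactly as they do inside a real hook.
run_on_unit("config-get --format json")
```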
## CI LTS (and other series and archs) Coverage

What is the right level of testing? Duplicate testing for each supported series may not be necessary. Unnecessary tests take time and limited cloud resources.

1. Can we test each series as an isolated case from clouds and providers?
    1. Must we duplicate every cloud-provider test to ensure juju on each series in each cloud works?
    1. The local provider seems to need a test for each series and juju.
1. Unit tests pass on amd64.
    1. PPC64el is close to passing.
    1. i386 and arm64 are not making progress.
    1. Switch to golang 1.2.

### Work Items

1. The default test series will be trusty; precise is an exceptional case.
1. Golang will be 1.2.
    1. Golang 1.2 must be backported to precise and maybe saucy.
    1. If not, juju will have to abandon precise or only be 1.1.2 compatible.
1. Build juju on the real archs or cross compile to create tools.
    1. Build juju on trusty amd64.
    1. Build juju on precise amd64.
    1. Build juju on trusty i386.
    1. ppc64+trusty will make gccgo-based juju.
    1. Need a machine to do arm64+trusty to make gccgo-based juju.
    1. Maybe CentOS.
    1. Maybe Win8 (agent for active server charm).
1. Remove the 386 unit tests; replace them with a 386 client test.
1. Add tests for precise (whereas we previously had special tests for trusty).
    1. Test a precise upgrade and deploy in one cloud.
1. Test each series+arch combination for the local provider to confirm packaging and dependencies.
    1. precise+amd64 local
    1. trusty+amd64 local
    1. utopic+amd64 local
    1. trusty+ppc64 local
    1. trusty+arm64 local
1. Test client-server different series and arch to ensure the client's series/arch does not influence the selection of tools.
    1. Utopic amd64 client bootstraps a trusty ppc64.
    1. We already test the win juju client against juju on precise amd64.

## CI MaaS and vMaaS

Juju CI had MaaS access for 3 days. The tests ran with success. How do we ensure juju always works with MaaS?

1. CI wants 5 nodes.
1. CI wants the provider to be available at a moment's notice to run tests for new revisions, just like all clouds are always available.
1. CI probably does not care whether MaaS is in hardware or virtualised. No public clouds support vMaaS today.

### Work Items

1. Ask Alexis, Mark R, and Robbie for MaaS hardware or access to a stable MaaS env.

## CI KVM

Juju CI has local-provider KVM tests, but they cannot be run. Engineers have run them on their own machines.

1. CI wants 3 containers.
1. CI needs root access on real hardware (hence developers run on their machines).
1. CI does care about hardware; no public clouds support KVM today?

### Work Items

1. We can use one of the 3 PPC machines.
1. We need to set up a slave in the network.
    1. Ideally we can add a machine and deploy a Jenkins slave to it.
    1. Or we stand up a slave without juju.
    1. Or we change the scripts to copy the tests to the machine.

## Juju in OIL

We think there may be interesting combinations to test. We know from bug reports that Juju didn't support Havana's multiple networks.

1. We want to know if Juju fails with new versions of OpenStack parts.
1. We want to know if Juju fails with some combinations of OpenStack.

## Vagrant

1. Run the VirtualBox image in a cloud.
    1. We care that the host's mapping of dirs works with the image so that the charms are readable.
1. Exercise the local deployment (see the sketch below).
    1. Deploy of local must work.
    1. Failures might be
        1. Redirector of GUI failed.
        1. Packages in the image needed updating.
        1. lxc failed.
        1. Configuration of `env.yaml` might need changing.
        1. Command line deprecated or obsolete.
1. When juju packaging deps change, the images need updating.
    1. May need to communicate with Ben Howard to change the image.
    1. Can CI pull images from a staging area to bless them?
1. Can we place the next juju into the virtual env to verify next juju works?
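A smoke test along these lines could run inside the image; the shared-folder path, charm name, and the `local` environment name are hypothetical:

```python
# Sketch: Vagrant-image smoke test. Assumes the image maps a host charm
# dir to /vagrant and that environments.yaml defines a "local" env.
import os
import subprocess

CHARM_DIR = "/vagrant/charms"    # hypothetical host-mapped directory
CHARM = "local:precise/myapp"    # hypothetical local charm

# The host dir mapping must make charms readable inside the image.
assert os.access(CHARM_DIR, os.R_OK), "shared charm dir is not readable"

# Deploy with the local provider; a failure here points at the image
# (stale packages, lxc, env.yaml config) rather than at the charm.
subprocess.check_call(["juju", "bootstrap", "-e", "local"])
subprocess.check_call(
    ["juju", "deploy", "--repository", CHARM_DIR, CHARM, "-e", "local"])
```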
## Bug Triage and Planning

We have about 15 months of high bugs. Our planning cycles are 6 months. Though we are capable of fixing 400 bugs in this time, we know that 300 of the bugs are reported after planning. We, stakeholders, and customers need to know which bugs we intend to fix and those that will only be fixed by opportunity or assistance.

1. Do we lower the priority of the 150 bugs?
    1. Do we make them medium? Medium bugs are not more likely to be fixed than low bugs...opportunity doesn't discriminate by importance. We could say medium bugs are the first bugs to be re-triaged when we plan.
    1. Do we make them low? Low bugs obviously mean we don't intend to fix the issue soon. Is it harder to re-triage all low bugs?
1. Do we create more milestones to organize work and show our intent? Can we plan work to be expedited instead of deferred?
    1. Target every bug we intend to address to a cycle milestone.
    1. Retarget some to major.minor milestones as we plan work.
    1. Retarget each to major.minor.micro milestones when branches merge.
1. Triage every bug. Juju-GUI, deployer, charm-tools, and a few others often have untriaged bugs that are a week old. Who is responsible for them? https://bugs.launchpad.net/juju-project/+bugs?field.status=New&orderby=targetname

### Work Items

1. We want milestones that represent now, next stable, and the cycle.
    1. Now is the next release for the 2-week cycle.
        1. Teams target the bugs they want to fix in the cycle.
        1. We can see it burn down.
    1. Next stable are all the bugs we think define a stable release.
        1. This doesn't burn down because most bugs are retargeted. Some bugs will remain, as they are the final bugs fixed to stable.
        1. 3 stable releases per 6-month cycle.
        1. Do we want a next next?
    1. The cycle is 3 or 5 months of all the high bugs we want to fix.
        1. We define stable milestones by pulling from the horizon milestone.
    1. Can we ensure there is a maximum capacity for the milestone? If you add a bug, you must remove a bug.
1. Critical
    1. CI breaks. The QA team will do the first level of analysis.
    1. Regressions are critical, but they may be reclassified.
    1. Critical bugs need to be assigned.
    1. Flaky tests are High bugs in the current milestone.
1. Alexis and stakeholders will drive some bugs to be added or moved forward.
1. We have 15 months of high bugs.
    1. To harden we need to know which high bugs need fixing.
    1. We want to retriage all the high bugs and make most of them medium.
    1. Review the medium bugs regularly to promote them to high for the upcoming cycle or demote them to low.
    1. We want 75 bugs to be high at any one time (1 page of high bugs).

## Documentation

We want documentation written for the release notes before the release. We need greater collaboration to:

1. Know which features are in a release.
1. Know how the features work from the developer notes.
1. Include the docs in the release notes.
1. Have developers review the release notes for errors.
1. Adequately document features in advance of release where possible.

We also need to discuss how versioning of the docs is going to work moving forward, and how we will manage and maintain separate versions of the docs, e.g. 1.18, 1.20, dev (unstable).

## MRE/SRU Juju into trusty

We want the current Juju to always be in trusty. We don't like the cloud-archive because the current juju isn't really in Ubuntu.

- Ubuntu wants guaranteed compatibility.
    - CI needs to ensure all versions of juju in a series work together.
- Landscape has an exception to keep current in all supported series.
    - Landscape only puts the client in supported series.
    - The server is in the clouds.
    - The client is stable; it changes slowly compared to the server.
    - The client works with many versions of the server, but tends to be used with the matching server.
- James Page suggests that juju be packaged with different names to permit co-installs, e.g. juju-1.20.0.

## Juju package delivers all the goodness

1. apt-get install juju could provide juju-core, charm-tools, and deployer.

## juju-qa projects

1. Juju is moving to GitHub; Jerff and other Canonical machines can only talk to Launchpad.
    1. The ci-cd-scripts2 must be on Launchpad.
1. We must split the test branch from the juju project.
    1. We may want to split the release scripts from the test scripts.

# Juju Solutions

## Great Charm Audit of 2014

We've been doing an audit over the last couple of months -- and will continue. We've scaled up the Charmers team from 2 people 5 months ago to 7 or 8 by Vegas, so we are adding a lot more firepower on this front -- but that's all still new. I expect to see a significant increase in our charming capacity for the next cycle.

## Pivotal Cloud Foundry Charms

Discussion points:

1. The pivot from packages to artifacts and why.
    1. Tarball of binaries for a given release.
    1. +1 on proceeding with orchestrating artifacts post Bosh build.
1. Altoros, internal staffing, schedule.
1. CF Service Brokers.
1. Brief look at current status, juju canvas.
1. What is demo-able by ODS?

## IBM Workloads

## ARM Workloads

## CABS

## Amulet

- We want to know which charms are following an interface exchange: when an interface is exchanged, record the information that is passed, then replay it (see the sketch below).
    - This boils down to: we need an interface specification.
    - Mock up interface relations.
    - Or figure out what the status is of the health check links.
    - An opportunity to call the hook in integration suites.
- Could adopt some simplified version of the Juju DB.
    - They are talking about a schema for next cycle.
    - That probably isn't the right answer.
- Someone would need to take over maintainership from Kapil.
    - You need detailed knowledge of how Juju works.
- Build a quorum of what an interface looks like.
    - This is the relation sentry in amulet.
    - The problem with the relation sentry is the name is based on the
    - Hacking around a problem that can be solved with tools in core.
    - If core is not going to fix this, we need to hack around it.
- Bundle testing or unit testing?
    - Is this portion of a deployment reusable by others?
    - Depends on where we are going.
    - 100% certain bundle testing is the way of the future.
    - Take some time writing a test and see how it would look.
    - What is really needed?
    - Do a single bundle test and see what that looks like.
    - Looking at this with a fresh set of eyes may show us new aspects.
    - Once we go through the review of CI and see if we can.
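A minimal sketch of the record-and-replay idea, using the stock `relation-get`/`relation-set` hook tools via `juju run`; the helper names are hypothetical and this is not amulet's relation sentry API (it also ignores quoting of values with spaces):

```python
# Sketch: capture what one unit published on a relation, then replay it
# at a unit under test to mimic the original interface exchange.
import json
import subprocess

def capture_relation(unit, relation_id):
    """Record the settings a unit published on a relation."""
    out = subprocess.check_output(
        ["juju", "run", "--unit", unit,
         "relation-get --format json -r %s - %s" % (relation_id, unit)])
    return json.loads(out)

def replay_relation(recorded, unit, relation_id):
    """Push recorded settings at a unit to mimic the original exchange."""
    pairs = ["%s=%s" % (k, v) for k, v in recorded.items()]
    subprocess.check_call(
        ["juju", "run", "--unit", unit,
         "relation-set -r %s %s" % (relation_id, " ".join(pairs))])
```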
## Charm Tools

## CharmWorld Lib

## Charm Helpers

- Folks interested: Chuck, Marco, Ben, Cory.
- Break contrib out into a charm-helpers contrib.
- Define a way to deliver:
    - Where do I get it?
    - How do I use it?
    - What libraries are available?
- Actions
    - Delivery via the install hook.
    - Document.
- Move as much as possible out of contrib to core.
- Thursday May 1
    - Use doctest to ensure the documents are right (see the sketch below).
        - doctest does not scale up very well.
    - Unit test docs before promotion to core.
    - Move the things from outside of contrib and core into core.
    - Use Wheel packaging; it is a blob format (make dist).
    - Actually use and adhere to semantic versioning.
        - This may include changes to charm-helpers sync to get the right version. Fuzzy logic to find different versions.
    - Chuck: investigate the Altoros charm template for charm helpers.
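An illustration of the doctest idea, with a hypothetical helper function standing in for a charm-helpers utility; the runnable examples live in the docstring, so a stale document fails the test run:

```python
# Sketch: doctest keeps usage examples in charm-helpers docs honest
# (hypothetical function; assumes examples are written in docstrings).
def bytes_to_mb(n):
    """Convert bytes to whole megabytes.

    >>> bytes_to_mb(2 * 1024 * 1024)
    2
    >>> bytes_to_mb(0)
    0
    """
    return n // (1024 * 1024)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails when a documented example drifts from the code
```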
## Java Bundle

## HDP 2.0 Bundle

- Create 12 charms for the GA release of Apache Hadoop that Hortonworks supports.
    - http://hortonworks.com/hdp/
- Need to get communication from IBM on the porting of the 12 components over to Power.
- Need to identify which HDP version is going to be the released version.
    - 3.0 will most likely be the next GA release.
- Need to support multi-language in the GUI.
- Next milestone:
    - Hadoop Summit demo.

## Big Data Roadmap

- Optimizations
    - File system via Juju through the storage feature.
    - Image-based: Hadoop-specific images.
- Conferences
    - Hadoop Summit (June)
    - Strata NY
- Demos
    - See how we can hook the Hadoop bundle into a charm framework bundle (e.g. Rails).
    - See how we can plug in multiple data sources.
        - Cancer, etc.
- Feature requests
    - Ensure that services that need different fault domains/availability sets get them.
        - This may be resolved with tagging in MaaS.
        - Tag fault domain 1 and fault domain 2.
        - This is exposed to juju via the GUI.
    - Have the GUI/Landscape show which machines are in a given zone.
- Idea/need
    - We need to provide a means for Hadoop users to put in their map-reduce Java classes without having access to the admin portion of the juju environment where Hadoop is deployed.
    - The idea is to create a shim/relation/subordinate that provides user-level access for adding map-reduce jobs.

## AmpLab Bundle

## Juju Actions in Bundles

## Charms in Git

## Charms Series Saga

## Fat Bundles and Caching Charms on Bootstrap Node

## Fat Charms in Closed Environments

- Detect ports calling out to the outside network.

## UA Charm Support Story

- Support bundles, not charms.
    - CTS validates the bundle relations and config.
    - Has to have tests.
- Need bundles in the charm store marked as UA supportable.

## How to engage Joyent & Altoros on provider support

## Unstable Doc Branches & Markdown

## Gating Charm merge proposals on charm testing passing

- Many useful relations.
- I expect this is very charm specific -- please feel free to list relations that we need.

## juju.ubuntu.com doc versioning

- Marco, Jorge, Curtis, Matthew
- Branches will be versions in Git:
    - 1.18
        - en
        - fr
    - 1.20
        - en
        - fr
- How to generate docs for live publishing (see the build sketch below):
    - The Juju QA team will build the markdown to HTML conversion.
    - In this conversion the Juju QA team will also incorporate the languages and the drop-down for versioning.
- Jorge to speak to the translations team on the best way forward.
- When committing to docs master, the reviewer should also commit to unstable docs.
- Keep assets in a separate directory outside the versions and languages so we only have to update one place for assets.
- Move author docs to a separate repository, but keep them in the nav for the live juju.ubuntu.com site.
    - The main idea is to de-couple the charm author docs from the user docs: charm authoring should work the same across all releases, so we always want to show the latest charm author docs. Otherwise, updating the charm author docs would mean updating every branch.
    - We will need to update the juju contributor docs once we move the charm author section.
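A sketch of the conversion over the layout above (`<version>/<language>/*.md` plus a shared assets dir); the directory names and the choice of the `markdown` module are assumptions:

```python
# Sketch: render versioned, multi-language docs to HTML.
import pathlib
import markdown  # pip install markdown; library choice is an assumption

SRC = pathlib.Path("docs")      # e.g. docs/1.18/en/install.md
OUT = pathlib.Path("htmldocs")

for md_file in SRC.glob("*/*/*.md"):
    version, lang = md_file.parts[-3], md_file.parts[-2]
    html = markdown.markdown(md_file.read_text())
    dest = OUT / version / lang / (md_file.stem + ".html")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The version/language drop-down would be injected via a template here.
    dest.write_text(html)
```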
## Juju and OpenStack

- Juju in keystone - Juju as a multi-tenant component registered in keystone.
- Juju in horizon - Juju GUI and UI in horizon.
- Juju in heat - Juju / deployer/bundle style exposed as a DSL in heat.

# Juju GUI

## Juju in OpenStack Horizon - Juju GUI in horizon

### Issues to resolve

- Embedding UI path? An OpenStack project or into an existing one?
- Embedding UI as far as framing/styling.
- Required timeframe; map out paths of resistance to make the OpenStack release.
- The guiserver (python/tornado) running in that stack.
- No bundles without deployer access.
    - Build deployer into core?
    - Build a full JS deployer?
- No local charm file content.

## Juju in Azure - Juju GUI in Azure

### Issues to resolve

- Embedding UI path? Hosted externally and referenced in? Need to meet specific Azure tooling requirements?
- Embedding UI as far as framing/styling with the existing Azure UX.
- Additional required functionality.
    - List environments.
- Required timeframe; map out paths of resistance to make deliverables.
- The guiserver (python/tornado) running in that stack.
- No bundles without deployer access.
    - Build deployer into core?
    - Build a full JS deployer?
- No local charm file content.

## Juju UI networks support

- Which types of networking are supported, and what will be supported in core this cycle? Others planned, to make sure the design scales/works.
- What does design have for the UX of this so far?
    - Provider differences, sandbox, etc.
- Make sure API exposure in core is complete enough to aid all UI team needs put forth by design.
- Get anything not already scheduled onto someone's schedule.

## Juju UI Machine view 1.5

Most of this is a sync with design and a check on what we put into 1.0 vs the final desired product.

- Deployed services inspector.
- Better search integration.
- Pre-deployment config and visualization of bundles.
- Better local charms integration.
- Improved interactions (full drag/drop with the walkthrough/guide material).

## Juju UI Design Global Actions

We've got a series of tasks on the list that require us to find a way to represent things across the entire environment. We need to sit down with design and look at a common pattern to use for these 'global' environment-wide tools, many of which mirror tasks at the service, machine, and unit level.

### Items to discuss

- Design a home for global environment information.
- HA status/make HA.
- SSH key management.
- Environment-level debug-log.
- Environment-level juju-run.

## In the trenches - customer feedback for GUI

The GUI team would like to meet with ecosystems and others selling/deploying the GUI in the field and get feedback on things we can and should look at doing to make the GUI a better tool and product. The goal is to help prioritize and give us ideas of paper cuts we should schedule to fix during maintenance time in the next cycle.

## Juju UI Product Priorities

There's a backlog of features to add to the GUI. We need a product team opinion on which to prioritize as we work around bigger tasks like Azure embedding. We won't be able to get it all done this cycle, so we'd like feedback on those most useful to selling/marketing Juju.

- Debug log
- HA representation controls
- Network support
- Juju Run
- Multiple Users
- Fat bundles
- juju-quickstart on OS X
- juju-quickstart MaaS support
- SSH key management UI

## Core Process Improvements

### Documentation

- Ian - use Launchpad to track what bugs are where and which are fixed.
- Nate - an in-repo file is easier to keep track of, easier to verify during code reviews.

### Standups

- Leads meet once a week.
- Standups are squad standups.
- William does 1-on-1s with leads.
- Team leads email about team status.

### Vetting Ideas on Juju-dev

- Send a user-facing feature description to juju-dev before working on features.

### 2-Week Planning Cycle

- Dev release every 2 weeks.

### Contributing to CI tests

- We should do that.
### Move core to GitHub?

Needs to be scheduled and prioritized. Non-zero work to get it working (build bot, process, etc).

- Code migration
- Code review
- Landing process
- Release process
- CI
- Documentation
- Private projects (ask Mark Ramm)

### Work Items

1. Code migration
    1. Do it all in one big migration.
    1. Namespace will be juju/core.
    1. Factor out others later.
    1. Disable the GitHub bugtracker.
1. Code review
    1. Aim to use native GitHub code review.
    1. Find out about diffs being able to be expanded (ok, done).
    1. Rebase before issuing a pull request to allow a single revision to be cherry-picked (investigate to be sure).
1. Branch setup
    1. Single trunk branch protected by a bot.
1. Landing process (see the sketch below)
    1. Check out Rick's lander branch (juju Jenkins GitHub lander).
    1. Run the GitHub Jenkins lander on the Jenkins CI instance.
1. Documentation
    1. Document the entire process.
1. CI
    1. Polling for new revisions.
    1. Building the release tarball.
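A sketch of the gated-landing loop the work items describe, polling GitHub and merging only green pull requests. This is not Rick's lander; the juju/core namespace comes from the notes above, while the approval check and token handling are simplified assumptions:

```python
# Sketch: poll open PRs and merge those whose combined commit status is
# success. A real lander would also require a reviewer approval marker.
import time
import requests  # assumption: the lander is a small Python service

API = "https://api.github.com/repos/juju/core"
AUTH = {"Authorization": "token <redacted>"}  # a real lander needs a token

def ci_passed(sha):
    """True when the combined commit status for the sha is success."""
    r = requests.get("%s/commits/%s/status" % (API, sha), headers=AUTH)
    return r.json().get("state") == "success"

while True:
    for pr in requests.get("%s/pulls?state=open" % API, headers=AUTH).json():
        if ci_passed(pr["head"]["sha"]):
            requests.put("%s/pulls/%d/merge" % (API, pr["number"]),
                         headers=AUTH)
    time.sleep(60)  # "polling for new revisions"
```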