title: Juju 14.10 Plans

[TOC]

# Core

## Multi-environment State Server

Multi-environment, multi-customer.

### Use Cases

- Embedding in Azure: people can spin up an environment without paying for a state server, which is a significant up-front cost.
- Embedding in Horizon (the OpenStack dashboard).

### How do we start?

- Create multiple client users (some sort of API), aka User Management.
- create-environment (need environments), list-environments.
- SelectEnvironment is called after Login; Login itself exposes the multi-environment API root. To avoid an extra round trip, Login can optionally pass in the EnvironmentUUID.
- Credentials need to move out of the environment.
- Machine/unit/etc. (everything) documents gain an environment id (except users).
- The API filters by the environment id inherited from SelectEnvironment.
- rsyslog needs to split based on tenant.
- The provisioner/firewaller/other workers get an environment id, one task per environment.
- Consider adopting the accounts/environments separation from `environments.yaml` (Juju 2.0 conf).
  - This changes the DB representation so that Environments reference Accounts and point to Providers.
  - EnvironConfig may still collapse this into one big bag of config, but it should be possible to easily change the Provider Credentials for a given Account and have that cascade to all of its environments.

### Work Items

- State object gains an EnvironmentUUID attribute; all methods on that State object implicitly use that environment (see the sketch after this list).
- Update state document objects (machine, unit, relation scopes, etc.) to include the EnvironmentUUID.
- MultiState object:
  - Includes the Users and Environments collections.
  - Used for the initial Login to the API and subsequent listing/selecting of environments.
  - SelectEnvironment returns an API root like we have today, backed by a State object (like today) that includes the environment UUID.
  - **Unclear**: how to preserve compatibility with clients that don't pass the environment UUID.
  - Desirable: avoiding the extra Login+SelectEnvironment round trip for commands that know the environment ahead of time (`status`, `add-unit`, etc.).
  - Admin on the state server gives you global rights across all Environments.
  - Environments collection.
- MultiState APIs:
  - `ListEnvironments`
    - Needs to filter based on the roles available to the user in the various environments. Should not return environments that you don't have access to.
  - `SelectEnvironment`
  - `CreateEnvironment`
  - `DestroyEnvironment`
- Logging
  - TBD; regardless of the mechanism, we need the environment UUID recorded per log message so we can filter on it again.
  - In rsyslog it could be put into the prefix, or sharded into separate log files.
- Include the GUI for the environment on the state server, per environment.
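A minimal sketch of what the first work item could look like: a `State` handle scoped to one environment, implicitly filtering every mongo query by the environment's UUID. The struct shape, collection name, and `env-uuid` field are illustrative assumptions, not the actual juju-core types.

```
package state

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// State is scoped to a single environment; every query it issues
// is implicitly filtered by that environment's UUID.
type State struct {
	db          *mgo.Database
	environUUID string
}

// machines returns only the machine documents that belong to
// this State's environment.
func (st *State) machines() ([]bson.M, error) {
	var docs []bson.M
	err := st.db.C("machines").
		Find(bson.M{"env-uuid": st.environUUID}).
		All(&docs)
	return docs, err
}
```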
## HA

- Current Issues
  - `debug-log` retrieves the log from one API server, so in an HA environment not all logs are retrieved.
    - https://bugs.launchpad.net/juju-core/+bug/1310268
- What is missing?
  - HA on local.
- Next steps
  - Decrease count (3 -> 1, 5 -> 3).
  - Scaling the API separately from mongo.

### Notes

Work on rsyslog logging to multiple rsyslogd instances is ready to be reviewed.

The rsyslog conf still needs to be updated when machines are added or removed. This needs to be done.

Possible problem: logs being very out of order (hours off).

**Bug**: Peergrouper log spam on local.

HA on local can't work 100% because VMs can't start new VMs, so only machine 0 can be a useful master state server. However, there are other tests that can be done with HA that would be useful on local HA.

It would be useful to be able to have the master state server be beefy and higher priority for master, and the non-masters be non-beefy, because the master has far more load than the non-masters. Right now, ensure availability is very broad and vague; it's not tweakable. However, you can approximate it by bootstrapping with a big machine, changing the constraints to smaller machines, then running ensure availability. The only thing we would need to add is a way to give a state server a higher priority for becoming master.

Need better introspection in status so that the GUI can better reflect what's going on. The GUI needs to be able to call ensure availability, and to show state servers.

The restore process for HA is just: restore one machine, then call ensure availability.

### GUI needs

- allwatcher needs to add fields for HA status changes.
- The GUI needs to know what API address to talk to, handle fallback when one goes away, and stay up to date on who else to talk to.
- ensure-availability needs to return more status (actions triggered).
- How is HA enabled/displayed in the GUI? What does machine view show?
- Can you deploy multiple juju-gui charms for HA of the GUI itself?

### CI

1. Shut down the master node, or temporarily cripple the network, to verify that HA recovers and elects a new master.
2. Test on local, because local will be used in demonstrations.
3. If backup-restore is also being done, then a restore of the master is a new master; ensure-availability must be rerun.

### Work Items

- **Bug**: Agent conf needs to store all addresses (hostports), not just private addresses. Needed for the manual provider.
- **Bug**: Peergrouper log spam on local.
- Change mongo to write majority; this is a per-session change (see the sketch after this list).
- Change mongo to write WAL logs synchronously.
- Need docs about how to use ensure availability to remove a machine that died (try to improve the actual user story for how this works).
- `juju bootstrap` && `juju ensure-availability` (should not try to create a replacement for machine-0).
- Set up all status on the bootstrap machine during bootstrap so it is created in a known good state and doesn't start up looking like it's down.
- A machine that was down, for which ensure-availability was run to replace it: when the machine comes back, it should not have a vote and should not try to be another API server.
- `juju upgrade-juju` should coordinate between the API servers to enable DB schema updates (before rewriting the schema, make sure all API servers are upgraded, and then only the master API server performs the schema change).
- The APIWorker on nodes with JujuManageEnvironment should only connect to the API server on localhost.
- Determine how backup works when in HA.
- Changes for the GUI to expose HA status.
- Changes for the GUI to monitor what the current API servers are (needs the watcher that other agents use exposed on the Client facade).
- `ensure-availability` needs to return more status (the EnsureAvailability API call should return the actions triggered).
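A sketch of the per-session mongo change the majority/journal work items describe, using the mgo driver's session safety settings. The dialing helper is an illustrative assumption; the `Safe` fields are the driver's.

```
package main

import (
	"gopkg.in/mgo.v2"
)

func dialStateServer(addr string) (*mgo.Session, error) {
	session, err := mgo.Dial(addr)
	if err != nil {
		return nil, err
	}
	// Require acknowledgement from a majority of the replica set
	// (WMode) and a sync to mongo's write-ahead journal (J) before
	// a write is considered successful. This is a per-session setting.
	session.SetSafe(&mgo.Safe{WMode: "majority", J: true})
	return session, nil
}
```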
### Work items (stretch goals)

- Ability to reduce the number of state servers.
- Handle the problem of ensure availability being called twice in a row (since the new servers aren't up yet, we start yet more new state servers).
- Ability to set priority on a state server.
- Autorecovery: bringing back machines that die (or just calling ensure availability again).

## State, status, charm reporting

Statuses like 'started' don't have enough detail. We can't tell the true state of the system or of a charm from a status like started.

- s/ready/healthy and s/unready/unhealthy
- Add jujuc tools ready and unready (healthy, unhealthy); a sketch follows at the end of this section.
  - Ready takes no positional arguments.
  - Unready takes a single positional argument: a message that explains why.
  - Charm authors choose the message they want to use.
  - Both ready/unready, when called without other flags, apply to the unit that is running.
  - Both also accept a relation flag, `-r <relation id>`, which applies the status to the specified relation.
- The status data for a unit keeps track of the ready status; expose it in status.
- The implementation needs to be shared with the allwatcher so the GUI gets to see the info.
- Implement a ready-check hook that will be called periodically if it exists; units are expected to update their ready status, which is reported when the hook is called.
- The detailed states are sub-statuses of 'started'.
- Possible granular statuses for units:
  - provisioned
  - installing (sub-status of pending)
- Juju will poll the ready-check hook for the current state. Charms need to respond ready or unready.
- We might want both a concise and a summary form of status. The GUI might want to show the concise form first and the summary later.
  - Status is already bloated.
  - Can status be intelligent enough to only include the data needed?
  - Can you subscribe to get updates for just the information you think is changing, i.e. subscribe to the allwatcher?
  - `juju status --all` would be the current behavior.
  - We would start with `--all` being implicit, but deprecated.
  - We will switch to a more terse format.
- The status "started" is not really ready.
  - There may be other hooks that still need to run.
  - Only the charm knows when the service is ready.
- When install completes, the status is implicitly "started".
  - The charm author can have install set a message to mean it is unready.
- Authors want to know when a charm is blocked because it is waiting on a hook.
  - We can solve 80% of the problem with some effort, but a proper solution is a lot of work.
  - It isn't clear when one unit is still being debugged.

### Work Items

1. Introduce granular statuses.
1. Implement filters/subscribers to retrieve granular status.
1. Unify status and the all-watcher.
1. Switch status from --all to the concise form.
   - (?) know when the charm is stable, i.e. when there are no hooks queued
   - (?) know when all services are stable
1. When deploying and then adding debug-hooks, the latter could set up a pinger for the service being deployed, which puts the service into debug as it comes up.
1. `juju retry` to restart the hooks, because resolved is abused.
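A rough sketch of the unit-side state the proposed `ready`/`unready` hook tools would manipulate. These are entirely hypothetical shapes, not existing juju-core types; they just mirror the flag behaviour described above.

```
package status

// UnitStatus is the proposed sub-status of "started".
type UnitStatus struct {
	Ready   bool
	Message string // set by `unready <message>`, cleared by `ready`
	// Per-relation ready status, set via the -r <relation id> flag.
	Relations map[int]RelationStatus
}

type RelationStatus struct {
	Ready   bool
	Message string
}

// Unready records why the unit (or one of its relations) is unhealthy.
func (s *UnitStatus) Unready(relationId int, message string) {
	if relationId < 0 { // no -r flag: applies to the unit itself
		s.Ready, s.Message = false, message
		return
	}
	if s.Relations == nil {
		s.Relations = make(map[int]RelationStatus)
	}
	s.Relations[relationId] = RelationStatus{Ready: false, Message: message}
}
```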
## Error Handling

- JDFI. We have a package. Use it.
- We need to annotate errors with a line number and a stack trace.
- We have type preservation.
- There is some agreement to change the names of some of the API.
- Add this as we need it. Switching all code over at once would stall the production line.
- Reviewers will push back on code that doesn't use the new error handling.

### Work Items

1. Extend the juju errors package to annotate errors with file and line number.
1. Log the annotated stack trace.
1. Change the backend to use `errgo`.
1. We need a template (Dimiter's example) of how to use error logging (a short example follows).
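A short example of the intended style, along the lines of the juju errors package (its annotation calls record the file and line of each call site while preserving the underlying error's type). The helper and path are illustrative.

```
package main

import (
	"fmt"
	"io/ioutil"

	"github.com/juju/errors"
)

func loadFile(path string) ([]byte, error) {
	return ioutil.ReadFile(path)
}

func readConfig(path string) error {
	if _, err := loadFile(path); err != nil {
		// Annotate adds context plus the file:line of this call site,
		// while keeping the original error available via Cause.
		return errors.Annotate(err, "cannot read config")
	}
	return nil
}

func main() {
	if err := readConfig("/etc/juju/agent.conf"); err != nil {
		// ErrorStack prints the annotated trace accumulated on the way up.
		fmt.Println(errors.ErrorStack(err))
	}
}
```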
## Image Based Workflows

Charms would be able to specify an image (maybe docker); with the addition of storage, storage dirs are passed into docker as it is launched.

The unit agent may run either inside or outside the docker container (not yet determined).

The machine agent would mount the storage and the charm directory into the docker container when it starts. The hooks are executed in the docker container.

Looking to make docker support a first-class citizen in Juju.

*"Juju incorporates docker for image based workflows"*

Maybe limited to images based on the ubuntu-cloud image (full OS container).

May well have a registry per CPC to make downloading images faster on that cloud.

Perhaps put a docker file (instructions to build the image) into the charm. The registry that we look up needs to be configurable.

Offline install will require pulling images into a local registry.

### Work Items

1. Unit agent inside the container.
1. Image registry.
1. Charm metadata to describe the image and registry.
1. Deployer to understand docker; the deployer inspects charm metadata to determine the deployment method, traditional vs. docker.
1. A docker deployer needs to be written that can download the image from a registry and start the container, mounting the agent config, storage, charm dir, and the upstart script for the unit agent (if the unit agent runs inside).
1. Docker work is needed to execute hooks inside the container from the outside.

**Depends on storage 0.1 being done first.**

## Scalability

### Items that need discussion in Vegas

- How do we scale to an environment with 15k active units?
- How do admin operations scale?
- How do we handle failing units?
  - dump and re-create
  - Interaction with the storage definition.
- How do we make a `juju status` that can provide a summary without getting bogged down in repeated information?
- How does relation get/set change propagation scale?
- Where are the current bottlenecks when deploying, say, hadoop?
- Where are the current bottlenecks when deploying OpenStack?
- Pub/Sub
  - What do we need to do here?
- Notes:
  - We need a pub/sub for the watchers to help scale.
  - Each watcher pub/subs on its own; move up one level?
  - Need to respond to events that occur, in a non-coupled way (indirect sub to goroutine).
  - Logging particular events?
  - Only one thing looking at the transaction log; whoops, not as bad as we thought.
  - 100k units leads to millions of goroutines; blocking is an issue.
  - If we do a pub/sub system, let's use it everywhere? Replace watchers?
  - Related to the idea of pub/sub on output variables and the like, it sounds like.
  - Watching at subfield granularity of a document, perhaps?
  - 0mq has this; we should reuse that and not invent our own pub/sub.
  - 0mq has Go bindings; wonder if it works in gccgo.
  - Does this replace the API? No, Javascript can't speak 0mq directly, so clients still need some API-ness.
  - Are there alternatives to the watcher design?
  - Watchers are really good for testing: they decouple parts and make it easy/fast to test whether an event fired.
  - Shared watcher for all things (on the service object?).
  - Have a big copy of the world in memory; that helps with a lot of this.
  - Charm output variable watching: charm outputs hit state, the megawatcher catches the update and tells everyone it changed.
  - An in-memory model helps with the ABA problem.
  - Use a 3rd-party pub/sub rather than writing our own.

### Work Items

1. Boot generic machines which then ask Juju for identity info.
1. Bulk machine provisioning.
1. Fix the uniter event storm caused by "number of units changed" events.
1. Implement a proper pub/sub system to replace watchers (a minimal sketch follows this list).
1. The state server machine agent (APIServer) should not listen for outside connection requests until it itself (APIWorker) has started.
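A minimal in-process pub/sub sketch of the kind of hub that could sit above the individual watchers. Purely illustrative; the notes above lean toward reusing something like 0mq rather than hand-rolling this.

```
package pubsub

import "sync"

// Hub fans events out to subscribers by topic, decoupling the
// goroutine that observes a change from the ones that react to it.
type Hub struct {
	mu   sync.Mutex
	subs map[string][]chan interface{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string][]chan interface{})}
}

// Subscribe returns a buffered channel of events for a topic.
func (h *Hub) Subscribe(topic string) <-chan interface{} {
	ch := make(chan interface{}, 16)
	h.mu.Lock()
	h.subs[topic] = append(h.subs[topic], ch)
	h.mu.Unlock()
	return ch
}

// Publish delivers an event to all current subscribers, dropping it
// for any subscriber whose buffer is full instead of blocking; with
// 100k units, blocked publishers are exactly the problem noted above.
func (h *Hub) Publish(topic string, event interface{}) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs[topic] {
		select {
		case ch <- event:
		default:
		}
	}
}
```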
## Determinism

### First Issue: Install repeatability

There are two approaches to giving us better isolation from the network and other externalities at deploy time.

1. Fix charms so they don't have to rely on external resources.
   - Perhaps by improvements around fat charms.
     - REQUIRED: Remove internal Git usage (DONE).
   - Perhaps by making it easy to manage those resources in Juju itself?
     - Either create a TOSCA-like "resources" catalog per environment: upload or fetch resources to the environment at deploy time (or as a pre-deploy step),
     - or create a single central resource catalog with forwarding, aka a "gem store for the world".
1. Snapshot-based workflows for scale up/down, so external resources aren't hit on every new deploy.
   - We could add the necessary hooks to core, but the actual orchestration of images seems a bit more tricky and could depend on a better storage story.

### Second Issue

From Kapil: "Runtime modification of any change results in non-deterministic propagation across the topology that can lead to service interruption. This needs change barriers around many things, but that's not implemented or available. E.g. config-changed and upgrade executed at the same time by all units."

### Upgrade Juju

`juju upgrade-juju` goes to a magic revision (simple bug fix) that an operator can't determine ahead of time.

Juju internally lacks composable transactions; many actions violate semantic transaction boundaries, and thus partial failure states leave inconsistencies.

Kapil notes:

> One of the issues with complex application topologies is how runtime changes ripple through the system. e.g. a config change on service a propagates via relations to service b and then service c. It's eventually consistent and convergent, but during the convergence what's the status of the services within the topology? Is it operational? Is it temporarily broken?

> **This is a hard problem** to solve and it's one I've encountered in both our OpenStack and Cloud Foundry charms.

> In discussions with Ben during the Cloud Foundry sprint, the only mitigation we could think of on Juju's part was some form of barrier coordination around changes, e.g. so that the ripple proceeds evenly through the system. It's not a panacea but it can help, especially looking at the simpler cases of just doing barriers around `config-change` and `charm-upgrade`. What makes this a bit worse for Juju than other systems is that we're purposefully encapsulating behavior in arbitrary languages and promoting blind/trust-based reuse, so a charm user doesn't really know what effect setting any config value will have. e.g. the cases I encountered before were setting a 'shared-secret' value and an 'ssl' enumeration value on respective service config... for the ssl I was able to audit that it was okay at runtime... but that's a really subtle thing to test or detect or maintain.

> Any change can ripple through the topology. We have an eventually-consistent system, but while it is rippling, we have no idea. Lack of determinism means someone who uses Juju cannot make uptime guarantees.

**Bug**: downgrading charms is not well supported.

### Questions

- Do we need barriers? e.g. config-changed affects all units of a service simultaneously.
- Do we need pools of units within a service?

### Work Items

- Unit ids must be unique (even after you've destroyed and re-created a service).
- Address changes must propagate to relations.
- `--dry-run` for `juju upgrade-juju`.
- `--dry-run` for deploy (what charm version and series am I going to get?).

## Health Checks

Juju "status" reporting in charms needs to be clearly defined and expressive enough to cover a few critical use cases. It is worth noting that BOSH has such a system.

- Canaries and rolling unit upgrades (health check as a prerequisite).
- Is a service actually running?
- Coordination of database schema upgrades with webserver unit upgrades (as an example of the general problem of coordinated upgrades).
- Determining when HA quorum has been reached or a server has been degraded.

### Questions

- We discussed Error and Ready as states, but do we need a third: Pending, Error, and Ready?
- Do we need any more than three states?
- Suggestion: three states, plus an error description JSON map (a sketch follows).
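A sketch of the suggested three-state model with the error description map; the types are hypothetical.

```
package health

// State is one of the three proposed health states.
type State string

const (
	Pending State = "pending"
	Error   State = "error"
	Ready   State = "ready"
)

// Check is what a unit would report: a state plus, on error, a
// free-form description map (serialized as JSON).
type Check struct {
	State State             `json:"state"`
	Info  map[string]string `json:"info,omitempty"`
}
```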
## Storage management

### Allow charms to declare storage needs (block and mount)

- [Discussion from Capetown](https://docs.google.com/a/canonical.com/document/d/1akh53dDTROnd0wTjGjOrsEp-7CGorxVp2ErzMC_G-zg/edit)
- [Proposal post Capetown (MS) (lacks storage-sets)](https://docs.google.com/a/canonical.com/document/d/1OhaLiHMoGNFEmDJTiNGMluIlkFtzcYjezC8Yq4nAX3Q/edit#heading=h.wjxtdqqbl1fg)

Entity to be managed:

- CRUD, snapshot

Charms declare it:

- Path, type (ephemeral/persistent), block.

Storage 0.1:

- Storage set in state: track the information in some way.
- Disks (placement, storage).
- Provider APIs (to create, delete, and attach storage; expand for later).
- Provider to be able to attach storage to a machine.
- Charms need to be able to declare storage in metadata.
- `jujud` commands so charms can resolve where the storage is on the machine.
- Degradation: on the manual provider, or another provider that doesn't provide storage (DO), do not fail to deploy, but we need to communicate a warning of some form. Should the CLI fail while the API does not?

Storage sets need to talk to services; they need to be exposed as management processes.

Multitenant storage? Probably not for the initial implementation, but ***do not design it out***.

Need to consider being able to map our existing storage policy onto the new design (e.g. an AWS EBS volume for how Juju works with Amazon).

NOTE: Storage is tied to a zone; ops can take a long time to run.

Consider upgrades of charms, and how we can move from the existing state, where a charm may have its own storage that it has handled itself, to the new world where we model the storage in state.

- (2) Add a state storage document to the charm document.
  - Upgrading Juju should detect services that have charms with storage requirements and fulfill them for new units.
- (6) Add state for storage entities attached to units.
  - Lifecycle management for storage entities.
- (6) When deploying units, need to find out what storage is needed.
  - Make the provisioner aware of workloads and include storage details when needed.
  - Change unit assignment to machines based on storage restrictions.
- (4) Define provider APIs for handling storage (see the sketch after this list).
  - Create new volume.
  - Delete volume.
  - Attach volume to instance.
- (12) Implement the provider APIs for storage on the different providers.
  - OpenStack
  - EC2
  - MaaS
  - Azure?
- (0) Consider storage provider APIs for compute providers that have storage as a service.
- (2) Define new `metadata.yaml` fields for dealing with storage.
- (0) Consider the mapping between charm requirements and service-level restrictions on what storage should actually be provided.
- (4) Add storage to status.
  - Top-level storage entity keys.
  - Units showing their associated storage entities.
  - Services showing storage details.
- (4) CLI/API operations on storage entities.
  - Add storage.
  - Remove storage.
  - Further operations? Resize? Not now.
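A sketch of the provider-facing storage API described above (create/delete/attach). The interface and type names are placeholders, not a settled design.

```
package storage

// VolumeParams describes a volume to be created.
type VolumeParams struct {
	SizeMB     int
	Zone       string // storage is tied to a zone
	Persistent bool
}

// Volume identifies a created volume.
type Volume struct {
	Id   string
	Zone string
}

// Provider is the minimal per-cloud storage API for storage 0.1.
// Operations can take a long time to run, so implementations would
// likely be asynchronous/polled in practice.
type Provider interface {
	CreateVolume(params VolumeParams) (Volume, error)
	DeleteVolume(volumeId string) error
	// AttachVolume attaches an existing volume to an instance and
	// returns the device path that charms can resolve via jujud
	// commands on the machine.
	AttachVolume(volumeId, instanceId string) (devicePath string, err error)
}
```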
## Juju as a good auto-scaling toolkit

*Not a goal: doing autoscaling in core.*

Goal: providing the APIs and features needed to easily write auto-scaling systems for specific workloads.

Outside stakeholders: Cloud Installer team.

We need to be able to clean up after ourselves automatically.
Where "clean up" actions are required, they need to take bulk operation commands.

- Destroy-service/destroy-unit should cascade to destroy dirty machines.

## IAAS scalability

- Security group re-think:
  - The security group approach needs to switch to per-service groups.
  - We need to support individual on-machine/container firewall rules.
- Support for instance type and AZ locality.

## Idempotency

Is this a Juju issue, or a charm issue? Config management tools always promise this, but rarely deliver -- though many deliver **more** than Juju. What are the specific issues in question with Cloud Foundry?

## Charm "resources" (fat bundles, local storage, external resource caching)

### Problem Statements

- Writing and maintaining "fat" charms is difficult.
- Forking charm executables to support multiple upstream release artifacts is sub-optimal.
- Fat charms are problematic.
- Non-fat charms are dependent on quite a few external resources for deployment.
- Non-fat charms are not *necessarily* deterministic as to which version of the software will be installed (even to the point of sometimes deploying different versions in the same service).

### Proposed Agenda

- Discuss making "fat charms" better.
- Switch to a "resources" model, where a charm can declare the 'external' content that it depends on, and the store manages caching and replication of it.
- Consider building on the work IS has done.
- Choose a path, and enumerate all the work that needs to be done to fully solve this problem.

### Proposal

- ~`resource-get NAME` within a charm to pull down a published blob~
  - Instead, use a model where, rather than charms requesting names, the charm overall declares the resources it uses, and the Uniter ensures that the data is available before firing the upgrade/install hooks.
- `resources.yaml` declares a list of streams that contain resources:

```
default-stream: stable
streams:
  stable:
  devel:
common:
  common.zip
amd64:
  foobar.zip
```
- The resources directory structure for charms should match that of the charm author, so bind-mounting the directory for development still works; in the deployed version of the directory structure, you will only have common and arch-specific files. Should there be a symlink to the specific arch? Either:
  - publish charm errors if there are name collisions across the common and arch-specific directories. This way all the files are in one resources directory for hook execution. This does mean that the charm developer needs a way to create symlinks in the top directory to the current set of resources they want to use (charm-tool resources-link amd64). Windows? (They have symlinks, right?)
  - the charm has resources/common and resources/arch, where "arch" is still a link, but just one;
  - the charm has resources/common and resources/amd64
    - this requires the hook knowing the arch.
- Charm identifiers become qualified with the stream name and resource version (precise/mysql-25.devel.325).
  - juju status will show a new version as available if the entire version string (including resources) changes.
  - If mysql-25.devel.325 is installed, and a different version of the resources becomes current, this will be shown in `juju status`.
  - We currently ask for mysql-latest; this should perhaps be changed to mysql-current, as we don't necessarily want the latest version.
- Each named stream has an independent version, which is independent of both the other streams and the explicit charm version.
- upgrade-charm upgrades to the latest full version of the charm, including its resources.
- upgrade-charm reports to the user what old version it was at and what new version it upgraded to.
- Blobs are stored in the charm store; your environment always has a charm store, which can be synced for offline deployments.
- Today, deploy ensures that the charm is available and copies it to environment storage; it will now need to do the same for the charm's resources.
- Deploy should also confirm that the charm version and resource version are compatible.
  - `juju deploy mysql-25.dev.326` may fail because resources version 326 has a different manifest than declared in charm 25's `resources.yaml`.
- `juju deploy mysql`
  - finds the current version of the charm and resources in the default stream;
  - the charm store has already validated that they match.
- `juju deploy mysql-25`
  - uses the default stream;
  - how do we determine the resources for this version?
    - Does current match? If yes, use it.
    - If 25 < current, then look back from the current resources and grab the first that has a matching manifest.
    - Could just fail.
    - The charm store could track the best current resources for any given charm version, as identified by moving the current resources pointer while keeping the charm pointer the same. For charm versions that are current, remember the current resources version.
    - If we take this approach, there will be charm versions that have never been "current", so deploying them without explicitly specifying the resources version will fail.
- `juju deploy mysql.nightly` (syntax TBD)
  - `juju deploy mysql --option=stream=nightly` (hand wave; we don't like this one, as getting the full version partly from config feels weird)
  - Find the current version of mysql and the current version of the paid stream resources.
  - So, the charm store needs to remember the current resources for each stream, for each charm version, for the current values.
- The charm store has pointers for the "current" version of charms and the "current" version of resources.
- The charm store requires that the resources defined in the current pointers have the same shape (same list of files).
- `charm-publish` requires a local copy of all resources (for all architectures), and validates that `resources.yaml` matches the resources tree.
- `charm-publish` computes the local hash of the resources and the manifest of what is currently in the charm store, to publish both the charm metadata and all resources in a single request.
- Publishing does not immediately move the 'current' pointer. This allows someone to explicitly deploy the version and test that the charm works with that version of the resources.
- Supported architectures is an emergent property tracked by the charm store (known bad/unknown/known good) based on testing - hand wave.
- The charm store will be expected to de-dupe based on content hash (possibly a pair of different long hashes, just to be sure).
- Don't let the manifest just be the SHA hash without a challenge (a sketch of the salted variant follows this list):
  - either a random set of bytes from the content,
  - or a salted hash: the charm store gives the salt, and publish charm computes the salted hash to confirm that it actually has the content.
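A sketch of the salted-hash challenge: the store hands the publisher a salt, and the publisher proves possession of the blob by hashing salt plus content. The choice of sha256 and the names here are assumptions for illustration.

```
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// proveContent answers the charm store's challenge: hash the
// store-provided salt together with the blob's bytes. Knowing only
// the blob's plain hash is not enough to forge this answer.
func proveContent(salt, blob []byte) string {
	h := sha256.New()
	h.Write(salt)
	h.Write(blob)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	salt := []byte("store-issued-salt") // would come from the store
	blob := []byte("...resource bytes...")
	fmt.Println(proveContent(salt, blob))
}
```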
### Spec

- Be clear about what is in the charm store, what is defined by the charm in `resources.yaml`, and what is deployed on disk.
- Use cases should show both the charm developer workflow and the user upgrade flow (which files get uploaded/downloaded, etc.):
  - developing a new charm with resources
    - with common resources
    - with different resources for different architectures
    - with some architectures needing specific files
  - upgrade a charm by just modifying a few files
  - upgrade a charm by only modifying a charm hook
  - upgrade both hooks and resources
  - adding new files
  - docker workflow with a base image and one overlay
  - updating the overlay of a docker charm
  - adding a new docker overlay will cause a rev bump on the charm as well as the resources, because the resources.yaml file has to change to include the new layer
  - illustrate explicitly the workflow if they forget to add the new layer to the resources.yaml file: publish fails because resources.yaml doesn't match the on-disk resources directory tree

### Discussion

- Canarying will have to be across charm revision, blob set, and charm config.
  - The charm version now includes the charm revision and the resources revision.
  - Further discussion needed around health status for canaries later.
- Access control needs to sit on top of the content addressing; just knowing the hash does not imply permission to fetch.
- Saving network transfer by doing binary diffs on blobs of the same name with different hashes/versions would be nice for upgrades.
  *sabdfl* says we have this behaviour already with the phone images, and we should break it out into a common library somewhere, somehow.

### Charms define their resources

- `resources.yaml` (next to `metadata.yaml`) declares (a sketch of a matching struct follows this section):
  - Stream
    - Has a description (free vs paid, beta/stable vs proposed, etc.).
    - If you want to change the logic of a charm based on the blob stream, that is actually a different charm (or an if statement in your hooks).
    - Streams will gain ACLs later (can you use the paid stream?).
    - Charms must declare a default stream.
  - Filenames
    - The name of the blob.
  - Version
    - Just a monotonically increasing number; the version is stream-dependent.
    - The store has a pointer to the "current" version (which may not be the latest).
  - Architecture
- The charm declares the shape of the resources it consumes (what files must be available). The store maintains the invariant that when a resource is updated, it contains the shape that the charm declared.
- `charm-publish` uploads both the version of the charm and the version of the resources.
- We add a "current" pointer to charms like the one for resources, so that you have an opportunity to upload the charm and its resources and test them before they become the default that people get (instead of getting the 'latest' charm, you always get the 'current' one unless you explicitly specify otherwise).
- mysql-13.paid.62
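A guess at the Go shape a charm package might parse `resources.yaml` into. The field names just follow the list above; this is not a settled format.

```
package charm

// ResourcesDecl mirrors a charm's hypothetical resources.yaml.
type ResourcesDecl struct {
	// DefaultStream names the stream used when the deploy command
	// does not specify one, e.g. "stable".
	DefaultStream string `yaml:"default-stream"`
	// Streams maps stream name (stable, devel, paid, ...) to the
	// resources published in it.
	Streams map[string]Stream `yaml:"streams"`
}

type Stream struct {
	Description string `yaml:"description,omitempty"`
	// Version is a monotonically increasing number, independent per
	// stream and independent of the charm revision.
	Version int `yaml:"version,omitempty"`
	// Files lists blob names, keyed by "common" or an architecture
	// such as "amd64".
	Files map[string][]string `yaml:"files,omitempty"`
}
```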
### Notes

We need to cache fat charms on the bootstrap node. We need to "auto Kapil" fat charms. Sometimes we don't even have access to the outside network. We need one hop from the unit to the bootstrap node.

The important thing, however, is that customers will probably fat-charm everything, e.g. a huge IBM Websphere Java payload.

- Can Juju handle gigs of payload? Nate: Yes, moving away from the git storage.
- Is there anything core can do to make charms smaller?
  - Marco: Yes.
  - Ben: We need a mechanism to specify common deps so that we can share them instead of having a copy in every charm. A bundle could have deps included, or maybe a common blob store?
- juju-deployer is moving to core.

If we move to image-based workloads we can have a set image that includes all the deps.

Nate: We could do it so that if we're on a certain cloud we can install the deps as part of the cloud, e.g. if I am on softlayer, make sure IBM Java is installed via cloud-init. So we can do things like an optimized image for big data.

### Work Items

1. Add an optional format version to charm metadata (default 0) - 2
   - Get Juju to reject charms with formats it doesn't know about ASAP.
1. Charm store needs to grow blob storage, with multiple streams, current resource pointers, and links from the resources to the charm itself - 4
1. Charm store needs to gain current charm revision pointers to charms - 2
   - Juju should ask for current, not latest.
1. The charm store needs to know which revisions of each resource stream each charm revision works with - 2
1. Charm gains an optional `resources.yaml` - 2
   - Bump the format version for charms using `resources.yaml`.
1. Need to write a proper charm publish - 12
   - Resource manifest matching.
   - Salted hashes.
   - Partial diff up/down is not in rev 1.
1. State server needs an HA charm/resources store - 8
   - Should use the same code as the actual charm store (shared lib/package).
   - Replaces the current charm storage in provider storage.
1. A charm does not exist in state until we have copied all authorized resources into the local charm store - 2
1. Uniter/charm deployer needs to know about the resources file, parse its content, know which stream to use, and request resources from the local charm store, probably authenticated - 4
   - Puts the resources into the resources directory as part of Deploy.
1. Bind mounting: ensure the links for the files flatten into the resources dir - 2

## Make writing providers easier

### Problems

- Writing providers is hard.
- Writing providers takes a long time.
- Writing providers requires knowledge of the internals of juju-core.
- Providers suffer bitrot quite quickly.

### Agenda

- Can we externalize providers from core? (plugins/other languages?)
- A pre-made stub project with pluggable functions?
- How to keep in sync with core changes and avoid bitrot?
- How to insulate providers from changes to core?
- Can we simplify the interface?
  - A complicating factor is config; can some be shared?
- Need to design for reuse: factor out common logic.
### Notes

- Keep `EnvironProvider`.
- Split the `Environ` interface into smaller chunks,
  - e.g. `InstanceManagement`, `Firewall` (see the sketch after the work items).
- Smaller structs with common logic, e.g. port management, which use provider-specific call-outs.
- Extract the Juju-specific logic which is "duplicated" across providers and refactor it into a shared struct.
- The above will allow the necessary provider-specific call-outs to be identified.

### Work Items

1. Methods on the provider operate on instance ids.
1. Introduce bulk API calls.
1. Move instance addresses into environs/network.
1. Split the `Environ` interface into smaller chunks; introduce `InstanceManager`, `Firewaller`.
1. Smaller structs with common logic, e.g. port management, which use provider-specific call-outs.
1. Extract the Juju-specific logic which is "duplicated" across providers and refactor it into a shared struct.
1. Stop using many security groups; use the default group with iptables.
1. Use a `LoadBalancer`? interface (needed by Azure); it will provide open/close ports; most providers will not need this and/or will return no-ops.
1. Make the `Firewaller` worker the sole process responsible for opening/closing ports on individual nodes.
1. Refactor the providers' use of `MachineConfig` as the means to pass in params for cloud-init; consider ssh'ing into a pristine image to do the work, as per the manual provider?????
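A sketch of the proposed split. The interface shapes are guessed from the work items above; the real `Environ` methods differ in detail, and the placeholder types exist only to make the sketch self-contained.

```
package environs

// InstanceManager covers the lifecycle calls every provider must
// implement; slice arguments reflect the "introduce bulk API calls"
// work item.
type InstanceManager interface {
	StartInstances(params []StartInstanceParams) ([]Instance, error)
	StopInstances(ids []string) error
	Instances(ids []string) ([]Instance, error)
}

// Firewaller is only implemented by providers with a firewalling
// concept; others can return no-ops.
type Firewaller interface {
	OpenPorts(instanceId string, ports []PortRange) error
	ClosePorts(instanceId string, ports []PortRange) error
}

// Environ then becomes a composition of the smaller chunks, and the
// shared juju-side logic is written against the small interfaces.
type Environ interface {
	InstanceManager
	Firewaller
}

// Placeholder types for the sketch.
type StartInstanceParams struct{ Series string }
type Instance struct{ Id string }
type PortRange struct {
	From, To int
	Protocol string
}
```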
## Availability Zones

- Users want to be able to place units in availability zones explicitly (provider-specific placement directives). The core framework is nearing completion; providers need to implement provider-specific placement directives on top.
- Users want highly-available services (Juju infrastructure and charms). On some clouds (Azure), spreading across zones is critical; on others it is just highly desirable.
- Optional: one nice feature of the Azure Availability Set implementation is automatic IP load balancing (no need for HAProxy, which itself becomes a SPoF). Should we support this in other providers (AWS ELB, OpenStack LBaaS, ...)?

### Agenda

- Prioritise implementation across providers (e.g. OpenStack > MaaS > EC2?).
- Discuss the overall HA story, IP load balancing.

Azure supports implicit load balancing, but we don't care about other clouds for now.

### Work Items

1. Determine which providers support zones: EC2, OpenStack, Azure?
1. Implement distribution groups in all providers; either they do it or they return an error.
1. New policy in state which handles units on existing machines.
1. New method on state which accepts distribution groups and a list of candidate instance ids and returns a list of equal best candidates (sketched below).
1. Add an API call to AMZ to find availability zones.
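A sketch of the candidate-selection step from work item 4: given how many members of a distribution group already sit in each zone, keep the candidates in the least-populated zones so new units spread out. Names and types are illustrative.

```
package state

// bestZoneCandidates returns the candidate instance ids whose
// availability zone currently hosts the fewest members of the
// distribution group; all ties are returned as equal best.
func bestZoneCandidates(zoneOf map[string]string, groupCount map[string]int, candidates []string) []string {
	best := []string{}
	bestCount := -1
	for _, id := range candidates {
		n := groupCount[zoneOf[id]]
		switch {
		case bestCount == -1 || n < bestCount:
			best, bestCount = []string{id}, n
		case n == bestCount:
			best = append(best, id)
		}
	}
	return best
}
```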
## Networks

- Juju needs to be aware of existing cloud-specific networks, so it can make them available to the user (e.g. to specify placement and connectivity requirements for services and machines, provide network capabilities for charms/relations, fine-tune relation connectivity, etc.).
- Juju needs to treat containers and machines in a uniform way with regard to networks and connectivity (e.g. providing and updating addresses for machines and containers, including when nesting).
- Knowing the network topology and infrastructure in the cloud, Juju can have a better model of how services/machines interact, and can provide user-facing tools to manage that model (CLI/API, constraints/placement directives, charm metadata) at a high level, so that the user doesn't need to know or care how the lower-level networking is configured.

### Agenda

- Discuss and outline the high-level architecture integrating the existing MaaS VLAN MVP work and instance addresses, so that we have a unified networking/addressability model.
- Prioritize implementation across providers.
- Discuss and define required features and deadlines?

### Meeting Notes

- We need networks per service -> then configure them on machines.
- Default networks get created (public/private)?
- Networks per relation -> routing between netdb (mysql) / netapp (wp), e.g.
- Network relations to define routing? add-net-relation netdb netapp; then add-relation mysql wordpress [--using=netrel1] (if more than one).
- Container addressability.

## Networking - Connections vs Relations

Discussion of the specifics of network routing.

- Relations do not always imply connections (although usually they do, except when they don't, as with proxy charms).
- Juju wants to model the physical connections, to open ports/iptables/security groups/firewalls appropriately to allow the relation's actual traffic.
- We need to be able to specify the endpoints for communication within charm hooks if it's not the default model. Possible hook commands for that:
  - `enable-traffic endpoint_ip_address port_range`
    - For example: `enable-traffic 194.123.45.6 1770-2000`
  - `disable-traffic ep port_range`
- Also talk to the OpenStack charmers about non-relation TCP traffic.
- Should Juju model routing rules and tables for networks? (Directly via API/CLI, or implicitly as part of other commands, like add-relation between services on different networks.)

## Deployer into Juju Core

- To embed the GUI we need a solid path for making bundles work.
- You can't `juju deploy` a bundle.
- Moving towards stacks, Core should support bundles like charms: provide APIs to the files inside, etc.
- Can the GUI use the ECS to replace the functionality of the deployer for GUI needs?

The goal of the meeting is to verify that this is a logical path forward and to create a plan to migrate to it. Stakeholders should agree on the needs in Core and make sure that it works with, not against, future plans to expand on the idea of bundles into fat bundles and stacks.

## Bundles to Stacks

What's needed to turn bundles into stacks?

Bundles have no identity at run time; we want this for stacks. A namespace identifies the group of services that are under a bundle.

Drag a bundle to the GUI and you get a bunch of services; with stacks, drag and drop a stack and you get one identifiable stack icon that is itself a composable entity and logical unit.

- Namespaces
  - The collection of deployed entities belongs to a stack.
  - Bundles today 'disappear' once deployed (the services are available, but there is no visible difference from just doing the steps manually).
- Exposed endpoints
  - Interface "http" on the stack is actually "http" on the internal Wordpress.
- Hierarchy (nesting)
  - Default "status" output shows the collapsed stack; explicitly describing the stack shows the internal details.

### GUI concerns/thoughts

- An expanded stack takes over the canvas; other items are not shown.
- Drag on an "empty stack" which you can explode to edit, adding new services inside.

### Notes

- The GUI can't support bundles with local charms.
- Bundles should become a core entity supported by juju-core.
- Deployer into juju-core should come after the work for supporting uncommitted changes.
  - (dry run option?)

### Stacks 2.0

Further items about what a stack becomes:

- Incorporating Actions.
- Describing the behavior of add-unit for a stack.

### Work Items

Spend time to make a concrete spec for the next steps.
For "namespacing", an initial implementation could just tag each deployed item with a name/UUID.
## Charm Store 2.0

- Access Control
- Replacing Charmworld
- Ingesting Charms (for example from GitHub)
- Ingesting Bundles
- Search

Kapil's aim: simplify the current model of charm handling. Break the three-way link between Launchpad, charmworld (deals with bundles, used via API by the GUI), and the charmstore (deals in charms, used by the juju-core state server). Question: is breaking the link between Launchpad and charmworld the first step?

Lots of discussion over first steps. Migrate the charmworld API into the store? Does the state server also need to implement it? Currently the API is small but specific: search, pulling a specific file out of charms (maybe with some magic for icons), and some other things.

**First step**: Add a feed from the store that advertises charms. Change charmworld ingest to read from the store feed rather than from Launchpad directly.

**Second step**: Bundles are only in charmworld currently. They are pulled from Launchpad and are a branch with a bundles.yaml and a readme, similar to a charm. The store needs to ingest bundles as well, and also publish them as a separate bundle feed. Change charmworld ingest to read the store bundle feed.

**Third step**: Add a v4 API, implemented in the store, that supersedes the current charmworld v3 API, cleaning up direct file access and other odd things at the same time. Remember that charm-tools is currently a consumer of the v3 API.

We may want to split the charm store out of the juju-core codebase, along with packages such as charm in core, into separate libraries.

After charmworld no longer talks to Launchpad, it will be easier to provide ingestion from other sources, e.g. GitHub. Publishing directly to the store will be possible as well.

Work item - bac - document the existing charmworld API v3 (see [Charmworld API 3 Docs](http://charmworld.readthedocs.org/en/latest/api.html)).

We'll need to be able to serve individual files out of charms:

- `metadata.yaml`
- `icon.svg`
- `README`

Search capability could be provided by Mongo 2.6 fulltext search?

### Questions

- How does ingestion of charm store charms work for personal namespaces?
  - `juju deploy cs:gh`
- Charm store 2.0 should be able to ingest not only from GitHub but from a specific branch in a GitHub repo (e.g. https://GitHub.com/charms/haproxy/tree/precise && https://GitHub.com/charms/haproxy/tree/trusty, or a better example, https://GitHub.com/charms/haproxy/tree/centos7). This is needed when there need to be two different versions of a charm.
- As a best practice, charms should endeavour to have one charm per OS. When the divergence for a given charm is great enough (e.g. Ubuntu to CentOS) we should look at creating a new branch in git.

## ACLs for Charms and Blobs

### Work Items

1. The namespace that holds revisions for a charm needs to store ACLs.
1. The charm store needs to check them against API requests.
1. The API to get a resource needs to have a reference to the top-level charm (TBD), so we can check the read permission.

We need to decide how we want to deal with access to metadata and content.
Should we always allow full access to all blobs and content if you can deploy? (A small decoding sketch follows the tables.)

### Option #1

- r = metadata
- w = publish
- x = deploy

#### Public charm (0755)

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X | | X |
| everybody | - | X | | X |

#### Charm under test (0750)

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X | | X |
| everybody | - | | | |

#### Gated charm (0754)

You can see it, but you have to get approval (be added to installers) to deploy.

| | | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | X | | X |
| everybody | - | X | | |

### Option #2

- r = read content of charm
- w = publish
- x = deploy and read metadata

#### Public charm (0755)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X | | X |
| everybody | - | X | | X |

#### Charm under test (0750)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X | | X |
| everybody | - | | | |

#### Gated charm (0710)

You can see it, but you have to get approval (be added to installers).

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | | | X |
| everybody | - | | | |

#### Commercial charm with installer-inaccessible content (0711)

| | | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | | | X |
| everybody | - | | | X |
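A tiny sketch of how the unix-style mode shorthand above could decode into charm permissions, using Option #1's r/w/x assignment; purely illustrative.

```
package acl

// Perm mirrors Option #1: r=metadata, w=publish, x=deploy.
type Perm struct {
	Metadata, Publish, Deploy bool
}

// fromBits decodes one octal digit of a mode such as 0754: the 7
// applies to the maintainer, the 5 to installers, the 4 to everybody.
func fromBits(bits uint) Perm {
	return Perm{
		Metadata: bits&4 != 0, // r
		Publish:  bits&2 != 0, // w
		Deploy:   bits&1 != 0, // x
	}
}
```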
## Upgrades

Prior to 1.18, Juju did not really support upgrades. Each agent process listened to the agent-version global config value and restarted itself with a later version of its binary if required.

1.18 introduced the concept of upgrade steps, which allows ordered execution of business logic to perform the changes associated with upgrading from X to Y to Z. 1.18 also made the machine agents on each node solely responsible for initiating an upgrade on that node, rather than all agents (machine, unit) acting independently. However, several pieces are still missing....

### Agenda items

- Coordination of node upgrades - lockstep upgrades.
- Schema updates to the database.
- HA - what needs to be done to support upgrades in an HA environment?
- Read-only mode to prevent model or other changes during upgrades.
- How to validate an upgrade prior to committing to it, e.g. bring up a shadow Juju environment on the upgraded model and validate first, before either committing or switching back?
  - Perhaps a `--dry-run` to show what would be done?
- Authentication/authorization - restrict upgrades to privileged users?
- How to deal with failed upgrades / rollbacks? Do we need application-level transactions?
- Testing of upgrades using a dev release - faking the reported version to allow upgrade steps to be run, etc.

### Work items for schema upgrade

Key assumption: database upgrades complete quickly.

1. Implement schema upgrade code (probably as an upgrade step; a sketch follows this list).
   - mgo supports loading documents into maps, so we do not have to maintain legacy structs.
   - Record the "schema" version.
1. Implement state/mongo locking, with an explicit upgrading/locked error.
   - One form of locking is to just not allow external API connections until the upgrade steps have completed, since we know we just restarted and dropped all connections.
1. Introduce retry attempts in the API server around state calls.
1. Take a copy of the db prior to the schema upgrade, and copy it back if the upgrade fails.
1. Upgrade steps for the master state server only.
1. Coordination between master/slave state servers to allow the master to finish first.

### Work items for upgrade story

- Allow users to find out what version upgrade-juju will pick before upgrading.
- Commands should report that an upgrade is in progress if run during an upgrade.
- The peergrouper worker should only start after an upgrade has completed.
- Update machine status during an upgrade; set error status on failure.
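A sketch of an upgrade step in the style the first work item suggests: documents are read as plain maps (no legacy structs to maintain) and a missing field is backfilled. The collection and field names are invented for illustration.

```
package upgrades

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// addEnvUUIDToMachines is an example schema upgrade step: it finds
// machine documents missing a field and backfills it, reading each
// document into a generic bson.M map rather than a legacy struct.
func addEnvUUIDToMachines(db *mgo.Database, envUUID string) error {
	machines := db.C("machines")
	iter := machines.Find(bson.M{"env-uuid": bson.M{"$exists": false}}).
		Select(bson.M{"_id": 1}).Iter()
	for {
		doc := bson.M{}
		if !iter.Next(&doc) {
			break
		}
		err := machines.UpdateId(doc["_id"],
			bson.M{"$set": bson.M{"env-uuid": envUUID}})
		if err != nil {
			return err
		}
	}
	return iter.Close()
}
```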
## Juju Integration with Oasis TOSCA standards (IBM)

[TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) is a standard aimed at "enhancing the portability and management of cloud applications and services across their lifecycle." In discussions with IBM, we need to integrate Juju into the TOSCA standards as part of our agreement. Thus we need to define the following:

- [TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) - simple profile yaml doc, updated approximately weekly.
- Discuss who will lead this effort and engage with IBM.
- Define the correct integration points.
- Define the design and architecture of the TOSCA integration.
- Define what squad will deliver the work, and timelines.

### Goal

- Drag a TOSCA spec onto the juju-gui and have the deployment happen.

## Other OS Workloads

Juju has been Ubuntu-only so far, but was never intended to be Ubuntu-only; we were waiting for user demand. It seems some of that demand has now arrived. From earlier discussions, the following areas have been identified for work:

1. Remove assumptions about the presence of apt from core.
1. Deal with the init system differences between upstart, SysV init, and Windows services for the agents.
1. Deal with rsyslog configuration.
1. Define initial charms (the bare minimum would be equivalents of the ubuntu charm).
1. Update the cloud-init handling for alternate OSes.
1. SSH configuration.
1. Define and handle non-Ubuntu images.

The key questions are:

1. Which is going to be first?
   - We expect Windows workloads, as that has been implemented already and we just need to integrate it.
1. How important is this compared to the other priorities?

I don't think there are any questions around "should we do it", just "when should we do it".

### CentOS / SLES

Hopefully we can handle both CentOS and SLES in one go, as they are based on very similar systems. We may need to abstract out some parts, but on the whole they *should* be very similar. Again, there should be a lot of overlap between Ubuntu and both CentOS and SLES, with the obvious differences in agent startup management and software installation. Writing the actual charms is outside the scope of this work, although we should probably make CentOS and SLES charms that mirror the ubuntu charm and just bring up an appropriate machine.

### Windows

We have work that has already been done by a third party to get Juju deploying Windows workloads. It is expected that this work will neither cleanly merge with current trunk, nor necessarily meet our normal demands of tests, robustness, or code quality. We won't really know until we see the code. However, what it does give us is something that works, that clearly identifies all of the Ubuntu-specific parts of the codebase, and that will give us a good foundation to work from to get the workload-platform-agnostic nature we desire.

### Notes

- Need to get the code drop from the MS guys.
- Use the above to identify the non-Ubuntu-specific parts of the code.
- We do the interface design and the CentOS implementation.
- We hand the above back to the MS guys and they use it as a template to re-do the Windows version.
- Excludes the state server running on Windows.
- Manual provisioning of Windows instances.
- Local provider (VirtualBox) on Windows.

## 3rd Party Provider Implementations

- Improving our documentation around what it takes to implement a Provider.
- We still call them Environ internally.
## Container Addressability (Network Worker)

- [Earlier notes on Networking](https://docs.google.com/a/canonical.com/document/d/1Gu422BMAJDohIXqm6Vq4WTrtBV8hoFTTdXvXDQCs0Gs/edit)
- Link to [Juju Networking Part 1](https://docs.google.com/a/canonical.com/document/d/1UzJosV7M3hjRaro3ot7iPXFF9jGe2Rym4lJkeO90-Uo/edit#heading=h.a92u8jdqcrto) early notes.
- What are the concrete steps towards getting containers addressable on clouds?
  - Common
    - Allocate an IP address for the container (provider-specific).
    - Change the NIC that is being used to be bridged.
    - Bring up the container on that bridged network and assign the local address.
  - EC2
    - **ACTION (spike)**: How do we get IP addresses allocated in a VPC?
    - Anything left to be done in goamz?
  - OpenStack
    - Neutron support in lp:goose.
      - Add a neutron package.
      - Sane fallback when endpoints are not available in keystone (detect whether Neutron endpoints are supported, and if not, report the error).
      - New mock implementation (testservers).
    - Specify ports/subnets at StartInstance time (possibly a spike as well).
    - Add/remove subnets.
    - Add/remove/associate ports (a Neutron concept, similar to a NIC).
    - Add/remove/relate bridges? Probably not needed for now.
    - Maybe security groups via Neutron rather than Nova.
    - Potential custom setup once a port is attached on a machine.

We need a Networker worker at the machine level to manage networks. What about public addresses? We want `juju expose` to grow some ability to manage public addresses. Need to be aware that there's a limit of 5 elastic IPs per region per account. We can instead get a public address assigned on machine startup that cannot be freely reassociated. Need to make a choice about default VPC vs creating a VPC; using only the default VPC is simpler.

### Potentially out of scope for now

- Using a non-default VPC - requires several additional setup steps for routes and suchlike.
- Networking on providers other than EC2/OpenStack, beyond making sure we don't bork on interesting setups like Azure.
- Networking on cloud deployments that do not support Neutron (e.g. HP).

Separate discussion: update the ports model to include ranges and similar.

Switching to the new networking model also enables much more restrictive firewalling, but does require some charm changes. If charms start declaring ports exposed on private networks, it would be possible to skip address-per-machine for non-clashing ports. It also allows more restrictive internal network rules.

### Rough Work Items

1. When adding a container to an existing machine, the Environment Provisioner requests a new IP address for the machine, and records that address as belonging to the container.
1. `InstancePoller` needs to be updated so that when it lists the addresses available for a machine, it is able to preserve the allocation of some addresses to the hosted containers.
1. The `Networker` worker needs to be able to set up bridging on the primary instance network interface, and apply the necessary ebtables/iptables rules to use the same bridge for LXC containers (e.g. any container can use one of the host instance's allocated secondary IP addresses so it appears like another instance on the same subnet).
1. The existing MaaS cloud-init setup for VLANs will be moved inside the networker worker.
1. The Networker watches machine network interfaces and brings them up/down as needed (e.g. doing dynamically what the MaaS VLAN cloud-init scripts do now, and more).

## Leader Elections

Some charms need to elect a "master" unit that coordinates activity on the service. Also, Actions will at times need to be run only on the master unit of a service.

- How do we choose a leader?
- How do we read/write who the leader is?
- How do we recover if a leader fails?
- The current leader can relinquish leadership (e.g. for round-robin use cases).

Lease on leader status. A lease allows caching, and prevents an isolated leader from performing bad actions. If the leader is running an action and can't renew its lease, it must kill the action. The same applies to hooks that require the leader. The agent controls leader status and does the killing.
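A sketch of the lease behaviour in the paragraph above: leadership is only trusted while the lease is fresh, and an agent that cannot renew in time must stop leader-only work. All names are illustrative.

```
package leadership

import "time"

// Lease is leader status with an expiry; holding a stale lease is
// as good as not holding one at all.
type Lease struct {
	Holder  string
	Expires time.Time
}

// StillLeader reports whether the unit may keep doing leader-only
// work (actions, leader-only hooks) at time now. A unit partitioned
// away from state will fail to renew and must kill any in-flight
// leader action once this turns false.
func (l Lease) StillLeader(unit string, now time.Time) bool {
	return l.Holder == unit && now.Before(l.Expires)
}

// Renew extends the lease. Only the current holder may renew, and
// only before expiry, so an isolated leader cannot sneak back in.
func (l *Lease) Renew(unit string, now time.Time, period time.Duration) bool {
	if !l.StillLeader(unit, now) {
		return false
	}
	l.Expires = now.Add(period)
	return true
}
```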
## Improving charm developer experience

Charms are the most important part of Juju. Without charms people want to use, Juju is useless. We need to make it as easy as possible for developers outside Canonical to write charms.

Areas for improvement:

- Make charm writing easier.
- Make testing easier.
- Make charm submission painless.
- Make charm maintenance easier.
- What are the current biggest pain points?

## Juju needs a distributed log file

We are currently working on replicating rsyslog to all state servers when in HA. Per Mark Ramm, this is good enough for now. We may want to discuss a real distributed logging framework to help with observability, maintenance, etc.

### Notes

- Kapil says Logstash or Heka. Heka is bigger and more complicated, so he suggests Logstash is more likely to be suitable.
- Wayne has used Apache Scribe in the past.
- Requirements:
  - Replicated (consistently) across all state servers.
  - Newly added state servers must have old log messages available.
  - Must be tolerant of state server failures.
  - Store and forward.
  - Nice to have: efficient querying.
  - Nice to have: surrounding tooling for visualization, post-hoc analysis, …
  - Encrypted log traffic.

### Actions

Juju actions are charm-defined functionality that is user-initiated, takes parameters, and is executed on units, such as backing up mysql.

### Open Questions

- How do we handle history and results?
- How do we handle actions that require leaders on services with no leaders?
- Is there anything else controversial in the spec?
- Do we have a piece of configuration on the action defining what states it’s valid to run in?
- Users should be made aware of the lifecycle of an action. For example, which unit is currently backing up, the progress of the backup, and whether the backup succeeded or not.

Actions have:

1. State
1. Lifecycle
1. Reporting

Actions accept parameters. There is an actions directory at the top level of the charm, containing a bunch of named executables. `actions.yaml` has a key for each action, e.g. whether it is a service or unit action, and a schema for the parameters (JSON Schema expressed in YAML).

There are both unit-level and service-level actions. Unit-level will be done first.

There are collections of requests and results. Each unit watches the actions collection for actions targeted at itself, and is not notified of things it doesn’t care about. When you create an action, you get a token, and you watch for that token in the results table. A non-zero exit code means failure, but an error return from an action doesn’t put the unit into an error state.

Actions need to work in more places than hooks. We don’t want to run them before start or after stop; we do want to run them while in an error state.

```
$ juju do action-name [unit-or-service-name] --config path/to/yaml.yml
```

By specifying a service name for a unit action, it runs against all units by default.

Results are YAML. stdout goes to the log. Hook and action queues are distinct.
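From the client side, the token-and-watch flow described above could look roughly like this; `EnqueueAction` and `WatchResult` are invented names for illustration, not a real API:

```
package main

import (
	"fmt"
	"time"
)

// Result is one document from the hypothetical results collection.
type Result struct {
	Token  string
	Code   int    // non-zero means the action failed
	Output string // YAML results from the charm's executable
}

// EnqueueAction writes a request document; the targeted unit's
// watcher picks it up. The returned token identifies the result.
func EnqueueAction(unit, action string, params map[string]string) string {
	return "token-123" // stand-in for a state-server API call
}

// WatchResult blocks until a result with the given token lands in
// the results collection (a real implementation would use a watcher).
func WatchResult(token string, timeout time.Duration) (Result, error) {
	return Result{Token: token, Code: 0, Output: "status: ok"}, nil
}

func main() {
	token := EnqueueAction("mysql/0", "backup", map[string]string{"dest": "/tmp"})
	res, err := WatchResult(token, 5*time.Minute)
	if err != nil {
		panic(err)
	}
	// A failed action reports through Code; it does not put the
	// unit into an error state.
	fmt.Printf("action %s exited %d:\n%s\n", res.Token, res.Code, res.Output)
}
```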
### Work Items

1. Charm changes:
   - Actions directory (like hooks, named executables).
   - Top-level `actions.yaml` (top-level key is actions; sub-keys include parameters, description).
1. State / API server:
   - Add an action request collection.
   - Add an action result collection.
   - APIs for putting to the action/result collections.
   - APIs for watching which requests are relevant for a given unit.
   - APIs for watching results coming in (probably filtered by which unit/units we’re interested in).
   - APIs for listing and getting individual results by token.
   - APIs for getting the next queued action.
1. Unit agent work:
   - The unit agent’s “filter” must be extended to watch for relevant actions and deliver them to the uniter.
   - Various modes of the uniter need to watch that channel and invoke the actions.
   - Handwavy work around the hook context to make it capable of running actions and persisting results.
   - Hook tools:
     - Extract parameters from the request.
     - Dump results back to the database.
     - Error reporting.
     - Determine unit state?
1. CLI work:
   - The CLI needs a way to watch for results.
   - `juju do` sync mode.
   - `juju do` async mode.
   - `juju run` becomes trivially implementable as an action.
1. API for listing action history.
1. The leader should be able to run actions on its peers (use case: rolling upgrades).
1. Later: fix up the schema for charm config to match actions.

## Actions, Triggers and Status

What are triggers? (Related to Actions, IIRC.)

### Potential applications

- Less polling for the UI, deployer, etc.

### Topics to discuss

- Authentication
- Filtering & other features
- API
- Implementation

## Combine Unit agent and Machine agent into a single process

- What is the expected benefit?
  - Fewer moving parts; machine and unit agents upgrade at the same time.
  - Avoids N unit agents for N charms + subordinates (when hulk-smashing, for example).
  - Smaller deployment footprint (one less jujud binary).
  - Fewer workers to run, fewer API connections.
- What is the expected cost?
  - rsyslog tagging (logs from the unit agent arrive with the agent’s tag; we need to keep that for observability).
- Concrete steps to make the changes.

Issues with image based deployments?

- No issues expected.
- Even if we need a juju component inside the container, no issue.

### Work Items

1. Move relevant unit agent jobs into the machine agent (drop duplicates).
1. Remove redundant upgrade code.
1. Change the deployer to start a new uniter worker inside the single agent.
1. Change logging (loggo/rsyslog worker) to allow tags to be specified when logging, so that each unit still logs with its own tag.
1. (Eventually) consolidate the previously separate unit/machine agent directories into a single dir.
1. Ensure `juju run` works as before.

## Backup/Restore

- Making the current state work:
  - We need to have the mongo client for restore.
  - We need to ignore the replicaset.
- What will it take to implement a “proper” backup, instead of just having some scripts that mostly seemed to work one time?
  - Backup is an API call.
  - Restore should grow in `jujud`.
  - Add a restore at the level of bootstrap?
  - Turning our existing juju-backup plugin from being a plugin into being integrated core functionality.
- Can we snapshot the database without stopping it?
  - How will this interact with HA? We should be able to ask a secondary to save the data.
  - It is possible to mongodump a running process; did we consider that rather than shutting mongo down each time?
  - Since we now always use --replicaSet even when we have only 1, what if we just always created a “for-backup” replica that exists on machine-0? Potentially brought up on demand, brought up to date, and then used for the backup sync.
- juju-restore
  - What are the assumptions we can reliably make about the system under restore?
  - E.g., in theory we can assume all members of the replica are dead; otherwise you wouldn’t be using restore, you would just be calling ensure-availability again.
  - Can we spec out what could be done if the backup is “old” relative to the current environment? Likely most of this is “restore 3.0”, but we could at least consider how to get agents to register their information with a new master.

### Concrete Work Items

1. Backup as a new Facade for client operations.
1. `Backup.Backup` as an API call which does the backup and stages the backup content on the server disk. The API returns a URL that can be used to fetch the actual content.
1. `Backup.ListBackups` to get the list of tarballs on disk.
1. `Backup.DeleteBackups` to clean out a list of tarballs.
1. HTTP mux for fetching backup content.
1. Juju CLI for:
   - `juju backup` (request a backup, fetch the backup locally).
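Client-side, the proposed facade could be exercised roughly as follows. That `Backup.Backup` returns a fetch URL comes from the work items above; the Go names here are illustrative only:

```
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// requestBackup stands in for the Backup.Backup API call: the server
// creates the archive, stages it on disk, and returns a fetch URL.
func requestBackup() (url string, err error) {
	return "https://state-server:17070/backups/xyz.tar.gz", nil
}

func main() {
	url, err := requestBackup()
	if err != nil {
		panic(err)
	}
	// Fetch the staged tarball from the returned URL, as `juju backup`
	// would, and write it locally.
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, err := os.Create("juju-backup.tar.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
	fmt.Println("backup saved")
}
```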
## Consumer relation hooks run before provider relation hooks

[Bug 1300187](https://bugs.launchpad.net/juju-core/+bug/1300187)

- IIRC, William had a patch which made the code prefer to run the provider side of hooks first, but did not actually enforce it strictly. Does that help, or are charms still going to need to do all the same work?
- Does it at least raise the frequency with which charms “Just Work”, or does it make it hard to diagnose when they “Just Fail”?

## Using Cloud Metadata to describe Instance Types

We currently hard-code EC2 instance types in big maps inside of juju-core. When EC2 changes prices, or introduces a new type, we have to recompile juju-core to support it. Instead, we should be able to read the information from some other source (such as published on streams.canonical.com, since AMZ doesn’t seem to publish easily consumable data).

- The OpenStack provider already reads the data out of keystone; are we sure AMZ doesn’t provide this somewhere?
- Define a URL that we could read, and a process for keeping it updated.

### Work Items

1. Investigate the instance type information each cloud type has available - both programmatically and elsewhere.
1. Define an abstraction for retrieving this information. Some clouds will offer this information directly, others will need to get it from simplestreams. Some cloud types may involve getting the information from mixed sources.
1. Support a search path for locating instance information across mixed sources.
1. Ensure a process for updating Canonical-hosted information is in place.
1. Document how to update instance type information for all cloud types.
1. API for listing instance types (for the GUI).

## API Versioning

We’ve wanted to add this for a long time.

- Possible [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit#heading=h.avfqvqaaprn0) for refactoring the API into many Facades
- [14.04 Spec](https://docs.google.com/a/canonical.com/document/d/12SFO23hkx4sTD8he61Y47_kBJ3H5bF2KOwrFFU_Os9M/edit)
- Can we do it and remain 2.x compatible for the lifetime of Trusty?
- Concrete design around what it will look like:
  - From an API server perspective (how do we expose multiple versions).
  - From an API client perspective.
  - From the Juju code itself (how does it notice it wants version X but can only get Y, so it needs to go into compatibility mode; is this fine-grained on a single API call, coarse-grained around the whole API, or the middle ground of a Facade?).

### Discussion

- We can use the string we pass in now ("") to each Facade, and start passing in a version number.
- Login can return the list of known Facades and what version ranges are supported for each Facade.
- Login could also start returning the environment UUID that you are currently connected to.
- With that information, each client-side Facade tracks the best version it can use, which it then passes into all `Call()` methods.
- Compatibility code uses `Facade.CurrentVersion()` to do an if/then/switch based on the active version and do whatever compatibility code is necessary.

### Alternatives

- Login doesn’t return the versions; instead, when you do a `Call(Facade, VX)` it can return an error that indicates what actual versions are available.
  - Avoids changing Login.
  - Adds a round trip whenever you are actually in compatibility mode.
  - Creates clumsy code along the lines of: `if Facade.Version < X { doCompat() } else { err := tryLatest(); if IsTooOld(err) { doCompat() } }`
- Login sets a global version for all facades.
  - Seems a bit too coarse-grained, in that any change to any API requires a global version bump (version number churn).
- Each actual API is individually versioned.
  - Seems too fine-grained, and makes it difficult to figure out what version needs to be passed when (and then deciding when you need to go into compat mode).
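A sketch of the preferred scheme: Login reports per-facade version ranges, the client negotiates the best version it supports, and compatibility code branches on `Facade.CurrentVersion()`. All names here are illustrative:

```
package main

import "fmt"

// Versions maps facade name -> best negotiated version, built from
// the ranges returned by Login.
type Versions map[string]int

func negotiate(serverMax, clientMax int) int {
	if serverMax < clientMax {
		return serverMax
	}
	return clientMax
}

// Facade carries the negotiated version into every Call().
type Facade struct {
	name    string
	version int
}

func (f Facade) CurrentVersion() int { return f.version }

func main() {
	// Suppose Login said the server supports Client facade versions
	// up to 2, and this client supports up to 3.
	v := Versions{"Client": negotiate(2, 3)}
	client := Facade{name: "Client", version: v["Client"]}

	// Compatibility code switches on the negotiated version rather
	// than probing with calls and handling "too old" errors.
	if client.CurrentVersion() < 2 {
		fmt.Println("falling back to v1 behaviour")
	} else {
		fmt.Println("using v2 call")
	}
}
```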
## Tech-debt around creating new api clients from Facades

[Bug 1300637](https://bugs.launchpad.net/juju-core/+bug/1300637)

- Server side [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit).
- We talked about wanting to split up Client into multiple Facades. How do we get there, and what does the client-side code look like?
- We originally had just `NewAPIClientFromName`, and Client was a giant Facade with all functions available.
- We tried to break up the one-big-facade into a few smaller ones that would let us cluster functionality and make it clearer what things belonged together (`NewKeyManagerClient`).
- There was pushback on the proliferation of lots of New*Client functions. One option is that everything starts from `NewAPIClientFromName()`, which then feeds a `NewKeyManager(apiclient)`.
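The compromise suggested in the last bullet - one connection entry point plus cheap per-facade constructors - might read like this. A sketch only, not the actual juju/api surface:

```
package main

import "fmt"

// Connection is the single authenticated API connection obtained
// from NewAPIClientFromName (sketched as a plain struct here).
type Connection struct{ env string }

func NewAPIClientFromName(env string) (*Connection, error) {
	return &Connection{env: env}, nil
}

// KeyManager wraps the shared connection; constructing it does no
// I/O, so a proliferation of New*Client dials is avoided.
type KeyManager struct{ conn *Connection }

func NewKeyManager(conn *Connection) *KeyManager { return &KeyManager{conn: conn} }

func (km *KeyManager) ListKeys(user string) ([]string, error) {
	return []string{"ssh-rsa AAAA... user@host"}, nil // stand-in for a facade call
}

func main() {
	conn, _ := NewAPIClientFromName("local")
	keys, _ := NewKeyManager(conn).ListKeys("admin")
	fmt.Println(keys)
}
```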
## Cross Environment Relations

We’ve talked a few times about the desirability of being able to reason about a service that is “over there”, managed in some other environment.

- Last [spec](https://docs.google.com/a/canonical.com/document/d/1PpaYWvVwdF55-pvamGwGP23_vHrmFwCW8Bi-4VUg-u4/edit)
- Describes the use cases; confirm that they are still valid.
- We should update it to include the actual user-level commands that would be executed and what artifacts we would expect (e.g., `juju expose-service-relation` creates a `.jenv/.dat/.???` that can be used with `juju add-relation --from XXX.dat`).

### Notes

Expose an endpoint in env1; this generates a jenv (authentication info for env1) that you can import-endpoint into another environment. Env2 then connects to env1 and asks for information about the service in env1. This creates a ghost service in env2 that exposes a single endpoint, which is only available for connecting relations (no config editing etc.). There is a continuous connection between the two environments to watch whether the service goes down, etc.

IP changes must be propagated to the other environment; note that this is currently broken for relations even in a single environment. Cross environment relations always use public addresses (at least to start). Note also that the ghost service name may be the same as an existing service name, and we have to ensure that’s OK.

## Identity & Role-Based Access Controls

- [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm)
- [Establishing User Identity](https://docs.google.com/a/canonical.com/document/d/150GEG_mDnWf6QTMc1kBvw_x_Y_whGVN19mr3Ocv6ELg/edit#heading=h.aza0s6fmxfs9)

### Current Status

- Concept of service ownership in core.
- Add/remove user and the add-environment framework are done, but not exposed in the CLI.

What does a minimum viable multi-user Juju look like? (Just in terms of ownership, not ACLs.)

- `add-user`
- `remove-user`
- `add-environment`
- `whoami`

### 14.07 (3mo)

- Beginnings of role-based access controls on users (implementation of RBAC in core is another topic).
  - [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm).
- Non-superusers: read-only access at a minimum.

### 14.10 (6mo)

- Command-line & GUI identity provider integrations.

### 15.01 (9mo)

- IaaS, mutually-trusted identities across enterprises.
- Need a way to securely broker B2B IaaS-like transactions.

## Iron Clad Test Suite

The Juju unit test suite is beset by intermittent failures, caused by a number of issues:

- Mongo and/or replica set related races.
- Access to external URLs, e.g. the charm store.
- Isolation issues, such that one failure cascades to cause other tests to fail.

There are also other systemic implementation issues which cause fragility, code duplication, and maintainability problems:

- Lack of fixtures to set up tools and metadata (possibly charms?).
- Code duplication due to lack of fixtures.
- Issues with defining tools/version series, such that tests and/or Juju itself can fail when run on Ubuntu with a different series.

Related, though not a reliability issue, is the speed at which the tests run; e.g. the Joyent tests take up to 10 minutes. We also have tests which were set up to run against live cloud deployments but which in practice are never run - we now rely on CI.

Over the last cycle things have improved, and there are certain issues external to Juju (like mongo) which contribute to the problems. But we are not there yet, and must absolutely get to the stage where tests pass first time, every time, on the bot and when run locally. We need to consider/discuss/agree on:

- Identifying current failure modes.
- Hardening the test suite to deal with external failures; fixing juju-core issues.
- Introducing fixtures for things like tools and metadata setup, and refactoring duplicated code and setup.
- Documenting fixtures and other test best practices.
### Work Items - Core - Refactoring and Hardening

Juju does what it is supposed to do, but has a number of rough edges when it comes to various non-functional requirements, which contribute to the fact that often Juju doesn’t Just Work, and many times requires an unacceptably high level of user expertise to get things right. These non-functional issues can very broadly be classified as:

- **Robustness** - Juju needs to get better at dealing with underlying issues, whether transient network related, provider/cloud related, or user input.
- **Observability** - Juju needs to be less of a black box, and expose more of what’s going on under the covers, so that humans and machines alike can make informed decisions in response to errors and system status.
- **Usability** - Juju needs to provide a UI and workflow that makes it difficult to make mistakes in the first place, and to catch and report errors early, as close to the source as possible.

As well as changes to the code itself, we should consider process changes to guide how new features are implemented and rolled out. There is currently a disconnect between developers and users (the real world). A developer will often test a new feature in isolation, on a single cloud, where it works first time, deployed on an environment with a few nodes at best. They won’t be exposed to the pain associated with, and needed for, diagnosing and rectifying faults, since it’s often easier to destroy-environment and start again, or a new revision will have landed and CI will start all over again. More often than not, it’s the QA team who has to diagnose CI failures, which are raised as bugs, with developers being spared the pain of the root cause analysis, and any fixes often addressing a specific bug rather than a systemic, underlying issue.

- Rename `LoggingSuite` to something else; make it the default base suite, with mocked-out `$HOME`, etc.
- Identify independent fixtures (e.g. fake home, fake networking, …), and compose the base suite from them.
- Create a fake networking fixture that replaces the default HTTP client with something that rejects attempts to connect to non-localhost addresses.
- Update the tools fixture and related tests.
- Introduce an in-memory mock mgo for testing independent of a real mongo server.
- Continue the separation of api/apiserver in unit tests to enable better error checking.
- Document current testing practices to avoid cargo-culting of old practices; ensure the document is kept up to date at code review time.
- Update and speed up the Joyent tests (and all tests in general). The Joyent tests currently take ~10 minutes, which is far too long.
- Suppress detailed simplestreams logging by default in the (new) ToolsSuite by setting the streams package logging level to INFO in suite setup.
- Delete live tests from juju-core.
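For the fake networking fixture, the core trick is swapping the default HTTP transport for one that refuses non-local dials. A minimal version, independent of Juju’s actual testing packages:

```
package fixtures

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// InstallFakeNetworking replaces http.DefaultTransport with one that
// rejects any attempt to reach a non-localhost address, so a unit
// test touching the real network fails loudly instead of hanging.
// It returns a function that restores the previous transport.
func InstallFakeNetworking() (restore func()) {
	old := http.DefaultTransport
	http.DefaultTransport = &http.Transport{
		Dial: func(network, addr string) (net.Conn, error) {
			host, _, err := net.SplitHostPort(addr)
			if err != nil {
				host = addr
			}
			if host != "localhost" && host != "::1" && !strings.HasPrefix(host, "127.") {
				return nil, fmt.Errorf("unit test attempted external connection to %s", addr)
			}
			return net.Dial(network, addr)
		},
	}
	return func() { http.DefaultTransport = old }
}
```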
### Items to consider

- Architectural layers - what class of error should each layer handle, and how should errors be propagated/handled upwards?
- How to expose/wrap provider-specific knowledge to the core infrastructure so that such knowledge can be used to advantage?
- Where’s the line between Juju responding to issues it encounters vs. informing the user with immediate feedback of problems? (CI issues currently lack immediate visibility.)
- Close the loop between real world deployment and developers.
- How to ensure teams take ownership of non-functional issues?
- Tooling - targeted inspection of errors and decisions made by Juju; e.g. utilities exist to print where tools/image metadata comes from. Is that sufficient, and what else is needed?
- A roadmap would be awesome, to know what features to look for in upcoming releases (and when waiting for user input).
- Feature development - involve stakeholders/users (CTS?) more, at the prototype stage and during functional testing?
- How best to expose developers to the real world, so that necessary hardening work becomes as much of an itch scratch as it does a development chore?
- Close the loop between CI and development - unit tests / the landing bot could flag specific features for additional functional testing.

### Notes

- Mock up the workflow in a spec/doc - a quick few paragraphs about what a change or feature will look like from a user-facing standpoint.
- Not all features require functional / UAT testing, because of time constraints, but we still want to give CTS etc. input into dev.
- Wishlist: send more developers out to customer sites to get real world experience.
- Much more involvement with IS as a customer.
- More core devs need to write charms.
- Debug log is too spammy - but the new include/exclude filters may help.
- Debug hooks are used a lot - considered a powerful tool.
- Debug hooks should be able to drop a user into a hook context when not in an error state, e.g. `juju debug-hooks unit/0 config-changed`.
- Need more output in status to expose internals (Is my environment idle or busy?).
- More immediate reporting to the user of charm output as a deploy happens; we don’t want to wait 15 minutes to see final status.
- Juju diagnose - post-mortem tools <- already done via juju ready/unready, output vars, etc.

### Work Items

[Juju Fixes](https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoQnpJ43nBkJdHhnV05NcmQ3Tm5yRnIwcTlYMTZEaEE&usp=sharing)

1. Design an error propagation mechanism to be used across providers.
1. Destroy service --force.
1. Dry run to tell the user what version upgrade-juju will use.
1. Inspect relation data.
1. Address changes must propagate to relations.
1. Use a security group per service.
1. Use instance names/tags for machines.
1. Make safe-provisioning-mode the default.
1. Bulk machine creation.
1. Unit ids must be unique.

## Retry on API Failures

Really part of hardening. There are transient provider failures due to issues like exceeding allowable API invocation rate limits. Currently Juju fails when such errors are encountered and considers them permanent, when it could retry and be successful next time. The OpenStack provider does this to a limited extent. A large part of the problem is that Juju is chatty and makes many individual API calls to the cloud. We currently have a facility to allow provisioning to be manually retried, but we need something more universal and automated.

### Discussion Points

- Understanding what types of operation can produce transient errors. Is it the same for all providers? What extra information is available to help with the retry decision?
- A common error class to encapsulate transient errors.
- An algorithm to back off and retry.
- To what extent can the Juju design / implementation change to mitigate the most common cause, which is exceeding rate limits?
- How to report / display retry status.
- Is manual intervention still required?

### Work Items

1. Identify for each provider which errors can be retried.
1. Juju should handle retries.
1. The above discussion points constitute the other work items.
1. Audit juju to identify API optimisation opportunities.
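The back-off algorithm itself is small; the harder part is classifying errors first. A sketch, with `transientError`/`IsTransient` standing in for the common error class discussed above:

```
package main

import (
	"errors"
	"fmt"
	"time"
)

// transientError marks provider failures (e.g. rate-limit responses)
// that are worth retrying.
type transientError struct{ error }

func IsTransient(err error) bool {
	_, ok := err.(transientError)
	return ok
}

// withRetry retries op with exponential back-off, but only for
// errors classified as transient; permanent errors fail fast.
func withRetry(attempts int, initial time.Duration, op func() error) error {
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !IsTransient(err) {
			return err
		}
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %v", attempts, err)
}

func main() {
	calls := 0
	err := withRetry(5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return transientError{errors.New("rate limited")}
		}
		return nil
	})
	fmt.Println(calls, err) // succeeds on the third call
}
```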
## Audit logs in Juju-core

The GUI needs to be able to query *something* for a persistent log of changes in the environment.

- What events are auditable? Hatch: only events that cause changes in the environment.
- Tim: who changed something, what was changed, when was it changed, what was it changed from and to, and why they were allowed to do it (Will).
- Hatch: it needs to be structured events - user, event, description, etc. - NOT just a blob of text.
- Voidspace: do we need a query API on top of this? Filter by machine, by user, by operation, etc.?
- Audit log entries are not protected at a per-row level. Viewing the audit log will require a specific permission.
  - Not all users of the GUI may be able to access the audit log.
- Audit log entries may be truncated; truncation will require a high level of permissions.
- ACTION: determine auditable events.
- ACTION: determine where to store this data, and what events to audit.
- Hatch: it doesn’t need to be streaming from the start, but it should be possible.

### Work Items

1. Create a state API for writing to the audit log (in mongodb).
1. Record an attempt before the API request is run.
1. Record success/error after the API request is run.
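The write-ahead pattern in these work items - record the attempt, run the call, record the outcome - is a simple wrapper. In this sketch the `auditLog` slice stands in for the mongodb-backed collection:

```
package main

import (
	"fmt"
	"time"
)

// Entry is a structured audit record: user, event, outcome -
// never just a blob of text.
type Entry struct {
	When    time.Time
	User    string
	Event   string
	Outcome string // "attempt", "success", or an error message
}

var auditLog []Entry // stand-in for the mongodb-backed collection

func record(user, event, outcome string) {
	auditLog = append(auditLog, Entry{time.Now(), user, event, outcome})
}

// audited brackets an API request with attempt and result records,
// so even a crashed request leaves a trace of what was attempted.
func audited(user, event string, apiCall func() error) error {
	record(user, event, "attempt")
	err := apiCall()
	if err != nil {
		record(user, event, err.Error())
		return err
	}
	record(user, event, "success")
	return nil
}

func main() {
	_ = audited("admin", "destroy-service wordpress", func() error { return nil })
	fmt.Println(auditLog)
}
```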
## Staging uncommitted changes

Hatch doesn’t want to do this in Javascript, because it is not web scale. He wants the API server to handle this staging.

Thumper says that SABDFL says they want to be able to do this from the CLI as well.

- Nate: if we need to allow this to work across the GUI and CLI, then we have to store this data in state.
- Nate: do we need N staging areas per environment? Nate: no, that is crazy talk, just one per environment.
- Thumper: then we’ll need a watcher.
- ACTION: uncommitted changes are stored in state as a single document, a big JSON blob.
- ACTION: we need a watcher on this document.
- Voidspace: entries are appended to this document; this could lead to confusion if people are concurrently requesting unstaged changes.
- Hazmat doesn’t think we should store this in state.
- ACTION: Mark Ramm/hazmat to talk to SABDFL about the difficulty of implementing this.
- All: do we have to have a lock or mode to enable/disable staging mode?
- Hatch: now the GUI and the CLI have different stories; the former works in staging mode by default, and the latter always commits changes immediately.
- ACTION: a change via the CLI would error if there are pending changes; you can then push changes into the log of work with a --stage flag. Ramm: alternatively, we tell the customer that the change has been staged, and they will need to ‘commit’ changes.
- ACTION: the CLI needs a ‘commit’ subcommand.
- Undo is out of scope, but permissible in the future; tread carefully.

### Discussion Thurs May 1

- Moved towards the idea of having an ApplyDelta API that lets you build up a bunch of actions to be changed.
  - These actions can then all be in a pending state, and you do a final call to apply them.
- The actual internal record of the actions to apply is a graph based on dependencies.
  - This lets you “pick one” to apply without applying the rest of the delta.
- Internally, we would change the current API to act via “create delta, apply delta” operations.
- When a delta is pending, calling the current API could act on the fact that there are pending operations.
- The spelling is undefined, e.g.:
  - `named := CreateDelta()`
  - `AddToDelta(named, operation)`
  - `ApplyDelta(named)`
  - `ApplyDelta(operations)`
- If it is just the ability to apply a listed set of operations, we haven’t actually exposed a way to collaborate on defining those operations.
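Sketching the spelling floated above: build up a named delta of pending operations, then apply it. The dependency graph is elided, and all of these functions are illustrative:

```
package main

import "fmt"

// Operation is one staged change, e.g. "deploy mysql" or
// "add-relation mysql wordpress".
type Operation struct{ Desc string }

// Delta holds a named set of pending operations.
type Delta struct {
	name    string
	pending []Operation
}

var deltas = map[string]*Delta{} // stand-in for the state document

func CreateDelta(name string) *Delta {
	d := &Delta{name: name}
	deltas[name] = d
	return d
}

func AddToDelta(d *Delta, op Operation) { d.pending = append(d.pending, op) }

// ApplyDelta commits every pending operation; a fuller version would
// walk the dependency graph so a single operation can be picked out.
func ApplyDelta(d *Delta) error {
	for _, op := range d.pending {
		fmt.Println("applying:", op.Desc)
	}
	delete(deltas, d.name)
	return nil
}

func main() {
	d := CreateDelta("gui-session-1")
	AddToDelta(d, Operation{"deploy mysql"})
	AddToDelta(d, Operation{"add-relation mysql wordpress"})
	_ = ApplyDelta(d)
}
```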
## Observability

How to expose more of what Juju is doing to allow users to make informed decisions. The key interface point is `juju status`. Consider instance/unit observability and transparency, e.g. what does pending really mean? Is it still provisioning at the provider layer? Is the machine agent running? Is the install hook running? Is the start hook running? We collapse all of that down to a single state; ideally we should just push the currently executing hook into status.

### To discuss

- How to display error conditions concisely, while allowing for more information if required.
- Insight into logs - is debug log enough? (It now has filtering etc.)
- Feedback when running commands via the CLI - often warnings are logged server side; how do we expose them to users? Use of a separate back channel?
- Interactive commands? Get input to continue, or to try again on error/warning?
- Consistency in logging - guidelines for verbosity levels, logging API calls, etc.
- How to discover valid vocabularies for machine names, instance types, etc.?
- How to inspect relation data?
- Should output variables be recorded/logged?
- Provide a --dry-run option to see what Juju would do on upgrades.
- Better insight into hook firing.
- Ability to probe charms for health? (Including e.g. low disk space etc.)
- Event driven feedback.
- Integration with SNMP systems? How to alert when issues arise?

### Work Items

- `juju status <entity>` reveals more about that entity - get all output in the context that is specified.
- Add a new unit state - healthy/unhealthy.
- Instance names/tags for machines (the workload that caused it to be deployed).
  - Specifically, when deploying a service or adding a unit that requires a machine to be added, the provisioner should be passed a tag of the service name or similar to annotate the machine with on creation.
- Inspect relation data.
- Implement output variables (needs a spec).
- `add-machine`, `add-unit` etc. need to report what was added, etc.
- API for vocabularies (instance types).

## Usability

### Covers a number of key points

- Discoverability - features should be easily discoverable via `juju help` etc.
- Validate inputs - Juju should not accept input that causes breakage, and should fail early.
- Error responses - Juju should report errors with enough information to allow the user to determine the cause, and ideally should suggest a solution.
- Key workflows should be coherent and concise.
- Tooling / API support for key workflows.

### Agenda

- Identify key points of interaction - bootstrap, service deployment, etc.
- Current pain points, e.g.:
  - Tools packaging at bootstrap for dev versions or private clouds?
  - Opening/closing port ranges?
  - Security groups!
  - What else?
- What’s missing? Tooling? The right APIs? Documentation? Training?
- Frequency of pain points vs. impact.

### Concrete Work Items

1. Improve `juju help` to provide pointers to extra commands.
1. Transactional config changes.
1. Fix the destroy bug (destroy must be run several times to work).
   - Find or file a bug on LP.
1. When a machine fails, the machine state in juju status displays an error status with the error reason.
1. Document the rationale in a code comment.
1. `juju destroy-service --force`
1. Range syntax for open/close ports.
1. Safe mode provisioning becomes the default.
1. Garbage collect security groups.
## Separation of business objects from persistence model

A widely accepted architectural model for service oriented applications has layers for:

- services
- domain model
- persistence

The domain model has entities which encapsulate the state of the key business abstractions, e.g. service, unit, machine, charm, etc. This is runtime state. The persistence layer models how entities from the domain model are saved/retrieved to/from non-volatile storage - mongo, postgres, etc. The persistence layer translates business concepts like queries and state representation into storage-specific concepts. This separation is important in order to provide database independence, but more importantly to stop layering violations and promote correct design and separation of concerns.

### To discuss

- Break-up of the state package.
- How to define and model business queries.
- How to implement translation between the domain and persistence models.

### Goals

- No mongo in business objects - database agnosticism.
- Remove layering violations which lead to suboptimal model design.
- Scalability, via the ability to implement pub/sub infrastructure on top of the business model rather than the persistence model; no more sucking on the mongo firehose.

### Work Items

1. Spike to refactor a subset of the domain model (e.g. machines).
1. Define and use patterns (e.g. “named query”) to abstract out database access further (in the spike).
1. Define and use patterns for mapping/transforming domain objects to the persistence model.
1. If possible, define and implement integration with pub/sub for change notification.
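One way to read the “named query” idea: the domain layer asks business-level questions through an interface and never sees mongo. A sketch under that assumption; these types are not Juju’s actual state package:

```
package main

import "fmt"

// Machine is a pure domain object: no mgo/bson tags, no session.
type Machine struct {
	ID     string
	Series string
}

// MachineRepo is the persistence boundary; the domain layer depends
// only on this interface, never on the mongo driver.
type MachineRepo interface {
	// Named queries express business questions, not collections.
	MachinesNeedingUpgrade(targetVersion string) ([]Machine, error)
}

// mongoMachines would translate the named query into mgo find()
// calls; here it is faked in memory.
type mongoMachines struct{ rows []Machine }

func (m mongoMachines) MachinesNeedingUpgrade(v string) ([]Machine, error) {
	return m.rows, nil // real impl: filter on agent-version < v
}

func main() {
	var repo MachineRepo = mongoMachines{rows: []Machine{{"0", "trusty"}}}
	ms, _ := repo.MachinesNeedingUpgrade("1.21.0")
	fmt.Println(ms)
}
```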
## Juju Adoption Blockers

[Slides with talking points](https://docs.google.com/a/canonical.com/presentation/d/1jcJ93Npuo60Iyy0BGSNap1kekQNxiZ7rDBJfuxAv_Go/edit#slide=id.ge4adadaf_1_645)

## Partnerships and Customer Engagement

- The Juju GUI has been a tremendous help.
  - A sales team enabler, to quickly and easily show Juju.
- Every customer/partner asks:
  - Where can I get a list of all charms?
  - Where can I get a list of all available relations?
  - Where can I get a list of all available bundles?
  - Where can I get a list of all supported cloud providers?
  - What about HA? What happens if the bootstrap node goes away?
    - We need to start demonstrating this, ASAP!
  - What if one of the connected services goes away? What does Juju do?
    - So, great, I can use Juju to relate Nagios and monitor my service. But what does Juju do with that information? Can’t Juju tell if a service disappears?
  - Auto-scaling? Built-in scalability is great, but manually increasing units is only so valuable.
  - What do you mean, there aren’t charms available for 14.04 LTS yet?
  - *Yada yada yada* Docker *yada yada yada*?
- Our attempts to shift the burden of writing charms onto partners/customers have yielded minimal results.
  - Pivotal/Altoros around CloudFoundry:
    - CloudFoundry is so complicated that Pivotal developed their own custom Juju-like tool (BOSH) to deploy it, and their own “artifact” based alternative to traditional Debian/Ubuntu packaging.
    - CloudFoundry charms (and bundles) have proven a bit too complex for newbie/novice charmers at Altoros to develop, at the pace and quality we require.

## Juju 2.0 Config

- Define providers and accounts as first class citizens.
- Eventually remove environments.yaml in favor of the above account configuration and .jenv files.
- Change `juju bootstrap` to take an account and --config=file / --option="foo=var" for additional options.
- `juju.conf` needs:
  - A simplestreams source for provider definitions, defaulting to https://streams.canonical.com/juju/providers.
    - A new stream type “providers” containing the environment descriptions for known clouds (e.g. hpcloud has auth_url:xyz, type:OpenStack, regions-available: a,b,c, default-region:a).
    - Juju itself no longer includes this information inside the ‘juju’ binary, but depends on that information from elsewhere.
  - A providers section.
    - Locally defines the data that would otherwise come from above.
  - An accounts section.
    - Each account references a single provider.
    - Local overrides for environment details (overriding defaults set in the provider).

## Distributing juju-core in Ubuntu

Landscape has a stable release exception for their client, not a micro release exception. We fulfil the rules for this even better than Landscape does, as we have basically no dependencies at all.

We can split juju the client from jujud the server, though this isn’t terribly useful for us outside of making distro people happy.

The Landscape process has two reviews before code lands; we used to do this but changed, and it didn’t seem to drop quality on our end.

We could raise an item at a tech board meeting to sort out stable release matters.

Having to maintain separate source packages for client and server would be annoying and painful; could we have different policies for binary packages generated from the same source package?

Dynamic linking gripes are not imminently going to be solved by anyone.

Have a meeting with Foundations to resolve some unhappiness.

## Developer Documentation

- https://juju.ubuntu.com/dev/ - Developer Documentation.
- There exists an automated process that pulls the files from the doc directory in the juju-core source tree, processes the markdown into HTML, and uploads it to the WordPress site.
- Minimal topics needed:
  - Architecture overview
  - API overview
  - Writing new API calls
  - What is in state (our persistent store - horrible name, I know)?
  - How the mgo transactions work
  - How to write tests:
    - Base suites
    - Environment isolation
    - Patch variables and environment
    - Using gocheck (filter and verbose)
    - Table based tests vs. simple tests
    - Tests should be small and obviously correct
  - Developer environment setup
  - How to run the tests:
    - `juju test <filter> --no-log` (plugin)
- https://juju.ubuntu.com/install/ should say to install juju-local.

## Tools, where are they stored, sync-tools vs bootstrap --source

- FindTools is called whenever tools are required, which searches all tools sources again.
- When tools are located in the search path, they are copied to env storage and accessed from there when needed.
- Find is only to be called once, at well defined points: bootstrap and upgrade. The tools are fetched into env storage so that e.g. during upgrade, tools are sourced from there.
- Need a tools catalog, separate from simplestreams, for locating tools in env storage.
- Bootstrap, upgrade, and sync-tools need --source.

As is the case now, if --source is not specified, an implicit upload-tools will be done.

## Status - Summary vs Detailed

Status is spammy even on smallish environments, and completely unusable on mid-sized and larger ones. Can we make it easier to read, or make another status that is more of a summary view?

### Work Items

1. Identify items in the status output that may break people’s scripts if changed or removed.
1. Add flags:
   - `--verbose/-v`: total status, current output + HA + networking junk.
   - `--summary`: human readable summary - not YAML (this is dependent on the mini-plugin below).
   - "`--interesting`": items that aren’t “normal” (e.g. agent state != “started”).
1. Write a mini-plugin that takes the human readable YAML and generates human readable output, e.g. HTML.
1. Use a watcher to monitor status instead of polling the juju status cmd.
1. Extend filtering.

## Relation Config

When adding a relation, we want to be able to specify configuration specific to that relation. In settings terms, this will be “service-relation-settings”. We need to set config for either end of the relation. Settings data is stored for the relation as a whole.

The relation config schema is defined in the charm’s `metadata.yaml`, with separate config for each end of the relation.

The config is specified when using add-relation, via a `--config config.yaml` option.

A new Juju command, `relation-get-config [-r foo]`, gets config from the local side of the relation. Inside a hook we don’t need -r.

A new `juju set-relation config.yaml` will cause the relation-config-changed hook to run.

### Work Items

1. New add-relation `metadata.yaml` schema.
1. Ability to store relation settings in mongo.
1. Support for processing relation config in `add-relation`.
1. `relation-get-config` command.
1. `set-relation-config` command.
1. `relation-config-changed` hook.

## Bulk Cloud API

The APIs we use to talk to cloud providers are too chatty, e.g. individual calls to start machines or open individual ports.

When starting many instances, partition them into batches with the same series/constraints/distribution group, and ask the provider to start each batch (see the sketch after the work items).

### Work Items

1. Unfuck instance broker interfaces to allow bulk invocation.
1. Rework the provisioner.
1. Change instance data so that it is fully populated and not just a wrapper around an instance id that causes more API calls to be required.
1. Audit the providers to identify where bulk API calls are not used.
1. Start instances to return ids only; get extra info in bulk as required.
1. Single shared instance state between environs (updated by a worker).
1. Refactor prechecker etc. to use cached environ state - reduce `New()` environ calls.
1. Stop using open/close ports and use iptables instead.
1. Use a single security group.
1. Use a firewaller interface in providers to allow Azure to be handled.
1. Drop firewall modes in the ec2 provider.
1. Support specifying port ranges, not individual ports (e.g. in charm metadata).
1. For hook tools - open ports on a network for a machine, not a unit.
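The partitioning step is mechanical: group pending machines by a (series, constraints, distribution group) key and make one provider call per batch. A sketch with invented types:

```
package main

import "fmt"

// Machine is a pending machine the provisioner wants started.
type Machine struct {
	ID          string
	Series      string
	Constraints string // canonicalised constraint string
	DistGroup   string
}

type batchKey struct{ series, constraints, distGroup string }

// partition groups pending machines so the provider receives one
// bulk start call per identical batch instead of one per machine.
func partition(machines []Machine) map[batchKey][]Machine {
	batches := make(map[batchKey][]Machine)
	for _, m := range machines {
		k := batchKey{m.Series, m.Constraints, m.DistGroup}
		batches[k] = append(batches[k], m)
	}
	return batches
}

func main() {
	pending := []Machine{
		{"1", "trusty", "mem=4G", "wordpress"},
		{"2", "trusty", "mem=4G", "wordpress"},
		{"3", "precise", "mem=2G", "mysql"},
	}
	for key, batch := range partition(pending) {
		fmt.Printf("StartInstances(%v) x%d\n", key, len(batch)) // one bulk call per batch
	}
}
```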
## Tools Placement

- Allow storage of tools in the local environment.
- Provide a catalog of the tools in the local environment.
- Refactor the current tools lookup to use the catalog.
- Provide a tools import utility to get new tools into the environment.
- Upgrades check the tools catalog to ensure tools are available for all required series, arches, etc.
- Same model as for charms in state.

## Juju Documentation

**William**: Write documentation while designing the feature, and give it to Nick etc. before writing code. This is the word of god.

**Nate**: Use a changelog file in the juju-core repo to log features and bugfixes with merge proposals.

**Nick & Jorge**: we’re just a couple of people; juju core is 20 people now.

**Ian**: we can’t require a changelog per merge, since a single feature may be many many merges, which might have no user facing features.

This must actually happen or Jorge has permission to kill Nate.

Nate to get buy-in from team leads.

# Charm Config Schema

Users find our limited set of types in config (String, Bool, Int, Float) limiting, and have to do things like pickle lists as base64. See [bug](https://bugs.launchpad.net/juju-core/+bug/1231526), which largely covers this.

- Map existing YAML charm config descriptions into a JSON schema.
- Extend existing YAML config to something that can be mapped well to JSON schema.
- We currently have a config field in the charm document.
  - Create a schema document that the charm links to.
  - Upgrade step that takes the existing config field and creates a new document linked to the charm.
- Add support in `juju set` for the new format.
- Add a flag to `juju get` to output the new format.

New types we want: enums, lists, maps (keys as strings, values as whatever).

Open questions: how do charms upgrade their own schema types? There is existing pain here; for instance, the OpenStack charms are stuck using “String” for a boolean value because they cannot safely upgrade the type.

Pyjuju had magic handling for slurping files; there’s a bug/feature request for a ‘File’ type.

Note this work does not include constraint vocabularies. See Ian Booth for that work.

# Juju Solutions & QA

This is very dependent on which charm you are looking at. I assume there were particular things that came up in the Cloud Foundry work that need attention. We have been building up test infrastructure quite quickly, which is one part of helping improve quality -- but the biggest thing is growing communities around particular charms.

# Juju QA

## CABS Reporting

The feature has stalled as goals and APIs churned.

1. What are the goals of reporting?
1. What is the data format that CABS will provide for reporting?
1. How do we display the reports?
## Scorecard

The scorecard is a progress report to measure our activity and correlate it to our successes and failures. Most of the work is done by hand. Though most of the information gathering can be automated, it was the lowest priority for the Juju QA team. How much time will we save if we automate some or all of the information gathering?

Juju QA has scripted most of what it gathers for the scorecard. The data is entered by hand instead of being added to tables and charts by an automated process. These are the kinds of data the team knows how to gather:

1. Bugs reported, changed, or fixed.
1. Branch commits.
1. Time from report, to start, to release of bugs and commits.
1. Releases of milestones.
1. Downloads of installers and release tarballs (packagers and homebrew).
1. Installs of clients from PPAs.
1. Downloads of tools from public streams.

### Work Items

1. GUI
   1. Bundles deployed
   1. Charms deployed
   1. Visits to jujucharms.com and juju.ubuntu.com
   1. Quickstart downloads
   1. Number of releases
   1. Number of bugs
   1. Number of bugs closed
1. Core
   1. Number of external contributors
   1. Number of fixes committed
   1. Number of running envs (the charmstore is queried every 90 min for new charms)
      - Do we know which env the charm query was for?
   1. Client installs (from the PPA, cloud archive, trusty)
   1. Number of tools downloaded (from containers and streams.c.c)
   1. Add anonymous stat collection to juju to learn more
1. Eco
   1. Number of Canonical and non-Canonical charm committers
   1. Number of people in #juju (and #juju-dev)
   1. Number of subscribers to the juju and juju-dev mailing lists
   1. Number of charms audited
   1. AskUbuntu conversion (questions asked & answered)
   1. Number of tests in charms
1. QA
   1. Metrics
   1. Days to bug triage
   1. CI tests run per week
   1. Number of solutions tested
   1. Number of clouds solutions are tested on
   1. Number of juju core releases

## Charm Testing Reporting

Charm test reporting has faced obstructions from several causes. There are two central issues: one, reliable delivery of data to report, and two, completion of the reporting views.

1. Charm testing data formats change without notice.
1. Charm testing uses unstable code that can break several times a day, preventing gathering and publication of data.
1. Charm testing leaves machines behind.
1. Charm testing can exceed resource limits in a cloud.
1. Charm testing doesn’t support multiple series.
1. Charm reports don’t show me a simple table of the clouds a charm runs on.
1. Most charms don’t have tests -- can we have a simple test to get every charm listed?
1. I don’t know the version of the charm.
1. I don’t know the last version that passed all tests.
1. Charm detail reports don’t show me the individual tests.
1. I don’t know the series.
1. I don’t know the version that last passed the individual test.
### Work Items

1. Create a new Jenkins job that uses the last known good version of substrate dispatcher (lp:charmtester).
1. Staging charmworld, or something similar, will trigger a test of a branch and revision.
1. Provide charmers with a script to test MPs/pull requests.
1. Provide a way to poll LP and GH to automatically run the tests for an MP/PR.
1. Provide a way to test the tip of each promulgated charm.
1. Reporting needs to pick up the data from the new test runner/Jenkins job.
1. The overview should list every charm tested:
   1. Does the charm have tests?
   1. A link to the specific charm results.
   1. Which clouds were tested, and did the suite pass?
   1. What version was tested?
   1. What is the last known-good version to pass the tests for a substrate?
   1. What version passed all substrates?
1. For any charm, I need to see specific charm results:
   1. Which substrates were tested?
   1. The individual tests run on a substrate, showing the name of the test and pass/fail.
   1. A link to the failure log, located somewhere.
   1. What was the last version of the charm to pass the test?
1. Update substrate dispatcher, or switch to bundletester, to gather richer data.
   1. Ensure `destroy-environment` is run.
   1. Capture and store JSON data instead of logs.
1. We will get use cases for the charm test reports that will verify the report meets expectations.
1. Tests could state their needed resources, and the test runner can look to see if they are available. Tests can be deferred until resources are available.

## Charm testing with juju Core

1. We test with stable juju and charms.
1. We could test with unstable:
   1. Only test the popular charms for each revision.
   1. Or only test charms with tests.
   1. Or test bundles which have valid combinations.
   1. Test all the charms occasionally.
1. Historically, when charms break with a new juju, it is the charm’s fault.

## Charm MP/Pull Gate on Charm Testing

Charm merges could be gated on a successful test run against the supported clouds.

- Allow charmers to manually request a test for a branch and revision.
- Maybe extend the script to poll for pull requests/merge proposals.
- Charm testing doesn’t support series testing yet.

### Testing

1. Test the MP or pull request.
1. Merge and commit on pass.
1. Charm testing runs and is actually testing that juju or ubuntu still works for the charm.
## CI Charm and Bundle Testing

Testing popular bundles with Juju unstable to ensure the charms and bundles continue to work.

1. Notify the charm maintainer or the juju developers when a break will happen.
1. Can testing be automated to pick up newly popular charms and bundles?
1. There are resource limits per cloud.

### Notes

- Charm testing could be simplified to proof and unit tests.
  - Bundle tests would test relations.
- Current tests don’t exercise failures or show error recovery.
- Ben suggests that the amulet tests in charms could be moved to bundles.
  - Charms are like libraries; bundles are like applications.
  - Bundles are known topologies that we can support and recommend.
- Charm tests could pass but break other apps; the bundle level is where we want to test.
- Workloads are more like bundles, though some charms might not need to be in a relation, so a bundle of one.
- Config testing is valuable at both the charm level and the bundle level.
- Integration suites might work on a charm or a bundle.
- Cloud Foundry tests only work with the bundle; running the suite for each charm means we construct the bundle multiple times and rerun tests.
- A charm author might write weak tests. Reviewers need to see this and respond. Bundles represent how users will use the charm, and that is what needs testing to verify utility and robustness.
- Bundletester has a test pyramid:
  - Proofing each charm.
  - Discovering unit tests in each charm.
  - Discovering integration tests and running them.
- Bundle testing has a known set of resources, which is needed when testing in a cloud.
- Bundle tests provide the requirements for any software’s own stress and function tests.
- Charm reports would use the rich JSON data.

### Work Items

1. Review BenS’s bundle testing for integration into the QA Jenkins workflow.
   1. Get back to BenS with any questions.
1. Use cases to drive what the reports need to show:
   1. What do the different stakeholders need to discover when reading the reports?
   1. What actions will stakeholders take when reading the reports?
1. Do bundle tests poll for changes to bundles or the charms they use?
   1. The alternative would be to test on demand.
   1. Gated merges of MPs/PRs mean there is little value in testing on push.

## CI Ecosystem Tests

We want to extend Juju devel testing to verify that crucial ecosystem tools operate with it. When there is an error, the Juju-QA team will investigate and inform one or both owners of the issue that needs resolution.

The juju under test will be used with the other project’s test suite. A failure indicates Juju probably broke something, but maybe the other project was using juju in an unsupported way.

Juju CI will provide a simple functional test to demonstrate an example case works.

We want a prioritised list of tests to deliver:

1. Juju GUI
1. Juju Quickstart
1. Azure juju GUI dashboard
1. jass.io
1. Juju Deployer
1. mojo
1. amulet
1. charm tools
1. charm helpers
1. charmworld

### Work Items

1. Quickstart
   1. Quickstart relies on the CLI, the API, and config files. It waits for the GUI to come up in the env, then deploys bundles.
   1. Quickstart opens a browser to show the GUI.
   1. Testing:
      1. Install the proposed juju.
      1. Run juju-quickstart with a bundle against a bootstrapped env.
      1. It tries to colocate the bootstrap node and the GUI when not on the local provider and the node and charm have the same series.
      1. Otherwise the GUI is in a different container.
      1. `juju status` will list the charms from the bundle.
      1. Rerun juju-quickstart with the bundle.
      1. Verify the same env is running with the same services.
   1. The GUI team need to write:
      1. Functional tests.
      1. Allow the tests to be run on lxc.
1. Juju GUI charm
   1. “make test” will deploy the charm about 8 times.
   1. The GUI is deployed on the bootstrap node to make the test faster.
   1. If the provider is local, the GUI should be in a different container.
   1. The charm has tests that are run by juju test.
   1. The functional tests run the default juju.
   1. We can use the juju under test with the charm.
   1. An env variable is used to select the series for the charm.
   1. Testing with a bundle implicitly tests the deployer.
## CI Cloud and Provider Testing

Juju CI tests deployments and upgrades from stable to release candidate. We might want additional tests.

1. Canonistack tests are disabled.
   1. Swift fails; IS suspect misconfiguration or a bad name (rt 69317).
   1. Canonistack has bad days where no one can deploy.
1. Restricted and closed networks?
   1. CI has a restricted network test that shows the documented sites and ports are correct, but it doesn’t verify tools retrieval.
   1. A closed network test would have proxies providing every documented requirement of Juju.
1. Constraints?
1. Placement?
1. `add-machine`, `add-unit`?
1. Health checks by series?

### Work Items

1. Placement tests are required for AWS and OpenStack.
1. `add-machine` and `add-unit` can be functional tests.
1. Need the nova console log for when we cannot ssh in.
1. Constraints are mostly:
   1. Unique
   1. Azure availability sets (together relationship)
   1. AWS/OpenStack availability zones (apart relationship)
   1. Security groups
   1. MaaS networks

## CI Compatibility Function Testing

Juju CI needs functional tests that exercise a function across multiple versions of juju, and juju working with multiple versions of itself.

1. Unstable to stable command line compatibility.
   1. Verify deprecation, not obsolescence.
   1. Verify scripted arguments do not break after an upgrade.
1. 100% major.minor compatibility. Do stable micro releases work with every combination?
   1. This means keeping a pool of stable packages for CI.
   1. Encourages creating new minor stables instead of adding test combinations; but SRU discourages minor releases.
1. CI is **blocked** because Juju doesn’t allow anyone to specify the juju version to bootstrap the env with, nor can agent-metadata-url be set more than once to control the version found.

### Work Items

1. Juju bootstraps with the same version as the client.
   1. Then juju upgrades/downgrades the other agents to the current version.
1. Ubuntu wants 100% compatibility between the client in trusty and all the servers that trusty has ever had.
   1. If trusty had juju 1.18.0, 1.18.1, and 1.20.0, we need to show that clients work with all the servers.
1. We could parse the help and report bugs when commands or options disappear. We need to see that commands and options are deprecated.
   1. We want to remove deprecated features from the help to keep the docs clean, but that makes deprecations look like obsolescence.
1. Client to server is command line to API server.
   1. Stand up each server, then for each client check that they talk.
   1. We don’t need to repeat historic combinations.
   1. Test the new client with the old servers.
   1. Test the old clients with the new servers.
   1. The tests could be status, upgrade, and destroy, but if we had an API compatibility check, we could quickly say the client and server are happy together.
1. Maybe split the juju package into juju-server and juju-client packages. Trusty gets the new juju-client package; the servers are in the clouds.

## CI Feature Function Testing

Juju command testing:

1. Backup and restore (in progress).
1. HA.
1. Charm hooks, relations, expose, and upgrade-charm.
   1. Is the env set up for the hook?
   1. Do relations exchange info?
   1. Do expose/unexpose update ports?
   1. `upgrade-charm` downloads a charm and calls the upgrade hook.
1. ssh, scp, and run.
   1. We claim `juju run` gets the same env as a charm; we can test that the charm and run have the same env.
1. set/get config and environment.
   1. Which options are not mutable?
## CI Feature Function Testing

Juju command testing:

1. Backup and restore (in progress).
1. HA.
1. Charm hooks, relations, expose, and upgrade-charm.
    1. Is the env set up for the hook?
    1. Do relations exchange info?
    1. Do expose/unexpose update ports?
    1. `upgrade-charm` downloads a charm and calls the upgrade hook.
1. ssh, scp, and run.
    1. We claim run gets the same env as a charm...we can test that the charm and run have the same env.
1. set/get config and environment.
    1. Which options are not mutable?

### Work Items

1. For every new feature we want to prepare a test that exercises it.
    1. Developers are interested in writing the tests with QA.
    1. Some tests may need to be run in several environments.
    1. Revise the docs about writing tests and send them to developers.
1. Add coverage for historic features (see the sketch below for the run/hook env check).
    1. `add-machine` / `add-unit`
    1. set/unset/get of config and env
    1. ssh, scp, and run
    1. charm hooks, relations, expose, unexpose, and upgrade-charm
    1. init
    1. `get-constraints`, `generate-config`
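The run/hook env check could start as something like this minimal sketch, assuming a deployed unit (the `ubuntu/0` name is hypothetical) and that `juju run --unit` executes in a hook context, as the notes claim:

```python
# Sketch: verify `juju run` sees the same context a charm hook would.
# Hypothetical unit name; assumes a deployed unit is available.
import subprocess

UNIT = "ubuntu/0"  # hypothetical deployed unit

def run_on_unit(command):
    """Execute a shell command on the unit via juju run."""
    return subprocess.check_output(
        ["juju", "run", "--unit", UNIT, command]).decode().strip()

# A charm hook relies on these; juju run should expose the same values.
assert run_on_unit("printenv JUJU_UNIT_NAME") == UNIT
# Hook tools should resolve exactly as they do inside a real hook.
run_on_unit("config-get --format json")
```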
## CI LTS (and other series and archs) Coverage

What is the right level of testing? Duplicate testing for each supported series may not be necessary. Unnecessary tests take time and limited cloud resources.

1. Can we test each series as an isolated case from clouds and providers?
    1. Must we duplicate every cloud-provider test to ensure juju on each series in each cloud works?
    1. The local provider seems to need a test for each series and juju.
1. Unit tests pass on amd64.
    1. PPC64el is close to passing.
    1. i386 and arm64 are not making progress.
    1. Switch to golang 1.2.

### Work Items

1. The default test series will be trusty; precise is an exceptional case.
1. Golang will be 1.2.
    1. Golang 1.2 must be backported to precise and maybe saucy.
    1. If not, juju will have to abandon precise or only be 1.1.2 compatible.
1. Build juju on the real archs or cross compile to create tools.
    1. Build juju on trusty amd64.
    1. Build juju on precise amd64.
    1. Build juju on trusty i386.
    1. ppc64+trusty will make gccgo-based juju.
    1. Need a machine to do arm64+trusty to make gccgo-based juju.
    1. Maybe CentOS.
    1. Maybe Win8 (agent for active server charm).
1. Remove the 386 unit tests; replace them with a 386 client test.
1. Add tests for precise (whereas we previously had special tests for trusty).
    1. Test a precise upgrade and deploy in one cloud.
1. Test each series+arch combination for the local provider to confirm packaging and dependencies.
    1. precise+amd64 local
    1. trusty+amd64 local
    1. utopic+amd64 local
    1. trusty+ppc64 local
    1. trusty+arm64 local
1. Test client-server different series and arch to ensure the client's series/arch does not influence the selection of tools.
    1. Utopic amd64 client bootstraps a trusty ppc64.
    1. We already test the win juju client against juju on precise amd64.

## CI MaaS and vMaaS

Juju CI had MaaS access for 3 days. The tests ran with success. How do we ensure juju always works with MaaS?

1. CI wants 5 nodes.
1. CI wants the provider to be available at a moment's notice to run tests for new revisions, just like all clouds are always available.
1. CI probably does not care whether MaaS is in hardware or virtualised. No public clouds support vMaaS today.

### Work Items

1. Ask Alexis, Mark R, and Robbie for MaaS hardware or access to a stable MaaS env.

## CI KVM

Juju CI has local-provider KVM tests, but they cannot be run. Engineers have run them on their own machines.

1. CI wants 3 containers.
1. CI needs root access on real hardware (hence developers run on their machines).
1. CI does care about hardware; no public clouds support KVM today?

### Work Items

1. We can use one of the 3 PPC machines.
1. We need to set up a slave in the network.
    1. Ideally we can add a machine and deploy a Jenkins slave to it.
    1. Or we stand up a slave without juju.
    1. Or we change the scripts to copy the tests to the machine.

## Juju in OIL

We think there may be interesting combinations to test. We know from bug reports that Juju didn't support Havana's multiple networks.

1. We want to know if Juju fails with new versions of OpenStack parts.
1. We want to know if Juju fails with some combinations of OpenStack.

## Vagrant

1. Run the VirtualBox image in a cloud.
    1. We care that the host's mapping of dirs works with the image so that the charms are readable.
1. Exercise the local deployment (see the sketch below).
    1. Deploy of local must work.
    1. Failures might be
        1. Redirector of GUI failed.
        1. Packages in the image needed updating.
        1. lxc failed.
        1. Configuration of `env.yaml` might need changing.
        1. Command line deprecated or obsolete.
1. When juju packaging deps change, the images need updating.
    1. May need to communicate with Ben Howard to change the image.
    1. Can CI pull images from a staging area to bless them?
1. Can we place the next juju into the virtual env to verify next juju works?
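A smoke test along these lines could run inside the image; the shared-folder path, charm name, and the `local` environment name are hypothetical:

```python
# Sketch: Vagrant-image smoke test. Assumes the image maps a host charm
# dir to /vagrant and that environments.yaml defines a "local" env.
import os
import subprocess

CHARM_DIR = "/vagrant/charms"    # hypothetical host-mapped directory
CHARM = "local:precise/myapp"    # hypothetical local charm

# The host dir mapping must make charms readable inside the image.
assert os.access(CHARM_DIR, os.R_OK), "shared charm dir is not readable"

# Deploy with the local provider; a failure here points at the image
# (stale packages, lxc, env.yaml config) rather than at the charm.
subprocess.check_call(["juju", "bootstrap", "-e", "local"])
subprocess.check_call(
    ["juju", "deploy", "--repository", CHARM_DIR, CHARM, "-e", "local"])
```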
## Bug Triage and Planning

We have about 15 months of high bugs. Our planning cycles are 6 months. Though we are capable of fixing 400 bugs in this time, we know that 300 of the bugs are reported after planning. We, stakeholders, and customers need to know which bugs we intend to fix and those that will only be fixed by opportunity or assistance.

1. Do we lower the priority of the 150 bugs?
    1. Do we make them medium? Medium bugs are not more likely to be fixed than low bugs...opportunity doesn't discriminate by importance. We could say medium bugs are the first bugs to be re-triaged when we plan.
    1. Do we make them low? Low bugs obviously mean we don't intend to fix the issue soon. Is it harder to re-triage all low bugs?
1. Do we create more milestones to organize work and show our intent? Can we plan work to be expedited instead of deferred?
    1. Target every bug we intend to address to a cycle milestone.
    1. Retarget some to major.minor milestones as we plan work.
    1. Retarget each to major.minor.micro milestones when branches merge.
1. Triage every bug. Juju-GUI, deployer, charm-tools, and a few others often have untriaged bugs that are a week old. Who is responsible for them? https://bugs.launchpad.net/juju-project/+bugs?field.status=New&orderby=targetname

### Work Items

1. We want milestones that represent now, next stable, and the cycle.
    1. Now is the next release for the 2-week cycle.
        1. Teams target the bugs they want to fix in the cycle.
        1. We can see it burn down.
    1. Next stable are all the bugs we think define a stable release.
        1. This doesn't burn down because most bugs are retargeted. Some bugs will remain, as they are the final bugs fixed to stable.
        1. 3 stable releases per 6-month cycle.
        1. Do we want a next next?
    1. The cycle is 3 or 5 months of all the high bugs we want to fix.
        1. We define stable milestones by pulling from the horizon milestone.
    1. Can we ensure there is a maximum capacity for the milestone? If you add a bug, you must remove a bug.
1. Critical
    1. CI breaks. The QA team will do the first level of analysis.
    1. Regressions are critical, but they may be reclassified.
    1. Critical bugs need to be assigned.
    1. Flaky tests are High bugs in the current milestone.
1. Alexis and stakeholders will drive some bugs to be added or moved forward.
1. We have 15 months of high bugs.
    1. To harden we need to know which high bugs need fixing.
    1. We want to retriage all the high bugs and make most of them medium.
    1. Review the medium bugs regularly to promote them to high for the upcoming cycle or demote them to low.
    1. We want 75 bugs to be high at any one time (1 page of high bugs).

## Documentation

We want documentation written for the release notes before the release. We need greater collaboration to:

1. Know which features are in a release.
1. Know how the features work from the developer notes.
1. Include the docs in the release notes.
1. Have developers review the release notes for errors.
1. Adequately document features in advance of release where possible.

We also need to discuss how versioning of the docs is going to work moving forward, and how we will manage and maintain separate versions of the docs, e.g. 1.18, 1.20, dev (unstable).

## MRE/SRU Juju into trusty

We want the current Juju to always be in trusty. We don't like the cloud-archive because the current juju isn't really in Ubuntu.

- Ubuntu wants guaranteed compatibility.
    - CI needs to ensure all versions of juju in a series work together.
- Landscape has an exception to keep current in all supported series.
    - Landscape only puts the client in supported series.
    - The server is in the clouds.
    - The client is stable; it changes slowly compared to the server.
    - The client works with many versions of the server, but tends to be used with the matching server.
- James Page suggests that juju be packaged with different names to permit co-installs, e.g. juju-1.20.0.

## Juju package delivers all the goodness

1. apt-get install juju could provide juju-core, charm-tools, and deployer.

## juju-qa projects

1. Juju is moving to GitHub; Jerff and other Canonical machines can only talk to Launchpad.
    1. The ci-cd-scripts2 must be on Launchpad.
1. We must split the test branch from the juju project.
    1. We may want to split the release scripts from the test scripts.

# Juju Solutions

## Great Charm Audit of 2014

We've been doing an audit over the last couple of months -- and will continue. We've scaled up the Charmers team from 2 people 5 months ago to 7 or 8 by Vegas, so we are adding a lot more firepower on this front -- but that's all still new. I expect to see a significant increase in our charming capacity for the next cycle.

## Pivotal Cloud Foundry Charms

Discussion points:

1. The pivot from packages to artifacts and why.
    1. Tarball of binaries for a given release.
    1. +1 on proceeding with orchestrating artifacts post Bosh build.
1. Altoros, internal staffing, schedule.
1. CF Service Brokers.
1. Brief look at current status, juju canvas.
1. What is demo-able by ODS?

## IBM Workloads

## ARM Workloads

## CABS

## Amulet

- We want to know which charms are following an interface exchange: when an interface is exchanged, record the information that is passed, then replay it (see the sketch below).
    - This boils down to: we need an interface specification.
    - Mock up interface relations.
    - Or figure out what the status is of the health check links.
    - An opportunity to call the hook in integration suites.
- Could adopt some simplified version of the Juju DB.
    - They are talking about a schema for next cycle.
    - That probably isn't the right answer.
- Someone would need to take over maintainership from Kapil.
    - You need detailed knowledge of how Juju works.
- Build a quorum of what an interface looks like.
    - This is the relation sentry in amulet.
    - The problem with the relation sentry is the name is based on the
    - Hacking around a problem that can be solved with tools in core.
    - If core is not going to fix this, we need to hack around it.
- Bundle testing or unit testing?
    - Is this portion of a deployment reusable by others?
    - Depends on where we are going.
    - 100% certain bundle testing is the way of the future.
    - Take some time writing a test and see how it would look.
    - What is really needed?
    - Do a single bundle test and see what that looks like.
    - Looking at this with a fresh set of eyes may show us new aspects.
    - Once we go through the review of CI and see if we can.
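A minimal sketch of the record-and-replay idea, using the stock `relation-get`/`relation-set` hook tools via `juju run`; the helper names are hypothetical and this is not amulet's relation sentry API (it also ignores quoting of values with spaces):

```python
# Sketch: capture what one unit published on a relation, then replay it
# at a unit under test to mimic the original interface exchange.
import json
import subprocess

def capture_relation(unit, relation_id):
    """Record the settings a unit published on a relation."""
    out = subprocess.check_output(
        ["juju", "run", "--unit", unit,
         "relation-get --format json -r %s - %s" % (relation_id, unit)])
    return json.loads(out)

def replay_relation(recorded, unit, relation_id):
    """Push recorded settings at a unit to mimic the original exchange."""
    pairs = ["%s=%s" % (k, v) for k, v in recorded.items()]
    subprocess.check_call(
        ["juju", "run", "--unit", unit,
         "relation-set -r %s %s" % (relation_id, " ".join(pairs))])
```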
## Charm Tools

## CharmWorld Lib

## Charm Helpers

- Folks interested: Chuck, Marco, Ben, Cory.
- Break contrib out into a charm-helpers contrib.
- Define a way to deliver:
    - Where do I get it?
    - How do I use it?
    - What libraries are available?
- Actions
    - Delivery via the install hook.
    - Document.
- Move as much as possible out of contrib to core.
- Thursday May 1
    - Use doctest to ensure the documents are right (see the sketch below).
        - doctest does not scale up very well.
    - Unit test docs before promotion to core.
    - Move the things from outside of contrib and core into core.
    - Use Wheel packaging; it is a blob format (make dist).
    - Actually use and adhere to semantic versioning.
        - This may include changes to charm-helpers sync to get the right version. Fuzzy logic to find different versions.
    - Chuck: investigate the Altoros charm template for charm helpers.
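An illustration of the doctest idea, with a hypothetical helper function standing in for a charm-helpers utility; the runnable examples live in the docstring, so a stale document fails the test run:

```python
# Sketch: doctest keeps usage examples in charm-helpers docs honest
# (hypothetical function; assumes examples are written in docstrings).
def bytes_to_mb(n):
    """Convert bytes to whole megabytes.

    >>> bytes_to_mb(2 * 1024 * 1024)
    2
    >>> bytes_to_mb(0)
    0
    """
    return n // (1024 * 1024)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails when a documented example drifts from the code
```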
## Java Bundle

## HDP 2.0 Bundle

- Create 12 charms for the GA release of Apache Hadoop that Hortonworks supports.
    - http://hortonworks.com/hdp/
- Need to get communication from IBM on the porting of the 12 components over to Power.
- Need to identify which HDP version is going to be the released version.
    - 3.0 will most likely be the next GA release.
- Need to support multi-language in the GUI.
- Next milestone:
    - Hadoop Summit demo.

## Big Data Roadmap

- Optimizations
    - File system via Juju through the storage feature.
    - Image-based: Hadoop-specific images.
- Conferences
    - Hadoop Summit (June)
    - Strata NY
- Demos
    - See how we can hook the Hadoop bundle into a charm framework bundle (e.g. Rails).
    - See how we can plug in multiple data sources.
        - Cancer, etc.
- Feature requests
    - Ensure that services that need different fault domains/availability sets get them.
        - This may be resolved with tagging in MaaS.
        - Tag fault domain 1 and fault domain 2.
        - This is exposed to juju via the GUI.
    - Have the GUI/Landscape show which machines are in a given zone.
- Idea/need
    - We need to provide a means for Hadoop users to put in their map-reduce Java classes without having access to the admin portion of the juju environment where Hadoop is deployed.
    - The idea is to create a shim/relation/subordinate that provides user-level access for adding map-reduce jobs.

## AmpLab Bundle

## Juju Actions in Bundles

## Charms in Git

## Charms Series Saga

## Fat Bundles and Caching Charms on Bootstrap Node

## Fat Charms in Closed Environments

- Detect ports calling out to the outside network.

## UA Charm Support Story

- Support bundles, not charms.
    - CTS validates the bundle relations and config.
    - Has to have tests.
- Need bundles in the charm store marked as UA supportable.

## How to engage Joyent & Altoros on provider support

## Unstable Doc Branches & Markdown

## Gating Charm merge proposals on charm testing passing

- Many useful relations.
- I expect this is very charm specific -- please feel free to list relations that we need.

## juju.ubuntu.com doc versioning

- Marco, Jorge, Curtis, Matthew
- Branches will be versions in Git:
    - 1.18
        - en
        - fr
    - 1.20
        - en
        - fr
- How to generate docs for live publishing (see the build sketch below):
    - The Juju QA team will build the markdown to HTML conversion.
    - In this conversion the Juju QA team will also incorporate the languages and the drop-down for versioning.
- Jorge to speak to the translations team on the best way forward.
- When committing to docs master, the reviewer should also commit to unstable docs.
- Keep assets in a separate directory outside the versions and languages so we only have to update one place for assets.
- Move author docs to a separate repository, but keep them in the nav for the live juju.ubuntu.com site.
    - The main idea is to de-couple the charm author docs from the user docs: charm authoring should work the same across all releases, so we always want to show the latest charm author docs. Otherwise, updating the charm author docs would mean updating every branch.
    - We will need to update the juju contributor docs once we move the charm author section.
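A sketch of the conversion over the layout above (`<version>/<language>/*.md` plus a shared assets dir); the directory names and the choice of the `markdown` module are assumptions:

```python
# Sketch: render versioned, multi-language docs to HTML.
import pathlib
import markdown  # pip install markdown; library choice is an assumption

SRC = pathlib.Path("docs")      # e.g. docs/1.18/en/install.md
OUT = pathlib.Path("htmldocs")

for md_file in SRC.glob("*/*/*.md"):
    version, lang = md_file.parts[-3], md_file.parts[-2]
    html = markdown.markdown(md_file.read_text())
    dest = OUT / version / lang / (md_file.stem + ".html")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The version/language drop-down would be injected via a template here.
    dest.write_text(html)
```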
## Juju and OpenStack

- Juju in keystone - Juju as a multi-tenant component registered in keystone.
- Juju in horizon - Juju GUI and UI in horizon.
- Juju in heat - Juju / deployer/bundle style exposed as a DSL in heat.

# Juju GUI

## Juju in OpenStack Horizon - Juju GUI in horizon

### Issues to resolve

- Embedding UI path? An OpenStack project or into an existing one?
- Embedding UI as far as framing/styling.
- Required timeframe; map out paths of resistance to make the OpenStack release.
- The guiserver (python/tornado) running in that stack.
- No bundles without deployer access.
    - Build deployer into core?
    - Build a full JS deployer?
- No local charm file content.

## Juju in Azure - Juju GUI in Azure

### Issues to resolve

- Embedding UI path? Hosted externally and referenced in? Need to meet specific Azure tooling requirements?
- Embedding UI as far as framing/styling with the existing Azure UX.
- Additional required functionality.
    - List environments.
- Required timeframe; map out paths of resistance to make deliverables.
- The guiserver (python/tornado) running in that stack.
- No bundles without deployer access.
    - Build deployer into core?
    - Build a full JS deployer?
- No local charm file content.

## Juju UI networks support

- Which types of networking are supported, and what will be supported in core this cycle? Others planned, to make sure the design scales/works.
- What does design have for the UX of this so far?
    - Provider differences, sandbox, etc.
- Make sure API exposure in core is complete enough to aid all UI team needs put forth by design.
- Get anything not already scheduled onto someone's schedule.

## Juju UI Machine view 1.5

Most of this is a sync with design and a check on what we put into 1.0 vs the final desired product.

- Deployed services inspector.
- Better search integration.
- Pre-deployment config and visualization of bundles.
- Better local charms integration.
- Improved interactions (full drag/drop with the walkthrough/guide material).

## Juju UI Design Global Actions

We've got a series of tasks on the list that require us to find a way to represent things across the entire environment. We need to sit down with design and look at a common pattern to use for these 'global' environment-wide tools, many of which mirror tasks at the service, machine, and unit level.

### Items to discuss

- Design a home for global environment information.
- HA status/make HA.
- SSH key management.
- Environment-level debug-log.
- Environment-level juju-run.

## In the trenches - customer feedback for GUI

The GUI team would like to meet with ecosystems and others selling/deploying the GUI in the field and get feedback on things we can and should look at doing to make the GUI a better tool and product. The goal is to help prioritize and give us ideas of paper cuts we should schedule to fix during maintenance time in the next cycle.

## Juju UI Product Priorities

There's a backlog of features to add to the GUI. We need a product team opinion on which to prioritize as we work around bigger tasks like Azure embedding. We won't be able to get it all done this cycle, so we'd like feedback on those most useful to selling/marketing Juju.

- Debug log
- HA representation controls
- Network support
- Juju Run
- Multiple Users
- Fat bundles
- juju-quickstart on OS X
- juju-quickstart MaaS support
- SSH key management UI

## Core Process Improvements

### Documentation

- Ian - use Launchpad to track what bugs are where and which are fixed.
- Nate - an in-repo file is easier to keep track of, easier to verify during code reviews.

### Standups

- Leads meet once a week.
- Standups are squad standups.
- William does 1-on-1s with leads.
- Team leads email about team status.

### Vetting Ideas on Juju-dev

- Send a user-facing feature description to juju-dev before working on features.

### 2-Week Planning Cycle

- Dev release every 2 weeks.

### Contributing to CI tests

- We should do that.
### Move core to GitHub?

Needs to be scheduled and prioritized. Non-zero work to get it working (build bot, process, etc).

- Code migration
- Code review
- Landing process
- Release process
- CI
- Documentation
- Private projects (ask Mark Ramm)

### Work Items

1. Code migration
    1. Do it all in one big migration.
    1. Namespace will be juju/core.
    1. Factor out others later.
    1. Disable the GitHub bugtracker.
1. Code review
    1. Aim to use native GitHub code review.
    1. Find out about diffs being able to be expanded (ok, done).
    1. Rebase before issuing a pull request to allow a single revision to be cherry-picked (investigate to be sure).
1. Branch setup
    1. Single trunk branch protected by a bot.
1. Landing process (see the sketch below)
    1. Check out Rick's lander branch (juju Jenkins GitHub lander).
    1. Run the GitHub Jenkins lander on the Jenkins CI instance.
1. Documentation
    1. Document the entire process.
1. CI
    1. Polling for new revisions.
    1. Building the release tarball.
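A sketch of the gated-landing loop the work items describe, polling GitHub and merging only green pull requests. This is not Rick's lander; the juju/core namespace comes from the notes above, while the approval check and token handling are simplified assumptions:

```python
# Sketch: poll open PRs and merge those whose combined commit status is
# success. A real lander would also require a reviewer approval marker.
import time
import requests  # assumption: the lander is a small Python service

API = "https://api.github.com/repos/juju/core"
AUTH = {"Authorization": "token <redacted>"}  # a real lander needs a token

def ci_passed(sha):
    """True when the combined commit status for the sha is success."""
    r = requests.get("%s/commits/%s/status" % (API, sha), headers=AUTH)
    return r.json().get("state") == "success"

while True:
    for pr in requests.get("%s/pulls?state=open" % API, headers=AUTH).json():
        if ci_passed(pr["head"]["sha"]):
            requests.put("%s/pulls/%d/merge" % (API, pr["number"]),
                         headers=AUTH)
    time.sleep(60)  # "polling for new revisions"
```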