
title: Juju 14.10 Plans

[TOC]

# Core

## Multi-environment State Server

Multi-environment, multi-customer.

### Use Cases

- Embedding in Azure - people can spin up an environment without paying for a state server, which is quite a cost for people starting out.
- Embedding in Horizon (OpenStack dashboard).

### How do we start?

- Create multiple client users (some sort of API), a.k.a. user management.
- create-environment (need environments), list environments.
    - SelectEnvironment is called after Login; Login itself exposes the multi-environment API root. To avoid an extra round trip, Login can optionally pass in the EnvironmentUUID.
- Credentials need to move out of the environment.
- Machine/user/etc (everything) documents gain an environment id (except users).
- API filters by the environment id inherited from SelectEnvironment.
- rsyslog needs to split based on tenant.
- provisioner/firewaller/other workers get an environment id, one task per environment.
- Consider adopting the accounts/environments separation from `environments.yaml` (Juju 2.0 conf).
    - This is changing the DB representation so that we represent Environments referencing Accounts and pointing to Providers.
    - It may be possible that EnvironConfig still collapses this into one big bag of config, but it should be possible to easily change your Provider Credentials for a given Account and have that cascade to all of your environments.

### Work Items

- State object gains an EnvironmentUUID attribute; all methods against that State object implicitly use that environment (see the sketch after this list).
- Update state document objects (machine, unit, relation-scopes, etc.) to include EnvironmentUUID.
- MultiState object
    - Includes the Users and Environment collections.
    - Used for initial Login to the API, and subsequent listing/selecting of environments.
    - SelectEnvironment returns an API root like we have today, backed by a State object (like today) that includes the environment UUID.
    - **Unclear**: how to preserve compatibility with clients that don’t pass the environment UUID.
    - Desirable: being able to avoid the extra round trip of Login+SelectEnvironment for most commands that know the environment ahead of time (`status`, `add-unit`, etc.).
- Admin on the state server gives you global rights across all Environments.
- Environments collection.
- MultiState APIs
    - `ListEnvironments`
        - Needs to filter based on the roles available to the user in various environments. Should not return environments that you don’t have access to.
    - `SelectEnvironment`
    - `CreateEnvironment`
    - `DestroyEnvironment`
- Logging
    - TBD; regardless of the mechanism, we need the environment UUID recorded per log message, so we can filter it out again.
    - In rsyslog it could be put into the prefix, or sharded into separate log files.
- Include the GUI for the environment on (in) the state server per environment.

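A minimal sketch (with assumed names, not the actual juju-core types) of the State-scoping idea above: every state document gains an environment UUID, and a State handle bound to one environment implicitly filters every lookup by it.

```go
package main

import "fmt"

// machineDoc stands in for a state document that, per the work items above,
// gains an EnvUUID field so one shared collection can hold documents from
// many environments. (Field names are assumptions for illustration.)
type machineDoc struct {
	EnvUUID string // environment this machine belongs to
	ID      string // machine id, e.g. "0"
}

// State is a handle scoped to a single environment: every method implicitly
// filters by the EnvironmentUUID chosen at Login/SelectEnvironment.
type State struct {
	envUUID string
	docs    []machineDoc // stands in for the shared machines collection
}

// Machines returns only this environment's machines, mirroring how the real
// implementation would add an env-uuid term to every mongo query.
func (st *State) Machines() []machineDoc {
	var out []machineDoc
	for _, d := range st.docs {
		if d.EnvUUID == st.envUUID {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	shared := []machineDoc{{"env-a", "0"}, {"env-b", "0"}, {"env-a", "1"}}
	st := &State{envUUID: "env-a", docs: shared}
	fmt.Println(st.Machines()) // only env-a machines
}
```
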
## HA

- Current Issues
    - `debug-log` retrieves the log from one API server, so in an HA environment not all logs are retrieved.
    - https://bugs.launchpad.net/juju-core/+bug/1310268
- What is missing?
    - HA on local.
- Next steps
    - Decrease count (3 -> 1, 5 -> 3).
    - Scaling the API separately from mongo.

### Notes

Work on rsyslog is in progress: logging to multiple rsyslogd instances is ready to be reviewed.

The rsyslog conf still needs to be updated when machines are added or removed. This remains to be done.

Possible problem: logs being very out of order (hours off).

**Bug**: Peergrouper log spam on local.

HA on local can’t work 100% because VMs can’t start new VMs, so only machine 0 can be a useful master state server. However, there are other tests that can be done with HA that would be useful on local HA.

It would be useful to have the master state server be beefy and a higher priority for master, and the non-masters be non-beefy, because the master has far more load than the non-masters. Right now, ensure availability is very broad and vague; it’s not tweakable. However, you can achieve this by bootstrapping with a big machine, changing the constraints to smaller machines, then running ensure availability. The only thing we would need to add is a way to give a state server a higher priority for becoming master.

Need better introspection into the status so that the GUI can better reflect what’s going on. The GUI needs to be able to call ensure availability, and needs to show state servers.

The restore process for HA is just: restore one machine, then call ensure availability.

### GUI needs

- allwatcher needs to add fields for HA status changes.
- The GUI needs to know what API address to talk to, handle fallback when one goes away, and keep up to date on who else to talk to.
- ensure-availability needs to return more status (actions triggered).
- How is HA enabled/displayed in the GUI? What does machine view show?
- Can you deploy multiple Juju-GUI charms for HA of the GUI itself?

### CI

1. Shut down the master node, or temporarily cripple the network, to verify HA elects a new master.
2. Test on local, because local will be used in demonstrations.
3. If backup-restore is also being done, then a restore of the master is a new master; ensure-availability must be rerun.

### Work Items

- **Bug**: Agent conf needs to store all addresses (hostports), not just private addresses. Needed for the manual provider.
- **Bug**: Peergrouper log spam on local.
- Change mongo to write majority; this is a change per session.
- Change mongo to write WAL logs synchronously.
- Need docs on using ensure availability to remove a machine that died (try to improve the actual user story for how this works).
- `juju bootstrap` && `juju ensure-availability` (should not try to create a replacement for machine-0).
- Set up all status on the bootstrap machine during bootstrap so it is created in a known good state and doesn’t start up looking like it’s down.
- A machine that was down and was replaced via ensure-availability should, when it comes back, not have a vote and not try to be another API server.
- `juju upgrade-juju` should coordinate between the API servers to enable DB schema updates (before rewriting the schema, make sure all API servers are upgraded, and then only the master API server performs the schema change).
- APIWorker on nodes with JujuManageEnvironment should only connect to the API server on localhost.
- Determine how backup works when in HA.
- Changes for the GUI to expose HA status.
- Changes for the GUI to monitor what the current API servers are (need the Watcher that other agents use exposed on the Client facade).
- `ensure-availability` needs to return more status; the EnsureAvailability API call should return the actions triggered, as sketched below.
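
A hedged sketch of the richer `EnsureAvailability` response the last work item asks for (field names assumed; not the actual API): report the actions triggered rather than a bare success.

```go
package main

import "fmt"

// EnsureAvailabilityResult sketches the richer response described above:
// the call reports which actions it triggered so the CLI and GUI can
// display them. (Field names are assumptions, not the real API.)
type EnsureAvailabilityResult struct {
	Maintained []string // state servers left as they are
	Added      []string // machines newly promoted to state servers
	Removed    []string // servers being demoted/replaced (e.g. dead machines)
}

func main() {
	r := EnsureAvailabilityResult{
		Maintained: []string{"machine-0"},
		Added:      []string{"machine-3", "machine-4"},
		Removed:    []string{"machine-1"},
	}
	fmt.Printf("maintaining %v, adding %v, removing %v\n",
		r.Maintained, r.Added, r.Removed)
}
```
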
### Work items (stretch goals)

- Ability to reduce the number of state servers.
- Handle the problem of ensure availability being called twice in a row (since the new servers aren’t up yet, we start more new state servers).
- Ability to set priority on a state server.
- Autorecovery - bringing back machines that die (or just calling ensure availability again).

## State, status, charm reporting

Statuses like ‘started’ don’t have enough detail. We don’t know the true state of the system or of a charm from a status like started.

- s/ready/healthy and s/unready/unhealthy
- Add jujuc tools `ready` and `unready` (healthy, unhealthy).
    - Ready takes no positional arguments.
    - Unready takes a single positional argument: a message that explains why.
    - Charm authors choose the message they want to use.
    - Both ready/unready, when called without other flags, apply to the unit that is running.
    - Both also accept a relation flag, `-r <relation id>`, which applies the status to the specified relation.
    - The status data for a unit keeps track of the ready status, exposed in status.
    - Implementation needs to be shared with the allwatcher so the GUI gets to see the info.
- Implement a ready-check hook that will be called periodically if it exists; units are expected to update their ready status to be reported when the hook is called.
- The detailed states are sub-statuses of ‘started’ (see the sketch after this list).
- Possible granular statuses for units:
    - provisioned
    - installing (sub or pending)
- Juju will poll the ready-check hook for the current state. Charms need to respond ready or unready.
- We might want both a concise form and a summary of the status. The GUI might want to show the concise form first and the summary later.
- Status is already bloated.
    - Can status be intelligent enough to only include the data needed?
    - Can you subscribe to get updates for just the information you think is changing... subscribe to the allwatcher?
- `juju status --all` would be the current behavior.
    - We would start with `--all` being implicit, but deprecated.
    - We will switch to a more terse format.
- The status “started” is not really ready.
    - There may be other hooks that still need to run.
    - Only the charm knows when the service is ready.
- When install completes, the status is implicitly “started”.
    - The charm author can set install to return a message to mean it is unready.
- Authors want to know when a charm is blocked because it is waiting on a hook.
    - We can solve 80% of the problem with some effort, but a proper solution is a lot of work.
    - It isn’t clear when one unit is still being debugged.

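A minimal sketch (assumed shape, not the final design) of the granular status above: a unit can be "started" from Juju's point of view yet still unready, carrying a charm-supplied message explaining why.

```go
package main

import "fmt"

// UnitStatus sketches the granular status discussed above: the agent-level
// state is layered with a charm-controlled readiness flag and message.
type UnitStatus struct {
	Agent   string // e.g. "provisioned", "installing", "started"
	Ready   bool   // set by the charm via the proposed ready/unready tools
	Message string // unready reason chosen by the charm author
}

func main() {
	s := UnitStatus{Agent: "started", Ready: false,
		Message: "waiting for database relation"}
	fmt.Printf("%s (ready=%v): %s\n", s.Agent, s.Ready, s.Message)
}
```
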
### Work Items

1. Introduce granular statuses.
1. Implement filters/subscribers to retrieve granular status.
1. Unify status and the all-watcher.
1. Switch status from --all to the concise form.
    - (?) know when the charm is stable, i.e. when there are no hooks queued
    - (?) know when all services are stable
1. When deploying and then adding debug-hooks, the latter could set up a pinger for the service being deployed, which puts the service into debug as it comes up.
1. `juju retry` to restart the hooks, because resolved is abused.

## Error Handling

- JDFI. We have a package. Use it.
- We need to annotate with a line number and a stacktrace.
    - We have type preservation.
- There is some agreement to change the names of some of the API.
- Add this as needed. Switching all code to use it at once would stall the production line.
- Reviewers will push back on new code to use the new error handling.

### Work Items

1. Extend the juju errors package to annotate with file and line number.
1. Log the annotated stack trace.
1. Change the backend to use `errgo`.
1. We need a template (Dimiter’s example) of how to use error logging (a rough sketch follows).

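A rough sketch of the annotation pattern (using only the standard library; the real juju errors package would capture file and line automatically, per work item 1): wrap the error at each level so the final log line reads like a mini stack trace.

```go
package main

import "fmt"

// readConfig stands in for a call that fails somewhere deep in the stack.
func readConfig(path string) error {
	return fmt.Errorf("open %s: permission denied", path)
}

// startAgent annotates the error with its own context before returning it.
// The real package would also record file:line here, which is what makes
// the logged trace useful.
func startAgent() error {
	if err := readConfig("/etc/agent.conf"); err != nil {
		return fmt.Errorf("starting agent: %v", err)
	}
	return nil
}

func main() {
	if err := startAgent(); err != nil {
		fmt.Println("ERROR", err)
	}
}
```
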
## Image Based Workflows

Charms would be able to specify an image (maybe docker); with the addition of storage, storage dirs are passed into docker as it is launched.

The unit agent may run either inside or outside the docker container (not yet determined).

The machine agent would mount the storage and the charm directory into the docker container when it starts. The hooks are executed inside the docker container.

Looking to make docker support a first-class citizen in Juju.

*“Juju incorporates docker for image based workflows”*

Maybe limited to images based on the ubuntu-cloud image (full OS container).

May well have a registry per CPC to make downloading images faster on that cloud.

Perhaps include a docker file (instructions to build the image) in the charm. The registry that we look up needs to be configurable.

Offline install will require pulling images into a local registry.

### Work Items

1. Unit agent inside the container.
1. Image registry.
1. Charm metadata to describe the image and registry.
1. Deployer to understand docker: the deployer inspects charm metadata to determine the deployment method, traditional vs. docker (see the sketch below).
1. A docker deployer needs to be written that can download the image from a registry, and start the container mounting the agent config, storage, charm dir, and the upstart script for the unit agent (if the unit agent runs inside).
1. Docker work is needed to execute hooks inside the container from the outside.

**Depends on storage 0.1 done first.**

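A small sketch of the deployer dispatch in work item 4, under assumed metadata fields (`Image`, `Registry` are illustrative, not a settled schema): the presence of an image in the charm's metadata selects the docker deployer.

```go
package main

import "fmt"

// charmMeta sketches the new metadata fields from work item 3 (names
// assumed): a charm may declare a docker image and a registry to pull from.
type charmMeta struct {
	Name     string
	Image    string // e.g. "registry.example.com/mysql:5.5"; empty if none
	Registry string
}

// deployMethod implements the dispatch from work item 4: inspect charm
// metadata to choose traditional vs. docker deployment.
func deployMethod(m charmMeta) string {
	if m.Image != "" {
		return "docker"
	}
	return "traditional"
}

func main() {
	fmt.Println(deployMethod(charmMeta{Name: "mysql"}))
	fmt.Println(deployMethod(charmMeta{
		Name: "cf-runtime", Image: "cf/runtime:1.2", Registry: "cpc-registry",
	}))
}
```
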
## Scalability

### Items that need discussion in Vegas

- How do we scale to an environment with 15k active units?
    - How do admin operations scale?
    - How do we handle failing units?
        - Dump and re-create.
        - Interaction with the storage definition.
    - How do we make a `juju status` that can provide a summary without getting bogged down in repeated information?
    - How does relation get/set change propagation scale?
    - Where are the current bottlenecks when deploying, say, hadoop?
    - Where are the current bottlenecks when deploying OpenStack?
- Pub/Sub (see the sketch after this list)
    - What do we need to do here?
    - Notes:
        - We need a pub/sub for the watchers to help scale.
        - Each watcher pub/subs on its own; move up one level?
        - Need to respond to events that occur, in a non-coupled way (indirect sub to a goroutine).
        - Logging particular events?
        - Only one thing looking at the transaction log - whoops, not as bad as we thought.
        - 100k units leads to millions of goroutines; blocking is an issue.
        - If we do a pub/sub system, let’s use it everywhere? Replace watchers?
        - Related to the idea of pub/sub on output variables and the like, it sounds like.
        - Watching subfield granularity of a document, perhaps?
        - 0mq has this; we should reuse that and not invent our own pub/sub.
        - 0mq has Go bindings; wonder if it works in gccgo.
        - Does this replace the API? No: JavaScript can’t talk to 0mq directly, so clients still need some API-ness.
        - Are there alternatives to the watcher design?
        - Really good for testing. Can decouple parts and make it easy/fast to test whether an event is fired.
        - Shared watcher for all things (on the service object?).
        - Have a big copy of the world in memory; that helps with a lot of this.
        - Charm output variables watching: charm outputs hit state, the megawatcher catches the update and tells everyone it’s changed.
        - Helps with the ABA problem using the in-memory model.
        - Use a 3rd party pub-sub rather than writing our own.

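The pub/sub idea above, reduced to a minimal in-process sketch (an assumed design, not an existing juju package; a real implementation would need locking around the subscription map and a policy for slow consumers):

```go
package main

import "fmt"

// Event is what publishers emit and watchers receive.
type Event struct {
	Topic string
	Data  string
}

// Hub is a toy event hub - not goroutine-safe, no unsubscribe - just enough
// to show watchers decoupled from the code that produces state changes.
type Hub struct {
	subs map[string][]chan Event
}

func NewHub() *Hub { return &Hub{subs: make(map[string][]chan Event)} }

// Subscribe returns a channel that receives every event on topic.
func (h *Hub) Subscribe(topic string) <-chan Event {
	ch := make(chan Event, 16) // buffered so a slow consumer doesn't block Publish
	h.subs[topic] = append(h.subs[topic], ch)
	return ch
}

// Publish fans the event out to all current subscribers of its topic.
func (h *Hub) Publish(ev Event) {
	for _, ch := range h.subs[ev.Topic] {
		ch <- ev
	}
}

func main() {
	hub := NewHub()
	units := hub.Subscribe("unit-changed")
	hub.Publish(Event{Topic: "unit-changed", Data: "mysql/0"})
	fmt.Println(<-units)
}
```
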
### Work Items

1. Boot generic machines which then ask juju for identity info.
1. Bulk machine provisioning.
1. Fix the uniter event storm due to “number of units changed” events.
1. Implement a proper pub/sub system to replace watchers.
1. The state server machine agent (APIServer) should not listen for outside connection requests until it itself (APIWorker) has started.

## Determinism

### First Issue: Install repeatability

There are two approaches to giving us better isolation from the network and other externalities at deploy time.

1. Fix charms so they don’t have to rely on external resources.
    - Perhaps by improvements around fat charms.
        - REQUIRED: Remove internal Git usage (DONE).
    - Perhaps by making it easy to manage those resources in Juju itself?
        - Either create a TOSCA-like “resources” catalog per environment: upload or fetch resources to the environment at deploy time (or as a pre-deploy step),
        - or create a single central resource catalog with forwarding, a.k.a. a “gem store for the world”.
1. Snapshot-based workflows for scale up/down so external resources aren't hit on every new deploy.
    - We could add the necessary hooks to core, but the actual orchestration of images seems a bit more tricky and could depend on a better storage story.

### Second Issue

From Kapil: “Runtime modification of any change results in non-deterministic propagation across the topology that can lead to service interruption. Needs change barriers around many things, but that’s not implemented or available - e.g. config-changed and upgrade executed at the same time by all units.”

### Upgrade Juju

`juju upgrade-juju` goes to a magic revision (simple bug fix) that an operator can’t determine.

Juju internally lacks composable transactions; many actions violate semantic transaction boundaries, and thus partial failure states leave inconsistencies.

Kapil notes:

> One of the issues with complex application topologies is how runtime changes ripple through the system, e.g. a config change on service a propagates via relations to service b and then service c. It's eventually consistent and convergent, but during the convergence what's the status of the services within the topology? Is it operational? Is it temporarily broken?

> **This is a hard problem** to solve and it's one I've encountered in both our OpenStack and Cloud Foundry charms.

> In discussions with Ben during the Cloud Foundry sprint, the only mitigation we could think of on Juju's part was some form of barrier coordination around changes, e.g. so that the ripple proceeds evenly through the system. It's not a panacea but it can help, especially looking at simpler cases of just doing barriers around `config-change` and `charm-upgrade`. What makes this a bit worse for Juju than other systems is that we're purposefully encapsulating behavior in arbitrary languages and promoting blind/trust-based reuse, so a charm user doesn't really know what effect setting any config value will have. E.g. the cases I encountered before were setting a 'shared-secret' value and an 'ssl' enumeration value on respective service config... for the ssl I was able to audit that it was okay at runtime... but that's a really subtle thing to test or detect or maintain.

> Any change can ripple through the topology. We have an eventually-consistent system, but while it is rippling, we have no idea. Lack of determinism means someone who uses Juju cannot make uptime guarantees.

**Bug**: downgrading charms is not well supported.

### Questions

- Do we need barriers? E.g. config-changed affects all units of a service simultaneously (see the sketch after this list).
- Do we need pools of units within a service?

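A minimal sketch of the barrier semantics under discussion (assumed behavior, not a committed design): no unit proceeds past the barrier until every unit of the service has finished its hook, so a change ripples through the topology in lockstep.

```go
package main

import (
	"fmt"
	"sync"
)

// runHook stands in for a unit executing its config-changed hook.
func runHook(unit int) { fmt.Printf("unit/%d: hook done\n", unit) }

func main() {
	const units = 3

	var barrier sync.WaitGroup // counts units yet to reach the barrier
	barrier.Add(units)

	var all sync.WaitGroup // lets main wait for the goroutines to finish
	all.Add(units)

	for i := 0; i < units; i++ {
		go func(id int) {
			defer all.Done()
			runHook(id)
			barrier.Done() // announce arrival at the barrier
			barrier.Wait() // block until every unit has arrived
			fmt.Printf("unit/%d: past barrier\n", id)
		}(i)
	}
	all.Wait()
}
```
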
### Work Items

- Unit ids must be unique (even after you've destroyed and re-created a service).
- Address changes must propagate to relations.
- `--dry-run` for `juju upgrade-juju`.
- `--dry-run` for deploy (what charm version and series am I going to get?).

## Health Checks

Juju “status” reporting in charms needs to be clearly defined and expressive enough to cover a few critical use cases. It is important to note that BOSH has such a system.

- Canaries and rolling unit upgrades (health check as a pre-requisite).
- Is a service actually running?
- Coordination of database schema upgrades with webserver unit upgrades (as an example of the general problem of coordinated upgrades).
- Determining when HA quorum has been reached or a server has been degraded.

### Questions

- We discussed Error and Ready as states, but do we need a third? Pending, Error, and Ready?
- Do we need any more than three states?
- Suggestion: three states, plus an error description JSON map (sketched below).

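A sketch of the suggestion above (a hypothetical shape, not a decided schema): three states plus a free-form, JSON-serializable error description map.

```go
package main

import "fmt"

// HealthState enumerates the three states suggested above.
type HealthState string

const (
	Pending HealthState = "pending"
	Error   HealthState = "error"
	Ready   HealthState = "ready"
)

// Health pairs a state with the free-form error description map from the
// suggestion; the keys are up to the charm.
type Health struct {
	State  HealthState
	Detail map[string]string
}

func main() {
	h := Health{State: Error, Detail: map[string]string{
		"reason": "schema upgrade pending", "since": "2014-05-01T10:00:00Z",
	}}
	fmt.Println(h.State, h.Detail["reason"])
}
```
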
## Storage management

### Allow charms to declare storage needs (block and mount)

- [Discussion from Capetown](https://docs.google.com/a/canonical.com/document/d/1akh53dDTROnd0wTjGjOrsEp-7CGorxVp2ErzMC_G-zg/edit)
- [Proposal post Capetown (MS) (lacks storage-sets)](https://docs.google.com/a/canonical.com/document/d/1OhaLiHMoGNFEmDJTiNGMluIlkFtzcYjezC8Yq4nAX3Q/edit#heading=h.wjxtdqqbl1fg)

Entity to be managed:

- CRUD, snapshot

Charms declare it:

- Path, type (ephemeral/persistent), block.

Storage 0.1:

- Storage set in state - track the information in some way.
- Disks (placement, storage).
- Provider APIs (to create, delete, attach storage; expand for later) - see the interface sketch at the end of this section.
- Provider to be able to attach storage to a machine.
- Charms need to be able to declare storage in metadata.
- `jujud` commands so charms can resolve where the storage is on the machine.
- Degradation: on the manual provider or another provider that doesn’t provide storage (e.g. DO), do not fail to deploy, but we need to communicate a warning of some form. Should the CLI fail while the API does not?

Storage sets need to talk to services, and need to be exposed as management processes.

Multitenant storage? Probably not for the initial implementation, but ***do not design it out***.

Need to consider being able to map our existing storage policy onto the new design (e.g. AWS EBS volumes for how Juju works with Amazon).

NOTE: Storage is tied to a zone; ops can take a long time to run.

Consider upgrades of charms, and how we can move from the existing state, where a charm may have its own storage that it has handled, to the new world where we model the storage in state.

- (2) Add a state storage document to the charm document.
    - Upgrading juju should detect services that have charms with storage requirements and fulfill them for new units.
- (6) Add state for storage entities attached to units.
    - Lifecycle management for storage entities.
- (6) When deploying units, need to find out what storage is needed.
    - Make the provisioner aware of workloads and include storage details when needed.
    - Change unit assignment to machines based on storage restrictions.
- (4) Define provider APIs for handling storage.
    - Create new volume.
    - Delete volume.
    - Attach volume to instance.
- (12) Implement provider APIs for storage on different providers.
    - OpenStack
    - EC2
    - MaaS
    - Azure?
- (0) Consider storage provider APIs for compute providers that have storage as a service.
- (2) Define new `metadata.yaml` fields for dealing with storage.
- (0) Consider mapping between charm requirements and service-level restrictions on what storage should actually be provided.
- (4) Add storage to status.
    - Top-level storage entity keys.
    - Units showing their associated storage entities.
    - Services showing storage details.
- (4) CLI/API operations on storage entities.
    - Add storage.
    - Remove storage.
    - Further operations? Resize? Not now.

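A sketch of the provider storage API outlined above (method names assumed; the list only fixes the three operations, not the signatures), with a fake provider to show the call sequence:

```go
package main

import "fmt"

// VolumeID identifies a provider storage volume.
type VolumeID string

// StorageProvider sketches the storage 0.1 provider API: create, delete,
// and attach, with room to expand later.
type StorageProvider interface {
	CreateVolume(sizeMB int, zone string) (VolumeID, error)
	DeleteVolume(id VolumeID) error
	AttachVolume(id VolumeID, instanceID string) (devicePath string, err error)
}

// fakeProvider is a stand-in implementation used only for illustration.
type fakeProvider struct{ n int }

func (p *fakeProvider) CreateVolume(sizeMB int, zone string) (VolumeID, error) {
	p.n++
	return VolumeID(fmt.Sprintf("vol-%d", p.n)), nil
}

func (p *fakeProvider) DeleteVolume(id VolumeID) error { return nil }

func (p *fakeProvider) AttachVolume(id VolumeID, inst string) (string, error) {
	return "/dev/xvdf", nil
}

func main() {
	var sp StorageProvider = &fakeProvider{}
	id, _ := sp.CreateVolume(10240, "us-east-1a")
	dev, _ := sp.AttachVolume(id, "i-12345")
	fmt.Println(id, dev)
}
```
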
## Juju as a good auto-scaling toolkit

*Not a goal: doing autoscaling in core.*

Goal: providing the APIs and features needed to easily write auto-scaling systems for specific workloads.

Outside stakeholders: Cloud Installer team.

We need to be able to clean up after ourselves automatically.
Where “clean up” actions are required, they need to take bulk operation commands.

- Destroy-service/destroy-unit should cascade to destroy dirty machines.

## IAAS scalability

- Security group re-think:
    - The security group approach needs to switch to per-service groups.
    - We need to support individual on-machine/container firewall rules.
- Support for instance type and AZ locality.

## Idempotency

Is this a juju issue, or a charm issue? Config management tools always promise this, but rarely deliver -- though many deliver **more** than juju. What are the specific issues in question with Cloud Foundry?

## Charm "resources" (fat bundles, local storage, external resource caching)

### Problem Statements

- Writing and maintaining “fat” charms is difficult.
    - Forking charm executables to support multiple upstream release artifacts is sub-optimal.
    - Fat charms are problematic.
- Non-fat charms are dependent on quite a few external resources for deployment.
- Non-fat charms are not *necessarily* deterministic as to which version of the software will be installed (even to the point of sometimes deploying different versions in the same service).

### Proposed Agenda

- Discuss making “fat charms better”.
    - Switch to a “resources” model, where a charm can declare the ‘external’ content that it depends on, and the store manages caching and replication of it.
- Consider building on the work IS has done.
- Choose a path, and enumerate all the work that needs to be done to fully solve this problem.

### Proposal

- ~~`resource-get NAME` within a charm to pull down a published blob~~
- Instead of a model where charms request resources by name, the charm as a whole declares the resources it uses, and the Uniter ensures that the data is available before firing the upgrade/install hooks.
- `resources.yaml` declares a list of streams that contain resources, for example:

```
default-stream: stable
streams:
  stable:
  devel:
    common:
      common.zip
    amd64:
      foobar.zip
```

- The resources directory structure for charms should match the charm author's layout, so bind-mounting the directory for development still works. In the deployed version of the directory structure, you will only have common and arch-specific files. Should there be a symlink to the specific arch? Either:
    - publish charm errors if there are name collisions across the common and arch-specific directories. This way all the files are in one resources directory for hook execution. This does mean that the charm developer needs a way to create symlinks in the top directory to the current set of resources they want to use (charm-tool resources-link amd64) - Windows? (they have symlinks, right?)
    - charm has resources/common and resources/arch. “arch” is still a link, but just one.
    - charm has resources/common and resources/amd64.
        - This requires the hook knowing the arch.
- Charm identifiers become qualified with the stream name and resource version (precise/mysql-25.devel.325).
- juju status will show a new version available if the entire version string (including resources) changes.
    - If mysql-25.devel.325 is installed, and a different version of resources becomes current, this will be shown in `juju status`.
    - We currently ask for mysql-latest; this should perhaps be changed to mysql-current, as we don’t necessarily want the latest version.
- Each named stream has an independent version, which is independent of both other streams and of the explicit charm version.
- upgrade-charm upgrades to the latest full version of the charm, including its resources.
- upgrade-charm reports to the user what old version it was at and what new version it upgraded to.
- Blobs are stored in the charm store; your environment always has a charm store, which can be synced for offline deployments.
- Today, deploy ensures that the charm is available and copies it to environment storage; this will now need to do the same for the charm's resources.
- Deploy should also confirm that the charm version and resource version are compatible.
    - `juju deploy mysql-25.dev.326` may fail because resources version 326 has a different manifest than declared in charm 25’s `resources.yaml`.
    - `juju deploy mysql`
        - finds the current version of the charm and resources in the default stream;
        - the charm store has already validated that they match.
    - `juju deploy mysql-25`
        - uses the default stream.
        - How do we determine the resources for this version?
            - Does current match? If yes, use it.
            - If 25 < current, then look back from the current resources and grab the first that has a matching manifest.
            - Could just fail.
            - The charm store could track the best current resources for any given charm version, as identified by moving the current resources pointer while keeping the charm pointer the same. For charm versions that are current, remember the current resources version.
            - If we take this approach, there will be charm versions that have never been “current”, so deploying them without explicitly specifying the resources version will fail.
    - `juju deploy mysql.nightly` (syntax TBD)
    - `juju deploy mysql --option=stream=nightly` (hand wave - we don’t like this one, as getting the full version partly from config feels weird)
        - Find the current version of mysql and the current version of that stream's resources.
        - So, the charm store needs to remember the current resources for each stream for each charm version.
- The charm store has pointers for the “current” version of charms and the “current” version of resources.
- The charm store requires that the resources defined in the current pointers have the same shape (same list of files).
- `charm-publish` requires a local copy of all resources (for all architectures), and validates that `resources.yaml` matches the resources tree.
- `charm-publish` computes the local hash of the resources, and the manifest for what is currently in the charm store, to publish both the charm metadata and all resources in a single request.
    - Publishing does not immediately move the ‘current’ pointer. This allows someone to explicitly deploy the version and test that the charm works with that version of resources.
- Supported architectures are an emergent property tracked by the charm store (known bad/unknown/known good) based on testing - hand wave.
- The charm store will be expected to de-dupe based on content hash (possibly a pair of different long hashes just to be sure).
    - Don’t let the manifest just be the SHA hash without a challenge:
        - either a random set of bytes from the content,
        - or a salted hash - the charm store gives the salt, and publish charm computes the salted hash to confirm that it actually has the content (see the sketch below).

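A minimal sketch of the salted-hash challenge (the exact scheme is undecided; this only illustrates the idea): the store issues a random salt, and the publisher proves possession by hashing salt plus content, something a bare SHA copied from a manifest cannot fake.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newSalt is the store side: a fresh random salt per publish request.
func newSalt() []byte {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}
	return salt
}

// prove is the publisher side: hash salt||content, which can only be
// computed by someone who actually holds the content.
func prove(salt, content []byte) string {
	h := sha256.New()
	h.Write(salt)
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	content := []byte("resource blob bytes")
	salt := newSalt()
	fmt.Println(prove(salt, content)) // store verifies against its own copy
}
```
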
### Spec

- Be clear about what is in the charm store, what is defined by the charm in `resources.yaml`, and what is deployed on disk.
- Use cases should show both the charm developer workflow and the user upgrade flow (which files get uploaded/downloaded, etc.).
- Developing a new charm with resources:
    - with common resources;
    - with different resources for different architectures;
    - with some architectures needing specific files.
- Upgrade a charm by just modifying a few files.
- Upgrade a charm by only modifying a charm hook.
- Upgrade both hooks and resources.
- Adding new files.
- Docker workflow with a base image and one overlay.
- Updating the overlay of a docker charm.
- Adding a new docker overlay will cause a rev bump on the charm as well as the resources, because the resources.yaml file has to change to include the new layer.
    - Illustrate explicitly the workflow if they forget to add the new layer to the resources.yaml file - publish fails because resources.yaml doesn’t match the on-disk resources directory tree.

### Discussion

- Canarying will have to be across charm revision, blob set, and charm config.
    - The charm version now includes the charm revision and the resources revision.
- Further discussion needed around health status for canaries later.
- Access control needs to be on top of the content addressing; just knowing the hash does not imply permission to fetch.
- Saving network transfer by doing binary diffs on blobs of the same name with different hashes/versions would be nice for upgrades.
    - *sabdfl* says we have this behaviour already with the phone images, and we should break this out into some common library somewhere, somehow.

### Charms define their resources

- `resources.yaml` (next to `metadata.yaml`)
- Stream
    - Has a description (free vs. paid, beta/stable vs. proposed, etc.).
    - If you want to change the logic of a charm based on the blob stream, that is actually a different charm (or an if statement in your hooks).
    - Streams will gain ACLs later (can you use the paid stream?).
    - Charms must declare a default stream.
- Filenames
    - The name of the blob.
- Version
    - Just a monotonically increasing number; the version is stream-dependent.
    - The store has a pointer to the “current” version (which may not be the latest).
- Architecture
- The charm declares the shape of the resources it consumes (what files must be available). The store maintains the invariant that when the resources are updated, they contain the shape that the charm declared.
- `charm-publish` uploads both the version of the charm and the version of the resources.
- We add a “current” pointer to charms like the one for resources, so that you have an opportunity to upload the charm and its resources and test it before it becomes the default charm that people get (instead of getting the ‘latest’ charm, you always get the ‘current’ unless explicitly specified).
- mysql-13.paid.62

### Notes

We need to cache fat charms on the bootstrap node. We need to "auto Kapil" fat charms. Sometimes we don't even have access to the outside network. We need one hop from the unit to the bootstrap node.

However, the important thing is that customers will probably fat-charm everything, e.g. a huge IBM WebSphere Java payload.

- Can Juju handle gigs of payload? Nate: Yes, moving away from the git storage.
- Is there anything core can do to make charms smaller?
    - Marco: Yes.
    - Ben: We need a mechanism to specify common deps so that we can share them instead of having a copy in every charm. A bundle could have deps included, or maybe a common blob store?
- juju-deployer is moving to core.

If we move to image-based workloads we can have a set image that includes all the deps.

Nate: We could do it so that if we’re on a certain cloud we can install the deps as part of the cloud, e.g. if I am on SoftLayer, make sure IBM Java is installed via cloud-init. So we can do things like an optimized image for big data.

### Work Items

1. Add an optional format version to charm metadata (default 0) - 2
    - Get juju to reject charms with formats it doesn’t know about ASAP.
1. Charm store needs to grow blob storage, with multiple streams, current resource pointers, and links to the charm itself for the resources - 4
1. Charm store needs to gain current charm revision pointers to charms - 2
    - Juju should ask for current, not latest.
1. The charm store needs to know which revisions of each resource stream each charm revision works with - 2
1. Charm gains optional `resources.yaml` - 2
    - Bump the format version for those using `resources.yaml`.
1. Need to write a proper charm publish - 12
    - Resource manifest match.
    - Salted hashes.
    - Partial diff up/down not in rev 1.
1. State server needs an HA charm/resources store - 8
    - Should use the same code as the actual charm store (shared lib/package).
    - Replaces the current charm storage in provider storage.
1. Charm does not exist in state until we have copied all authorized resources into the local charm store. - 2
1. Uniter/charm.deployer needs to know about the resources file, parse its content, know which stream to use, and request resources from the local charm store, probably authenticated - 4
    - Puts the resources into the resources directory as part of Deploy.
1. Bind mounting: ensure the links for the files flatten in the resources dir - 2

## Make writing providers easier

### Problems

- Writing providers is hard.
- Writing providers takes a long time.
- Writing providers requires knowledge of the internals of juju-core.
- Providers suffer bitrot quite quickly.

### Agenda

- Can we externalize from core? (plugins/other languages?)
- Pre-made stub project with pluggable functions?
- How to keep in sync with core changes and avoid bitrot?
- How to insulate providers from changes to core?
- Can we simplify the interface?
- A complicating factor is config - can some be shared?
- Need to design for reuse - factor out common logic.

### Notes

- Keep `EnvironProvider`.
- Split the `Environ` interface into smaller chunks,
- e.g. `InstanceManagement`, `Firewall`.
- Smaller structs with common logic, e.g. port management, that use provider-specific call-outs.
- Extract out Juju-specific logic which is “duplicated” across providers and refactor it into a shared struct.
- The above will allow the necessary provider-specific call-outs to be identified.

### Work Items

1. Methods on providers operate on instance ids.
1. Introduce bulk API calls.
1. Move instance addresses into environs/network.
1. Split the `Environ` interface into smaller chunks; introduce `InstanceManager`, `Firewaller` (see the sketch after this list).
1. Smaller structs with common logic, e.g. port management, that use provider-specific call-outs.
1. Extract out Juju-specific logic which is “duplicated” across providers and refactor it into a shared struct.
1. Stop using many security groups - use the default group with iptables.
1. Use a `LoadBalancer`? interface (needed by Azure); will provide open/close ports; most providers will not need this and/or will return no-ops.
1. Make the `Firewaller` worker the sole process responsible for opening/closing ports on individual nodes.
1. Refactor providers’ use of `MachineConfig` as the means to pass in params for cloud-init; consider ssh’ing into a pristine image to do the work, as per the manual provider?

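A sketch of what the interface split could look like (interface names from the notes; the methods and stand-in types are assumptions for illustration):

```go
package main

import "fmt"

// Stand-in types so the split reads on its own.
type Instance struct{ ID string }
type PortRange struct{ From, To int }

// InstanceManager covers instance lifecycle; note the bulk calls from the
// work items (methods operate on instance ids, many at a time).
type InstanceManager interface {
	StartInstance(series string) (Instance, error)
	StopInstances(ids ...string) error
	Instances(ids ...string) ([]Instance, error)
}

// Firewaller covers port management; providers without native firewalling
// could share a common iptables-based implementation.
type Firewaller interface {
	OpenPorts(machineID string, ports []PortRange) error
	ClosePorts(machineID string, ports []PortRange) error
}

func main() {
	// A provider implements the pieces it supports; shared structs supply
	// the duplicated glue logic the notes want factored out.
	fmt.Println("Environ = InstanceManager + Firewaller + ...")
}
```
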
## Availability Zones

- Users want to be able to place units in availability zones explicitly (provider-specific placement directives). The core framework is nearing completion; providers need to implement provider-specific placement directives on top.
- Users want highly-available services (Juju infrastructure and charms). On some clouds (Azure), spreading across zones is critical; on others it is just highly desirable.
- Optional: one nice feature of the Azure Availability Set implementation is automatic IP load balancing (no need for HA Proxy, which itself becomes a SPoF). Should we support this in other providers (AWS ELB, OpenStack LBaaS, ...)?

### Agenda

- Prioritise implementation across providers (e.g. OpenStack > MaaS > EC2?).
- Discuss the overall HA story, IP load balancing.

Azure supports implicit load balancing, but we don’t care about other clouds for now.

### Work Items

1. Determine which providers support zones: EC2, OpenStack, Azure?
1. Implement distribution groups in all providers; either they do it or return an error.
1. New policy in state which handles units on existing machines.
1. New method on state which accepts distribution groups and a list of candidate instance ids, and returns a list of equal best candidates (sketched below).
1. Add an API call to AMZ to find availability zones.

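A possible shape for the state method in work item 4 (the selection rule here - prefer the least-populated zone - is an assumption; only the signature idea comes from the list above):

```go
package main

import "fmt"

type InstanceID string

// zoneOf is a stand-in for the provider call that reports which availability
// zone an instance runs in.
var zoneOf = map[InstanceID]string{
	"i-0": "az1", "i-1": "az2", "i-2": "az1", "i-3": "az3",
}

// distributeInstances sketches the proposed method: count how many group
// members already sit in each zone, then return the candidates whose zones
// are least populated - the "equal best" candidates.
func distributeInstances(group, candidates []InstanceID) []InstanceID {
	counts := make(map[string]int)
	for _, id := range group {
		counts[zoneOf[id]]++
	}
	var best []InstanceID
	bestCount := -1
	for _, c := range candidates {
		n := counts[zoneOf[c]]
		switch {
		case bestCount == -1 || n < bestCount:
			best, bestCount = []InstanceID{c}, n
		case n == bestCount:
			best = append(best, c)
		}
	}
	return best
}

func main() {
	group := []InstanceID{"i-0", "i-2"}      // existing units, both in az1
	candidates := []InstanceID{"i-1", "i-3"} // az2 and az3
	fmt.Println(distributeInstances(group, candidates)) // both are equal best
}
```
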
## Networks

- Juju needs to be aware of existing cloud-specific networks, so it can make them available to the user (e.g. to specify placement and connectivity requirements for services and machines, provide network capabilities for charms/relations, fine-tune relation connectivity, etc.).
- Juju needs to treat containers and machines in a uniform way with regard to networks and connectivity (e.g. providing and updating addresses for machines and containers, including when nesting).
- Knowing the network topology and infrastructure in the cloud, juju can have a better model of how services/machines interact, and can provide user-facing tools to manage that model (CLI/API, constraints/placement directives, charm metadata) at a high level, so that the user doesn’t need to know or care how lower-level networking is configured.

### Agenda

- Discuss and outline the high-level architecture integrating the existing MaaS VLAN MVP work and instance addresses, so that we have a unified networking/addressability model.
- Prioritize implementation across providers.
- Discuss and define required features and deadlines?

### Meeting Notes

- We need networks per service -> then configure them on machines.
- Default networks get created (public/private)?
- Networks per relation -> routing between, e.g., netdb (mysql) / netapp (wp).
- Network relations to define routing? add-net-relation netdb netapp; then add-relation mysql wordpress [--using=netrel1] (if more than one).
- Container addressability.

## Networking - Connections vs Relations

Discussion of the specifics of network routing.

- Relations do not always imply connections (although usually they do; proxy charms are an exception).
- Juju wants to model the physical connections to open ports/iptables/security groups/firewalls appropriately to allow the relation’s actual traffic.
- We need to be able to specify the endpoints for communication within charm hooks if it’s not the default model. Possible hook commands for that:
    - `enable-traffic endpoint_ip_address port_range`
    - For example: `enable-traffic 194.123.45.6 1770-2000`
    - `disable-traffic ep port_range`
- Also talk to the OpenStack charmers about non-relation TCP traffic.
- Should Juju model routing rules and tables for networks? (Directly via API/CLI, or implicitly as part of other commands, like add-relation between services on different networks.)

## Deployer into Juju Core

- To embed the GUI we need a solid path for making bundles work.
- You can’t `juju deploy` a bundle.
- Moving towards stacks, Core should support bundles like charms, provide APIs to the files inside, etc.
- Can the GUI use the ECS to replace the functionality of the deployer for GUI needs?

The goal of the meeting is to verify that this is a logical path forward and to create a plan to migrate to it. Stakeholders should agree on the needs in Core and make sure that it works with, rather than against, future plans to expand the idea of bundles into fat bundles and stacks.

## Bundles to Stacks

What’s needed to turn bundles into stacks?

Bundles have no identity at run time; we want this for stacks: a namespace to identify the group of services that are under a bundle.

Drag a bundle to the GUI and you get a bunch of services; with stacks, drag and drop a stack and you get one identifiable stack icon that is itself a composable entity and logical unit.

- Namespaces
    - The collection of deployed entities belongs to a stack.
    - Bundles today ‘disappear’ once deployed (the services are available, but there is no visible difference from just doing the steps manually).
- Exposed endpoints
    - Interface “http” on the stack is actually “http” on the internal Wordpress.
- Hierarchy (nesting)
- Default “status” output shows the collapsed stack; explicitly describing the stack shows the internal details.

### GUI concerns/thoughts

- An expanded stack takes over the canvas; other items are not shown.
- Drag on an “empty stack” which you can explode to edit, adding new services inside.

### Notes

- The GUI can’t support bundles with local charms.
- Bundles should become a core entity supported by juju-core.
- Deployer into juju-core should come after the work for supporting uncommitted changes.
- (dry run option?)

### Stacks 2.0

Further items about what a stack becomes:

- Incorporating Actions.
- Describing the behavior of Add-Unit for a stack.

### Work Items

Spend time to make a concrete spec for the next steps. For “namespacing”, an initial implementation could just tag each deployed item with a name/UUID.

## Charm Store 2.0

- Access Control
- Replacing Charm World
- Ingesting Charms (for example w/ GitHub)
- Ingesting Bundles
- Search

Kapil’s aim: simplify the current model of charm handling. Break the three-way link between launchpad, charmworld (deals with bundles, used via API by the GUI), and the charmstore (deals in charms, used by the juju-core state server). Question: is breaking the link between launchpad and charmworld the first step?

Lots of discussion over first steps: migrate the charmworld API into the store? Does the state server also need to implement it? Currently the API is small but specific: search, pull a specific file (maybe with some magic for icons) out of charms, some other things.

**First step**: Add a feed from the store that advertises charms. Change charmworld ingest to read from the store feed rather than launchpad directly.

**Second step**: Bundles are only in charmworld currently. Pulled from launchpad, they are a branch with a bundles.yaml and a readme, similar to a charm. The store needs to ingest bundles as well, and also publish them as a separate bundle feed. Change charmworld ingest to read the store bundle feed.

**Third step**: Add a v4 API that supersedes the current charmworld v3 API, implemented in the store. Clean up direct file access and other odd things at the same time. Remember that charm-tools are currently a consumer of the v3 API.

We may want to split the charm store out of the juju-core codebase, along with packages such as charm in core, into separate libraries.

After charmworld no longer talks to Launchpad it will be easier to provide ingestion from other sources, e.g. GitHub. Publishing directly to the store will be possible also.

Work item - bac - document the existing charmworld API 3 (see [Charmworld API 3 Docs](http://charmworld.readthedocs.org/en/latest/api.html)).

We’ll need to be able to serve individual files out of charms:

- `metadata.yaml`
- `icon.svg`
- `README`

Search capability could be provided by Mongo 2.6 fulltext search?

### Questions

- How does ingestion of charm store charms work for personal namespaces?
    - `juju deploy cs:gh`
- Charm store 2.0 should be able to ingest not only from GitHub but from a specific branch in a GitHub repo (e.g. https://GitHub.com/charms/haproxy/tree/precise && https://GitHub.com/charms/haproxy/tree/trusty, or a better example, https://GitHub.com/charms/haproxy/tree/centos7). This is needed when there need to be two different versions of a charm.
    - As a best practice, charms should endeavour to have one charm per OS. When the divergence for a given charm is great enough (e.g. Ubuntu to CentOS) we should look at creating a new branch in git.

## ACLs for Charms and Blobs

### Work Items

1. The namespace that holds revisions for a charm needs to store ACLs.
1. The charm store needs to check them against API requests.
1. The API to get a resource needs to have a reference to the top-level charm (TBD), so we can check the read permission.

Need to decide how we want to deal with access to metadata and content.
Should we always allow full access to all blobs and content if you can deploy?

### Option #1

- r = metadata
- w = publish
- x = deploy

#### Public charm (0755)

| Role | Group | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X |   | X |
| everybody  | - | X |   | X |

#### Charm under test (0750)

| Role | Group | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X |   | X |
| everybody  | - |   |   |   |

#### Gated charm (0754)

You can see it, but you have to get approval (be added to installers).

| Role | Group | Metadata | Publish | Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers | X |   | X |
| everybody  | - | X |   |   |

### Option #2

- r = read content of charm
- w = publish
- x = deploy and read metadata

#### Public charm (0755)

| Role | Group | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | charmers | X | X | X |
| installers | charmers | X |   | X |
| everybody  | - | X |   | X |

#### Charm under test (0750)

| Role | Group | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | cmars | X | X | X |
| installers | qa | X |   | X |
| everybody  | - |   |   |   |

#### Gated charm (0710)

You can see it, but you have to get approval (be added to installers).

| Role | Group | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers |   |   | X |
| everybody  | - |   |   |   |

#### Commercial charm with installer-inaccessible content (0711)

| Role | Group | Content | Publish | Metadata & Deploy |
|------------|----------|----------|---------|--------|
| maintainer | ibm | X | X | X |
| installers | ibm-customers |   |   | X |
| everybody  | - |   |   | X |

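To make the octal shorthand in the headings above concrete, here is a tiny sketch (illustrative only) unpacking a mode like 0754 into per-class permission checks, using option #1's r/w/x meanings:

```go
package main

import "fmt"

// Permission bits for the unix-style analogy in option #1:
// r = see metadata, w = publish, x = deploy.
const (
	Read  = 4
	Write = 2
	Exec  = 1
)

// can reports whether a class holds a permission under a three-digit octal
// mode such as 0754. Classes: 0 = maintainer, 1 = installers, 2 = everybody.
func can(mode, class, perm uint) bool {
	shift := (2 - class) * 3
	return (mode>>shift)&perm != 0
}

func main() {
	const gated = 0754 // the "gated charm" example above
	fmt.Println(can(gated, 2, Exec)) // everybody deploy? false
	fmt.Println(can(gated, 2, Read)) // everybody sees metadata? true
	fmt.Println(can(gated, 1, Exec)) // installers deploy? true
}
```
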
## Upgrades

Prior to 1.18, Juju did not really support upgrades. Each agent process listened to the agent-version global config value and restarted itself with a later version of its binary if required.

1.18 introduced the concept of upgrade steps, which allowed for ordered execution of business logic to perform changes associated with upgrading from X to Y to Z. 1.18 also made the machine agents on each node solely responsible for initiating an upgrade on that node, rather than all agents (machine, unit) acting independently. However, several pieces are still missing…

### Agenda items

- Coordination of node upgrades - lockstep upgrades.
- Schema updates to the database.
- HA - what needs to be done to support upgrades in an HA environment?
- Read-only mode to prevent model or other changes during upgrades.
- How to validate an upgrade prior to committing to it, e.g. bring up a shadow Juju environment on the upgraded model and validate first, before either committing or switching back?
- Perhaps a `--dry-run` to show what would be done?
- Authentication/authorization - restrict upgrades to privileged users?
- How to deal with failed upgrades/rollbacks? Do we need application-level transactions?
- Testing of upgrades using a dev release - faking the reported version to allow upgrade steps to be run, etc.

### Work items for schema upgrade

Key assumption: database upgrades complete quickly.

1. Implement schema upgrade code (probably as an upgrade step; see the sketch after this list).
    - mgo supports loading documents into maps, so we do not have to maintain legacy structs.
    - Record the “schema” version.
1. Implement state/mongo locking, with an explicit upgrading/locked error.
    - One form of locking is to just not allow external API connections until the upgrade steps have completed, since we know we just restarted and dropped all connections.
1. Introduce retry attempts in the API server around state calls.
1. Take a copy of the db prior to the schema upgrade, and copy it back if the upgrade fails.
1. Upgrade steps for the master state server only.
1. Coordination between master/slave state servers to allow the master to finish first.

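A minimal sketch of ordered, versioned upgrade steps in the spirit of the 1.18 mechanism described above (names and the version bookkeeping are assumptions for illustration):

```go
package main

import "fmt"

// step is one ordered piece of upgrade business logic.
type step struct {
	target string       // version whose upgrade this step performs
	run    func() error // the schema/config change itself
}

// upgradeTo runs every step strictly after `from`, in order, returning the
// last version reached; real code would persist this "schema" version.
func upgradeTo(from string, steps []step) (string, error) {
	version := from
	passed := from == "" // empty means: run everything
	for _, s := range steps {
		if !passed {
			passed = s.target == from // skip steps already applied
			continue
		}
		if err := s.run(); err != nil {
			return version, err
		}
		version = s.target
	}
	return version, nil
}

func main() {
	steps := []step{
		{"1.18.0", func() error { fmt.Println("move agent conf"); return nil }},
		{"1.20.0", func() error { fmt.Println("add env-uuid to docs"); return nil }},
	}
	v, err := upgradeTo("1.18.0", steps) // only the 1.20.0 step runs
	fmt.Println(v, err)
}
```
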
### Work items for upgrade story

- Allow users to find out what version will be picked when upgrading.
- Commands should report that an upgrade is in progress if run during an upgrade.
- The peergrouper worker should only start after an upgrade has completed.
- Update machine status during an upgrade; set error status on failure.

## Juju Integration with Oasis TOSCA standards (IBM)

[TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) is a standard aimed at “enhancing the portability and management of cloud applications and services across their lifecycle.” In discussions with IBM, we need to integrate Juju into the TOSCA standards as part of our agreement. Thus we need to define the following:

- [TOSCA](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) - simple profile yaml doc, updated approx. weekly.
- Discuss who will lead this effort and engage with IBM.
- Define the correct integration points.
- Define the design and architecture of the TOSCA integration.
- Define what squad will deliver the work, and timelines.

### Goal

- Drag a TOSCA spec onto the juju-gui and have the deployment happen.

   894  ## Other OS Workloads
   895  
Juju has been Ubuntu-only so far, but was never intended to be only Ubuntu; we were waiting for user demand, and some of that demand has now materialized. From earlier discussions, the following areas have been identified for work:
   897  
1. Remove assumptions about the presence of apt from core
1. Deal with upstart vs. SysV vs. Windows services init system differences for agents
1. Deal with rsyslog configuration
1. Define initial charms (the bare minimum would be Ubuntu charm equivalents)
1. Update cloud-init handling for alternate OSes
1. SSH configuration
1. Define and handle non-Ubuntu images
   905  
   906  Key questions are:
   907  
1. Which OS is going to be first?
	- We expect Windows workloads, as that has already been implemented and we just need to integrate it.
   910  1. How important is this compared to the other priorities?
   911  
   912  I don’t think there are any questions around “should we do it”, just “when should we do it”.
   913  
   914  ### CentOS / SLES
   915  
Hopefully we can handle both CentOS and SLES in one go, as they are based on very similar systems. We may need to abstract out some parts, but on the whole they *should* be very similar. Again, there should be a lot of overlap between Ubuntu and both CentOS and SLES, with obvious differences in agent startup management and software installation. The writing of the actual charms is outside the scope of this work, although we should probably make CentOS and SLES charms that mirror the Ubuntu charm and just bring up an appropriate machine.
   917  
   918  ### Windows
   919  
We have work that has already been done by a third party to get Juju deploying Windows workloads. It is expected that this work will neither cleanly merge with current trunk nor necessarily meet our normal standards for tests, robustness, or code quality; we won’t really know until we see the code. However, it does give us something that works, clearly identifies all of the Ubuntu-specific parts of the codebase, and will be a good foundation to work from to get the platform-agnostic workload support we desire.
   921  
   922  ### Notes
   923  
- Need to get code drop from MS guys.
- Use the above to identify the Ubuntu-specific parts of the code.
- We do interface design, CentOS implementation.
- We hand the above back to the MS guys and they use it as a template to re-do the Windows version.
- Excludes state server running on Windows.
- Manual provisioning of Windows instances.
- Local provider (VirtualBox) on Windows.
   931  
   932  ## 3rd Party Provider Implementations
   933  
   934  - Improving our documentation around what it takes to implement a Provider.
   935  - We still call them Environ internally.
   936  
   937  ## Container Addressability (Network Worker)
   938  
   939  - [Earlier notes on Networking](https://docs.google.com/a/canonical.com/document/d/1Gu422BMAJDohIXqm6Vq4WTrtBV8hoFTTdXvXDQCs0Gs/edit)
   940  - Link to [Juju Networking Part 1](https://docs.google.com/a/canonical.com/document/d/1UzJosV7M3hjRaro3ot7iPXFF9jGe2Rym4lJkeO90-Uo/edit#heading=h.a92u8jdqcrto) early notes
- What are the concrete steps towards getting containers addressable on clouds?
- Common
   943  	- Allocate an IP address for the container (provider specific).
   944  	- Change the NI that is being used to be bridged.
   945  	- Bring up the container on that bridged network and assign the local address.
   946  - EC2
   947  	- **ACTION(spike)**: How do we get IP addresses allocated in VPC?
   948  	- Anything left to be done in goamz?
   949  - OpenStack
   950  	- Neutron support in lp:goose.
   951  		- Add neutron package.
   952  		- Sane fallback when endpoints are not available in keystone (detect if Neutron endpoints are supported or not and if not report the error).
   953  		- New mock implementation (testservers).
   954  		- Specify ports/subnets at StartInstance time (possibly a spike as well).
   955  		- Add/remove subnets.
   956  		- Add/remove/associate ports (Neutron concept, similar to a NIC).
   957  		- Add/remove/relate bridges? Probably not needed for now.
   958  		- Maybe security groups via Neutron rather than Nova.
   959  	- Potential custom setup once port is attached on machine
   960  
   961  We need a Networker worker at the machine level to manage networks. What about public addresses? We want `juju expose` to grow some ability to manage public addresses. Need to be aware that there’s a limit of 5 elastic IPs per region per account. Can instead get a public address assigned on machine startup that cannot be freely reassociated. Need to make a choice about default VPC vs creating a VPC. Using only default VPC is simpler.
   962  
   963  ### Potentially out of scope for now
   964  
- Using non-default VPC - requires several additional setup steps for routes and the like.
   966  - Networking on providers other than EC2/OpenStack, beyond making sure we don’t bork on interesting setups like Azure.
   967  - Networking on cloud deployments that do not support Neutron (e.g. HP).
   968  
   969  Separate discussion: Update ports model to include ranges and similar.
   970  
Switching to the new networking model also enables much more restrictive firewalling, but does require some charm changes. If charms start declaring ports exposed on private networks, it would be possible to skip address-per-machine for non-clashing ports. It also allows more restrictive internal network rules.
   972  
   973  ### Rough Work Items
   974  
   975  1. When adding a container to an existing machine, Environment Provisioner requests a new IP address for the machine, and records that address as belonging to the container.
   976  1. `InstancePoller` needs to be updated, so that when it lists the addresses available for a machine, it is able to preserve the allocation of some addresses to the hosted containers.
   977  1. `Networker` worker needs to be able to set up bridging on the primary instance network interface, and do the necessary ebtables/iptables rules to use the same bridge for LXC containers (e.g. any container can use one of the host instance’s allocated secondary IP addresses so it appears like another instance on the same subnet).
   978  1. Existing MaaS cloudinit setup for VLANs will be moved inside the networker worker.
   979  1. Networker watches machine network interfaces and brings them up/down as needed (e.g. doing dynamically what MaaS VLAN cloudinit scripts do now and more).
   980  
   981  ## Leader Elections
   982  
   983  Some charms need to elect a “master” unit that coordinates activity on the service.   Also, Actions will at times need to be run only on the master unit of a service.  
   984  
   985  - How do we choose a leader?
   986  - How do we read/write who the leader is?
   987  - How do we recover if a leader fails?
   988  - The current leader can relinquish leadership (e.g. this is a round robin use case).
   989  
A lease on leader status allows caching and prevents an isolated leader from performing bad actions. If the leader is running an action and can’t renew its lease, it must kill the action; the same goes for hooks that require the leader. The agent controls leader status and does the killing.
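
A rough sketch of that lease behaviour, with hypothetical `Lease` and `Killable` types standing in for whatever the agent actually uses:

```
// Sketch of lease maintenance: the agent renews the leadership
// lease, and kills the running action or leader-only hook when a
// renewal fails. All types here are hypothetical.
package leadership

import "time"

type Lease interface {
	Renew(d time.Duration) error
}

type Killable interface {
	Kill()
}

// maintain renews the lease at half its duration; if renewal fails,
// the agent can no longer assume leadership and must stop
// leader-only work before the old lease expires.
func maintain(lease Lease, duration time.Duration, running Killable, stop <-chan struct{}) {
	ticker := time.NewTicker(duration / 2)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := lease.Renew(duration); err != nil {
				running.Kill() // isolated: kill the action/hook
				return
			}
		case <-stop:
			return
		}
	}
}
```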
   991  
   992  ## Improving charm developer experience
   993  
   994  Charms are the most important part of Juju.  Without charms people want to use, Juju is useless. We need to make it as easy as possible for developers outside Canonical to write charms.
   995  
   996  Areas for improvement:
   997  
- Make charm writing easier.
   999  - Make testing easier.
  1000  - Make charm submission painless.
  1001  - Make charm maintenance easier.
  1002  - What are the current biggest pain points?
  1003  
  1004  ## Juju needs a distributed log file
  1005  
We are currently working on replicating rsyslog to all state servers when in HA. Per Mark Ramm, this is good enough for now. We may want to discuss a real distributed logging framework to help with observability, maintenance, etc.
  1007  
  1008  ### Notes
  1009  
- Kapil suggests Logstash or Heka; Heka is bigger and more complicated, so Logstash is more likely to be suitable.
  1011  - Wayne has used Apache Scribe in the past.
  1012  - Requirements:
  1013  	- Replicated (consistently) across all state servers.
  1014  	- Newly added state servers must have old log messages available.
  1015  	- Must be tolerant of state server failures.
  1016  	- Store and forward.
  1017  	- Nice to have: efficient querying.
  1018  	- Nice to have: surrounding tooling for visualization, post-hoc analysis, …
  1019  	- Encrypted log traffic.
  1020  
  1021  ### Actions
  1022  
Juju actions are charm-defined, user-initiated operations that take parameters and are executed on units - for example, backing up MySQL.
  1024  
  1025  ### Open Questions
  1026  
  1027  - How do we handle history and results?
  1028  - How do we handle actions that require leaders on services with no leaders?
  1029  - Is there anything else controversial in the spec?
  1030  - Do we have a piece of configuration on the action defining what states it's valid to run it in?
- Users should be made aware of the lifecycle of an action: for example, which unit is currently backing up, the progress of the backup, and whether it succeeded.
  1032  
  1033  Actions have
  1034  
  1035  1. State
  1036  1. Lifecycle
  1037  1. Reporting
  1038  
Actions accept parameters.
There is an actions directory at the top level of the charm; its contents are a bunch of named executables.
`actions.yaml` has a key for each action, defining e.g. whether it is a service or unit action, and a schema for its parameters (JSON Schema expressed in YAML). One possible shape is sketched below.
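
Illustrative only, since the schema was still being settled: one possible shape for `actions.yaml`, parsed here with `gopkg.in/yaml.v2`:

```
// One hypothetical shape for actions.yaml: a top-level "actions"
// key, with description and parameters sub-keys per action.
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

const actionsYAML = `
actions:
  backup:
    description: Back up the database.
    parameters:
      outfile:
        type: string
        description: Where to write the backup.
`

type actionSpec struct {
	Description string                 `yaml:"description"`
	Parameters  map[string]interface{} `yaml:"parameters"`
}

func main() {
	var doc struct {
		Actions map[string]actionSpec `yaml:"actions"`
	}
	if err := yaml.Unmarshal([]byte(actionsYAML), &doc); err != nil {
		panic(err)
	}
	for name, spec := range doc.Actions {
		fmt.Printf("%s: %s\n", name, spec.Description)
	}
}
```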
  1043  
  1044  There are both unit-level and service-level actions.  Unit-level will be done first.
  1045  
There are collections of requests and results.
Each unit watches the actions collection for actions targeted at itself, and is not notified of things it doesn’t care about.
When you create an action, you get a token, and you watch for that token in the results table.
A non-zero exit code means failure; an error return from an action doesn’t put the unit into an error state.
  1051  
  1052  Actions need to work in more places than hooks.  We don't want to run them before start or after stop.  We want to run them while in an error state.
  1053  
  1054  ```
  1055  $ juju do action-name [unit-or-service-name] --config path/to/yaml.yml
  1056  ```
  1057  
Specifying a service name for a unit action runs it against all units by default.
  1059  
Results are YAML.

stdout -> log

Hook and action queues are distinct.
  1065  
  1066  ### Work Items
  1067  
  1068  1. Charm changes:
  1069  	- Actions directory (like hooks, named executables).
  1070  	- Top-level actions.yaml (top-level key is actions, sub-keys include parameters, description).
  1071  1. State / API server:
  1072  	- Add action request collection.
  1073  	- Add action result collection.
  1074  	- APIs for putting to action/result collections.
	- APIs for watching which requests are relevant for a given unit.
  1076  	- APIs for watching results coming in (probably filtered by what unit/units we're interested in).
  1077  	- APIs for listing and getting individual results by token.
  1078  	- APIs for getting the next queued action.
  1079  1. Unit agent work:
  1080  	- Unit agent's "filter" must be extended to watch for relevant actions and deliver them to the uniter.
  1081  	- Various modes of the uniter need to watch that channel and invoke the actions.
  1082  	- Handwavy work around the hook context to make it capable of running actions and persisting results.
  1083  	- Hook tools:
  1084  		- Extract parameters from request.
  1085  		- Dump results back to database.
  1086  		- Error reporting.
  1087  		- Determine unit state?
  1088  1. CLI work:
  1089  	- CLI needs a way to watch for results.
	- `juju do` sync mode
	- `juju do` async mode
	- `juju run` becomes trivially implementable as an action
  1093  1. API for listing action history.
  1094  1. Leader should be able to run actions on its peers (use case: rolling upgrades).
  1095  1. Later: Fix up the schema for charm config to match actions.
  1096  
  1097  ## Actions, Triggers and Status
  1098  
  1099  What are triggers? (related to Actions, IIRC)
  1100  
  1101  ### Potential applications
  1102  
  1103  - Less polling for UI, deployer, etc.
  1104  
  1105  ### Topics to discuss
  1106  
  1107  - Authentication
  1108  - Filtering & other features
  1109  - API
  1110  - Implementation
  1111  
  1112  ## Combine Unit agent and Machine agent into a single process
  1113  
  1114  - What is the expected benefit?
	- Fewer moving parts; machine and unit agents upgrade at the same time.
	- Avoids N unit agents for N charms + subordinates (when hulk-smashing, for example).
	- Smaller deployment footprint (one less jujud binary).
	- Fewer workers to run, fewer API connections.
  1119  - What is the expected cost?
  1120  	- rsyslog tagging (logs from the UA arrive with the agent’s tag; we need to keep that for observability).
  1121  	- Concrete steps to make the changes.
  1122  
  1123  Issues with image based deployments?
  1124  
  1125  - No issues expected.
- Even if we need a Juju component inside the container, no issue.
  1127  
  1128  ### Work Items
  1129  
  1130  1. Move relevant unit agent jobs into machine agent (drop duplicates).
  1131  1. Remove redundant upgrade code.
  1132  1. Change deployer to start new uniter worker inside single agent.
  1133  1. Change logging (loggo/rsyslog worker) to allow tags to be specified when logging so that each unit still logs with its own tag.
  1134  1. (Eventually) consolidate previously separate unit/machine agent directories into single dir.
1. Ensure `juju-run` works as before.
  1136  
  1137  ## Backup/Restore
  1138  
  1139  - Making current state work:
  1140  	- We need to have the mongo client for restore.
  1141  	- We need to ignore replicaset.
  1142  - What will it take to implement a “proper” backup, instead of just having some scripts that mostly seemed to work one time.
	- Backup is an API call.
  1144  	- Restore should grow in `jujud`.
  1145  		- Add a restore to the level of bootstrap?
  1146  - Turning our existing juju-backup plugin from being a plugin into being integrated core functionality.
  1147  	- Can we snapshot the database without stopping it?
  1148  	- How will this interact with HA? We should be able to ask a secondary to save the data.
  1149  	- It is possible to mongodump a running process, did we consider that rather than shutting mongo down each time?
  1150  	- Since we now always use --replicaSet even when we have only 1, what if we just always created a “for-backup” replica that exists on machine-0. Potentially brought up on demand, brought up to date, and then used for backup sync.
  1151  - juju-restore
  1152  	- What are the assumptions we can reliably make about the system under restore?
		- E.g., in theory we can assume all members of the replica set are dead; otherwise you wouldn’t be using restore, you would just be calling ensure-availability again.
  1154  	- Can we spec out what could be done if the backup is “old” relative to the current environment? Likely most of this is “restore 3.0” but we could at least consider how to get agents to register their information with a new master.
  1155  
  1156  ### Concrete Work Items
  1157  
1. Backup as a new Facade for client operations (see the sketch after this list).
  1159  1. `Backup.Backup` as an API call which does the backup and stages the backup content on server disk. API returns a URL that can be used to fetch the actual content.
  1160  1. `Backup.ListBackups` to get the list of tarballs on disk.
  1161  1. `Backup.DeleteBackups` to clean out a list of tarballs.
  1162  1. HTTP Mux for fetching backup content.
  1163  1. Juju CLI for
  1164  	- `juju backup` (request a backup, fetch the backup locally)
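
The facade surface below is one possible reading of the items above; the result types are hypothetical:

```
// Hypothetical shape for the Backup facade: the backup is staged on
// server disk, and the API returns a URL served by the HTTP mux.
package backup

type BackupResult struct {
	URL string // where the staged tarball can be fetched from
}

type ListResult struct {
	Backups []string // names of tarballs currently staged on disk
}

type Facade interface {
	// Backup performs the backup, stages the tarball on server
	// disk, and returns a URL for fetching the actual content.
	Backup() (BackupResult, error)
	// ListBackups reports the tarballs staged on disk.
	ListBackups() (ListResult, error)
	// DeleteBackups cleans out the named tarballs.
	DeleteBackups(names []string) error
}
```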
  1165  
  1166  ## Consumer relation hooks run before provider relation hooks
  1167  
  1168  [Bug 1300187](https://bugs.launchpad.net/juju-core/+bug/1300187)
  1169  
- IIRC, William had a patch which made the code prefer to run the provider side of hooks first, but did not actually enforce it strictly. Does that help, or are charms still going to need to do all the same work?
- Does it at least raise the frequency with which charms “Just Work”, or does it make it harder to diagnose when they “Just Fail”?
  1172  
  1173  ## Using Cloud Metadata to describe Instance Types
  1174  
  1175  We currently hard-code EC2 instance types in big maps inside of juju-core. When EC2 changes prices, or introduces a new type, we have to recompile juju-core to support it. Instead, we should be able to read the information from some other source (such as published on streams.canonical.com since AMZ doesn’t seem to publish easily consumable data).
  1176  
- The OpenStack provider already reads this data out of keystone; are we sure AMZ doesn’t provide it somewhere?
  1178  - Define a URL that we could read, and a process for keeping it updated.
  1179  
  1180  ### Work Items
  1181  
  1182  1. Investigate the instance type information each cloud type has available - both programmatically and elsewhere.
1. Define an abstraction for retrieving this information (see the sketch after this list). Some clouds will offer this information directly, others will need to get it from simplestreams. Some cloud types may involve getting the information from mixed sources.
  1184  1. Support search path for locating instance information and mixed sources.
  1185  1. Ensure process for updating Canonical hosted information is in place.
  1186  1. Document how to update instance type information for all cloud types.
  1187  1. API for listing instance types (for GUI).
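
A sketch of one possible abstraction, with hypothetical names: a single interface, multiple sources (cloud API, simplestreams), searched in order:

```
// Hypothetical instance-type lookup with a search path of sources.
package instancetypes

type InstanceType struct {
	Name     string
	CPUCores uint64
	Mem      uint64 // MB
	Cost     uint64 // provider-specific units
}

// Source supplies instance type metadata for a region.
type Source interface {
	InstanceTypes(region string) ([]InstanceType, error)
}

// Lookup walks the search path and returns the first successful
// answer, so cloud-native data can shadow published streams.
func Lookup(region string, path ...Source) ([]InstanceType, error) {
	var lastErr error
	for _, src := range path {
		types, err := src.InstanceTypes(region)
		if err == nil {
			return types, nil
		}
		lastErr = err
	}
	return nil, lastErr
}
```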
  1188  
  1189  ## API Versioning
  1190  
  1191  We’ve wanted to add this for a long time.
  1192  
  1193  - Possible [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit#heading=h.avfqvqaaprn0) for refactoring API into many Facades
  1194  - [14.04 Spec](https://docs.google.com/a/canonical.com/document/d/12SFO23hkx4sTD8he61Y47_kBJ3H5bF2KOwrFFU_Os9M/edit)
  1195  - Can we do it and remain 2.x compatible for the lifetime of Trusty?
  1196  - Concrete design around what it will look like.
  1197  	- From an APIServer perspective (how do we expose multiple versions).
  1198  	- From an API Client perspective.
  1199  	- From the Juju code itself (how does it notice it wants version X but can only get Y so it needs to go into compatibility mode, is this fine grained on a single API call, or is this coarse grained around the whole API, or middle ground of a Facade).
  1200  
  1201  ### Discussion
  1202  
  1203  - We can use the string we pass in now ("") to each Facade, and start passing in a version number.
  1204  - Login can return the list of known Facades and what version ranges are supported for each Facade.
  1205  - Login could also start returning the environment UUID that you are currently connected to.
- With that information, each client-side Facade tracks the best version it can use, which it then passes into all `Call()` methods (sketched below).
  1207  - Compatibility code uses `Facade.CurrentVersion()` to do an if/then/switch based on active version and do whatever compatibility code is necessary.
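
A sketch of that negotiation, with hypothetical types: Login reports per-facade version ranges, and each client-side facade picks the best version to pass to `Call()`:

```
// Hypothetical sketch of facade version negotiation on the client.
package api

type versionRange struct{ Min, Max int }

type Conn struct {
	// Per-facade version ranges, as reported by Login.
	facades map[string]versionRange
}

// bestVersion picks the highest server-supported version that the
// client also supports; -1 sends the caller into compatibility mode.
func (c *Conn) bestVersion(facade string, clientMax int) int {
	r, ok := c.facades[facade]
	if !ok || r.Min > clientMax {
		return -1
	}
	if r.Max < clientMax {
		return r.Max
	}
	return clientMax
}
```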
  1208  
  1209  ### Alternatives
  1210  
- Login doesn’t return the versions, but instead, when you do a `Call(Facade, VX)`, it can return an error that indicates what actual versions are available.
	- Avoids changing Login.
	- Adds a round trip whenever you are actually in compatibility mode.
	- Creates clumsy code like: `if Facade.Version < X { do compat } else { err := tryLatest; if err == IsTooOld { compat } }`
- Login sets a global version for all facades.
	- Seems a bit too coarse-grained, in that any change to any API requires a global version bump (version number churn).
- Each actual API is individually versioned.
	- Seems too fine-grained, and makes it difficult to figure out what version needs to be passed when (and then to decide when you need to go into compat mode).
  1219  
  1220  ## Tech-debt around creating new api clients from Facades
  1221  
  1222  [Bug 1300637](https://bugs.launchpad.net/juju-core/+bug/1300637)
  1223  
  1224  - Server side [spec](https://docs.google.com/a/canonical.com/document/d/1guHaRMcEjin5S2hfQYS22e22dgzoI3ka24lDJOTDTAk/edit).
- We talked about wanting to split up Client into multiple Facades. How do we get there, and what does the client-side code look like?
- We originally had just `NewAPIClientFromName`, and Client was a giant Facade with all functions available.
- We tried to break up the one-big-facade into a few smaller ones that would let us cluster functionality and make it clearer what things belonged together (`NewKeyManagerClient`).
- There was pushback on the proliferation of lots of New*Client functions. One option is that everything starts from `NewAPIClientFromName()`, which then gets a `NewKeyManager(apiclient)`, as sketched below.
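
Sketched with hypothetical names: everything hangs off the one connection constructor, and per-area clients wrap it rather than adding more top-level entry points:

```
// Hypothetical sketch: one connection entry point, thin per-area
// facade wrappers constructed from it.
package client

type APIClient struct {
	// shared connection state: codec, facade versions, etc.
}

func NewAPIClientFromName(envName string) (*APIClient, error) {
	// ... connect to the named environment and log in ...
	return &APIClient{}, nil
}

// KeyManager wraps the shared connection rather than adding another
// top-level New*Client function.
type KeyManager struct {
	api *APIClient
}

func NewKeyManager(api *APIClient) *KeyManager {
	return &KeyManager{api: api}
}
```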
  1229  
  1230  ## Cross Environment Relations
  1231  
  1232  We’ve talked a few times about the desirability of being able to reason about a service that is “over there”, managed in some other environment.
  1233  
  1234  - Last [spec](https://docs.google.com/a/canonical.com/document/d/1PpaYWvVwdF55-pvamGwGP23_vHrmFwCW8Bi-4VUg-u4/edit)
  1235  	- Describes the use cases, confirm that they are still valid.
  1236  - We should update to include the actual user-level commands that would be executed and what artifacts we would expect (e.g., `juju expose-service-relation` creates a `.jenv/.dat/.???` that can be used with `juju add-relation --from XXX.dat`).
  1237  
  1238  ### Notes
  1239  
Expose an endpoint in env1; this generates a jenv (authentication info for env1) that you can import-endpoint into another environment. Env2 then connects to env1 and asks for information about the service in env1. This creates a ghost service in env2 that exposes a single endpoint, which is only available for connecting relations (no config editing etc.). There is a continuous connection between the two environments to watch whether the service goes down, etc.
  1241  Propagate IP changes to other environment.  Note that it is currently broken for relations even in a single environment.
  1242  Cross environment relations always use public addresses (at least to start).
  1243  Note that the ghost service name may be the same as an existing service name, and we have to ensure that’s ok.
  1244  
  1245  ## Identity & Role-Based Access Controls
  1246  
  1247  - [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm)
  1248  - [Establishing User Identity](https://docs.google.com/a/canonical.com/document/d/150GEG_mDnWf6QTMc1kBvw_x_Y_whGVN19mr3Ocv6ELg/edit#heading=h.aza0s6fmxfs9)
  1249  
  1250  ### Current Status
  1251  
  1252  - Concept of service ownership in core.
  1253  - Add/remove user, add-environment framework done, not exposed in CLI.
  1254  
  1255  What does a minimum viable multi-user Juju look like? (Just in terms of ownership, not ACLs).
  1256  
  1257  - `add-user`
  1258  - `remove-user`
  1259  - `add-environment`
  1260  - `whoami`
  1261  
  1262  ### 14.07 (3mo)
  1263  
  1264  - Beginnings of role-based access controls on users (Implementation of RBAC in core is another topic).
  1265  - [Juju Identity, Roles & Permissions](https://docs.google.com/a/canonical.com/document/d/138qGujBr5MdxzdrBoNbvYekkZkKuA3DmHurRVgbTxYw/edit#heading=h.7dwo7p4tb3gm).
  1266  - Non-superusers: read-only access at a minimum.
  1267  
  1268  ### 14.10 (6mo)
  1269  
  1270  - Command-line & GUI identity provider integrations.
  1271  
  1272  ### 15.01 (9mo)
  1273  
  1274  - IaaS, mutually-trusted identities across enterprises.
  1275  - Need a way to securely broker B2B IaaS-like transactions.
  1276  
  1277  ## Iron Clad Test Suite
  1278  
  1279  The Juju unit test suite is beset by intermittent failures, caused by a number of issues:
  1280  
  1281  - Mongo and/or replica set related races.
  1282  - Access to external URLs e.g. charm store.
  1283  - Isolation issues such that one failure cascades to cause other tests to fail.
  1284  
  1285  There are also other systemic implementation issues which cause fragility, code duplication, and maintainability problems:
  1286  
  1287  - Lack of fixtures to set up tools and metadata (possibly charms?).
  1288  - Code duplication due to lack of fixtures.
  1289  - Issues with defining tools/version series such that tests and/or Juju itself can fail when run on Ubuntu with different series.
  1290  
Related, though not a reliability issue, is the speed at which the tests run; e.g. the Joyent tests take up to 10 minutes. We also have tests which were set up to run against live cloud deployments but which in practice are never run - we now rely on CI.
  1292  
  1293  Over the last cycle, things have improved, and there are certain issues external to Juju (like mongo) which contribute to the problems. But we are not there yet and must absolutely get to the stage where tests pass first time, every time on the bot and when run locally. We need to consider/discuss/agree on:
  1294  
  1295  - Identify current failure modes.
  1296  - Harden test suite to deal with external failures, fix juju-core issues.
  1297  - Introduce fixtures for things like tools and metadata setup and refactor duplicate code and set up.
  1298  - Document fixtures and other test best practices.
  1299  
### Work Items

  1310  - Rename `LoggingSuite` to something else, make the default base suite with mocked out `$HOME`, etc.
  1311  	- Identify independent fixtures (e.g. fake home, fake networking, …), and compose base suite from them.
	- Create a fake networking fixture that replaces the default HTTP client with something that rejects attempts to connect to non-localhost addresses (see the sketch after this list).
  1313  	- Update tools fixture and related tests.
  1314  - Introduce in-memory mock mgo for testing independent of real mongo server.
  1315  - Continue separation of api/apiserver in unit tests to enable better error checking.
  1316  - Document current testing practices to avoid cargo culting of old practices, ensure document is kept up-to-date at code review time.
- Update and speed up the Joyent tests (and all tests in general); the Joyent tests currently take ~10 minutes, which is far too long.
- Suppress detailed simplestreams logging by default in the (new) ToolsSuite by setting the streams package logging level to INFO in suite setup.
  1319  - Delete live tests from juju-core.
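
A sketch of the fake networking fixture idea: an `http.RoundTripper` that fails any request to a non-localhost address, so a test that forgets to mock an external URL (e.g. the charm store) fails fast:

```
// Sketch of a test fixture that forbids external network access.
package testing

import (
	"fmt"
	"net"
	"net/http"
)

type localOnlyTransport struct{}

func (localOnlyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	host, _, err := net.SplitHostPort(req.URL.Host)
	if err != nil {
		host = req.URL.Host // no explicit port
	}
	if host != "localhost" && host != "127.0.0.1" && host != "::1" {
		return nil, fmt.Errorf("test attempted external connection to %q", req.URL)
	}
	return http.DefaultTransport.RoundTrip(req)
}

// InstallFakeNetworking swaps the default HTTP client transport and
// returns a restore function for test teardown.
func InstallFakeNetworking() (restore func()) {
	orig := http.DefaultClient.Transport
	http.DefaultClient.Transport = localOnlyTransport{}
	return func() { http.DefaultClient.Transport = orig }
}
```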

## Core - Refactoring and Hardening

Juju does what it is supposed to do, but has a number of rough edges when it comes to various non-functional requirements, which contribute to the fact that Juju often doesn’t Just Work and many times requires an unacceptably high level of user expertise to get things right. These non-functional issues can broadly be classified as:

- **Robustness** - Juju needs to get better at dealing with underlying issues, whether transient network related, provider/cloud related, or user input.
- **Observability** - Juju needs to be less of a black box, and expose more of what’s going on under the covers, so that humans and machines alike can make informed decisions in response to errors and system status.
- **Usability** - Juju needs to provide a UI and workflow that make it difficult to make mistakes in the first place, and to catch and report errors early, as close to the source as possible.

As well as changes to the code itself, we should consider process changes to guide how new features are implemented and rolled out. There is currently a disconnect between developers and users (the real world). A developer will often test a new feature in isolation, on a single cloud, where it works first time, deployed on an environment with a few nodes at best. Developers won’t be exposed to the pain associated with, and needed for, diagnosing and rectifying faults, since it’s often easier to destroy-environment and start again, or a new revision will have landed and CI will start all over again. More often than not, it’s the QA team who has to diagnose CI failures, which are raised as bugs, with developers being spared the pain of the root cause analysis, and any fixes often address a specific bug rather than a systemic, underlying issue.
  1322  
  1323  ### Items to consider
  1324  
  1325  - Architectural layers - what class of error should each layer handle and how should errors be propagated / handled upwards.
  1326  - How to expose/wrap provider specific knowledge to core infrastructure so that such knowledge can be used to advantage?
- Where’s the line between Juju responding to issues it encounters vs. informing the user? We want immediate feedback on problems, but CI issues lack immediate visibility.
  1328  - Close the loop between real world deployment and developers.
  1329  - How to ensure teams take ownership of non-functional issues?
  1330  - Tooling - targeted inspection of errors, decisions made by Juju, e.g. utilities exist to print where tools/image metadata comes from; is that sufficient, what else is needed?
- A roadmap would be awesome, to know what features to look for in upcoming releases (and where user input is awaited).
  1332  - Feature development - involve stakeholders/users (CTS?) more, at prototype stage and during functional testing?
- How best to expose developers to the real world, so that necessary hardening work becomes as much of an itch to scratch as it is a development chore?
- Close the loop between CI and development (unit tests / the landing bot could flag specific features for additional functional testing).
  1335  
  1336  ### Notes
  1337  
- Mock up the workflow in a spec/doc: a quick few paragraphs about what a change or feature will look like from a user-facing standpoint.
  1339  - Not all features require functional / UAT because of time constraints but still want to give CTS etc input to dev.
  1340  - Wishlist: Send more developers out on customer sites to get real world experiences.
  1341  - Much more involvement with IS as a customer.
  1342  - More core devs need to write charms.
  1343  - Debug log too spammy - but new incl/excl filters may help.
  1344  - Debug hooks used a lot - considered powerful tool.
- Debug hooks should be able to drop a user into a hook context when not in an error state, e.g. `juju debug-hooks unit/0 config-changed`.
  1346  - Need more output in status to expose internals (Is my environment idle or busy?).
  1347  - More immediate reporting to user of charm output as deploy happens, don’t want to wait 15 minutes to see final status.
  1348  - Juju diagnose - post mortem tools <- already done via juju ready/unready, output vars etc
  1349  
  1350  ### Work Items
  1351  
  1352  [Juju Fixes](https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoQnpJ43nBkJdHhnV05NcmQ3Tm5yRnIwcTlYMTZEaEE&usp=sharing)
  1353  
  1354  1. Design error propagation mechanism to be used across providers.
  1355  1. Destroy Service --Force.
1. Dry run: tell the user what version upgrade-juju will use.
  1357  1. Inspect Relation Data.
  1358  1. Address changes must propagate to relations.
  1359  1. Use Security Group Per Service.
  1360  1. Use Instance names/tags for machines.
  1361  1. safe-provisioning-mode default.
  1362  1. Bulk machine creation.
  1363  1. Unit Ids must be unique.
  1364  
  1365  ## Retry on API Failures
  1366  
  1367  Really part of hardening. There are transient provider failures due to issues like exceeding allowable API invocation rate limits. Currently Juju will fail when such errors are encountered and consider the errors permanent, when it could retry and be successful next time. The OpenStack provider does this to a limited extent. A large part of the problem is that Juju is chatty and makes many individual API calls to the cloud. We currently have a facility to allow provisioning to be manually retried but need something more universal and automated.
  1368  
  1369  ### Discussion Points
  1370  
- Understanding what types of operation can produce transient errors. Is it the same for all providers? What extra information is available to help with the retry decision?
- Common error class to encapsulate transient errors.
- Algorithm to back off and retry (sketched below).
- To what extent can Juju’s design/implementation change to mitigate the most common cause, which is exceeding rate limits?
- How to report / display retry status.
- Is manual intervention still required?
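
A minimal sketch of the back-off-and-retry idea; `IsTransient` stands in for whatever provider-specific error classification we end up with:

```
// Sketch of retrying transient provider errors with exponential
// back-off. IsTransient is a hypothetical classification hook.
package retry

import "time"

// IsTransient reports whether an error is worth retrying; providers
// would classify their own errors (rate limit, network blip) here.
func IsTransient(err error) bool { return false /* provider-specific */ }

func WithBackoff(attempts int, initial time.Duration, op func() error) error {
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !IsTransient(err) {
			return err
		}
		time.Sleep(delay)
		delay *= 2 // double the wait between attempts
	}
	return err
}
```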
  1377  
  1378  ### Work Items
  1379  
1. Identify for each provider which errors can be retried.
  1381  1. Juju should handle retries.
  1382  1. Above discussion points constitute the other work items.
  1383  1. Audit juju to identify api optimisation opportunities.
  1384  
  1385  ## Audit logs in Juju-core
  1386  
  1387  The GUI needs to be able to query *something* for a persistent log of changes in the environment.
  1388  
- What events are auditable? Hatch: only events that cause changes in the environment.
- Tim: who changed something, what was changed, when was it changed, what it was changed from and to, and why they were allowed to do it (Will).
- Hatch: it needs to be structured events - user, event, description, etc. - NOT just a blob of text.
- Voidspace: do we need a query API on top of this? Filter by machine, by user, by operation, etc.
  1393  - Audit log entries are not protected at a per row level. Viewing the audit log will require a specific permission.
  1394  - Not all users of the GUI may be able to access the audit log.
  1395  - Audit log entries may be truncated, truncation will require a high level of permissions.
  1396  - ACTION: determine auditable events.
  1397  - ACTION: determine where to store this data, and what events to audit.
  1398  - Hatch: it doesn’t need to be streaming from the start, but it should be possible.
  1399  
  1400  ### Work Items
  1401  
  1402  1. Create a state API for writing to the audit log (in mongodb).
1. Record the attempt before the API request is run (see the sketch below).
  1404  1. Record success/error after API request is run.
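
A sketch of items 2-3, with hypothetical types: one record written before the request runs, another with the outcome:

```
// Sketch of the audit write path around an API request.
package audit

import "time"

type Entry struct {
	When    time.Time
	User    string // who
	Op      string // what (API facade + method)
	Args    string // what it was changed from/to, summarized
	Outcome string // "attempted", "success", or error text
}

type Log interface {
	Add(Entry) error // appends to the audit collection in mongodb
}

// Audited records the attempt, runs the call, then records success
// or error; the caller still sees the original error.
func Audited(log Log, user, op, args string, call func() error) error {
	e := Entry{When: time.Now(), User: user, Op: op, Args: args, Outcome: "attempted"}
	_ = log.Add(e)
	err := call()
	e.When = time.Now()
	if err != nil {
		e.Outcome = err.Error()
	} else {
		e.Outcome = "success"
	}
	_ = log.Add(e)
	return err
}
```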
  1405  
## Staging uncommitted changes
  1407  
Hatch doesn’t want to do this in Javascript, because it is not web scale. He wants the API server to handle this staging.
  1409  
  1410  Thumper says that SABDFL says they want to be able to do this on the CLI as well.
  1411  
  1412  - Nate: if we need to allow this to work across GUI and CLI then we have to store this data in the state.
- Nate: do we need N staging areas per environment? Nate: no, that is crazy talk, just one per environment.
  1414  - Thumper: then we’ll need a watcher.
- ACTION: uncommitted changes are stored in the state as a single document, a big JSON blob.
  1416  - ACTION: we need a watcher on this document.
  1417  - Voidspace: entries are appended to this document, this could lead to confusion if people are concurrently requesting unstaged changes.
  1418  - Hazmat doesn’t think we should store this in the state.
  1419  - ACTION: Mark Ramm/hazmat to talk to SABDFL about the difficulty of implementing this.
  1420  - All: do we have to have a lock or mode to enable/disable staging mode ?
  1421  - Hatch: now the GUI and the CLI have different stories, the former works in staging mode by default, and the latter always commits changes immediately.
- ACTION: a change via the CLI would error if there are pending changes; you can then push changes into the log of work with a `--stage` flag. Ramm: alternatively, we tell the customer that the change has been staged, and they will need to ‘commit’ changes.
  1423  - ACTION: the CLI needs a ‘commit’ subcommand.
  1424  - Undo is out of scope, but permissible in a future scope; tread carefully.
  1425  
  1426  ### Discussion Thurs May 1
  1427  
- We moved to the idea of having an ApplyDelta API that lets you build up a bunch of actions to be applied.
- These actions can then all be in a pending state, and you do a final call to apply them.
- The internal record of the actions to apply is actually a graph based on dependencies.
- This lets you “pick one” to apply without applying the rest of the delta.
  1432  - Internally, we would change the current API to act via “create delta, apply delta” operations.
  1433  - When a delta is pending, calling the current API could act on the fact that there are pending operations.
- Spelling is undefined (one possibility is sketched after this list), e.g.
  1435  	- `named := CreateDelta()`
  1436  	- `AddToDelta(named, operation)`
  1437  	- `ApplyDelta(named)`
  1438  	- `ApplyDelta(operations)`
  1439  - If it is just the ability to apply a listed set of operations, we haven’t actually exposed a way to collaborate on defining those operations.
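
Since the spelling is undefined, the following is only one possible shape; it shows how a dependency graph lets a single pending operation be applied without the rest of the delta:

```
// Hypothetical delta API: pending operations form a dependency
// graph, so one operation (plus its dependencies) can be applied
// while the rest of the delta stays pending.
package delta

type Operation struct {
	ID        string
	DependsOn []string // graph edges: must apply before this op
	Apply     func() error
}

type Delta struct {
	ops map[string]Operation
}

func CreateDelta() *Delta { return &Delta{ops: map[string]Operation{}} }

func (d *Delta) Add(op Operation) { d.ops[op.ID] = op }

// ApplyOne applies a single operation and its dependencies.
func (d *Delta) ApplyOne(id string) error {
	op, ok := d.ops[id]
	if !ok {
		return nil // already applied or unknown
	}
	for _, dep := range op.DependsOn {
		if err := d.ApplyOne(dep); err != nil {
			return err
		}
	}
	if err := op.Apply(); err != nil {
		return err
	}
	delete(d.ops, id)
	return nil
}
```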
  1440  
  1441  ## Observability
  1442  
How to expose more of what Juju is doing to allow users to make informed decisions. The key interface point is `juju status`. Consider instance/unit observability and transparency: e.g. what does pending really mean? Is it still in provisioning at the provider layer? Is the machine agent running? Is the install hook running? Is the start hook running? We collapse all of that down to a single state; we should ideally just push the currently executing hook into status.
  1444  
  1445  ### To discuss
  1446  
  1447  - How to display error condition concisely but allowing for more information if required.
  1448  - Insight into logs - is debug log enough? (now has filtering etc).
  1449  - Feedback when running commands via CLI - often warnings are logged server side, how to expose to users; use of separate back channel?
  1450  - Interactive commands? Get input to continue or try again or error/warning?
  1451  - Consistency in logging - guidelines for verbosity levels, logging API calls etc
  1452  - How to discover valid vocabularies for machine names, instance types etc?
  1453  - How to inspect relation data?
  1454  - Should output variables be recorded/logged?
  1455  - Provide --dry-run option to see what Juju would do on upgrades.
  1456  - Better insight into hook firing.
  1457  - Ability to probe charms for health? (incl e.g. low disk space etc).
  1458  - Event driven feedback.
  1459  - Integration with SNMP systems? How to alert when issues arise?
  1460  
  1461  ### Work Items
  1462  
- `juju status <entity>` reveals more about that entity - get all output for the context that is specified.
- Add new unit state - healthy/unhealthy.
- Instance names/tags for machines (tagged with the workload that caused them to be deployed).
- Specifically, when deploying a service or adding a unit that requires a machine to be added, the provisioner should be passed a tag of the service name or similar, to annotate the machine with on creation.
- Inspect relation data.
- Implement output variables (needs spec).
- `add-machine`, `add-unit` etc. need to report what was added.
- API for vocab (instance types).
  1471  
  1472  ## Usability
  1473  
  1474  ### Covers a number of key points
  1475  
  1476  - Discoverable - features should be easily discoverable via `juju help` etc.
  1477  - Validate inputs - Juju should not accept input that causes breakage, and should fail early.
  1478  - Error response - Juju should report errors with enough information to allow the user to determine the cause, and ideally should suggest a solution.
  1479  - Key workflows should be coherent and concise.
  1480  - Tooling / API support for key workflows.
  1481  
  1482  ### Agenda
  1483  
  1484  - Identify key points of interaction - bootstrap, service deployment etc.
  1485  - Current pain points e.g.
  1486  	- Tools packaging for bootstrap for dev versions or private clouds?
  1487  	- Open close port range?
  1488  	- Security groups!
  1489  	- What else?
  1490  - What’s missing? Tooling? The right APIs? Documentation? Training?
  1491  - Frequency of pain points vs impact.
  1492  
  1493  ### Concrete Work Items
  1494  
  1495  1. Improve `juju help` to provide pointers to extra commands.
  1496  1. Transactional config changes.
  1497  1. Fix destroy bug (destroy must be run several times to work).
  1498  	- Find or file bug on lp
  1499  1. When a machine fails, machine state in juju status displays error status with error reason.
  1500  1. Document rationale in code comment.
  1501  1. `juju destroy service --force`
  1502  1. Range syntax for open/close ports.
1. Safe-mode provisioning becomes the default.
  1504  1. Garbage collect security groups.
  1505  
  1506  ## Separation of business objects from persistence model
  1507  
  1508  A widely accepted architectural model for service oriented applications has layers for:
  1509  
  1510  - services
  1511  - domain model
  1512  - persistence
  1513  
The domain model has entities which encapsulate the state of the key business abstractions, e.g. service, unit, machine, charm, etc. This is runtime state. The persistence layer models how entities from the domain model are saved/retrieved to/from non-volatile storage - mongo, postgres, etc. The persistence layer translates business concepts like queries and state representation into storage-specific concepts. This separation is important in order to provide database independence, but more importantly to stop layering violations and promote correct design and separation of concerns.
  1515  
  1516  ### To discuss
  1517  
  1518  - Break up of state package.
  1519  - How to define and model business queries.
  1520  - How to implement translation of domain <> persistence model.
  1521  
  1522  ### Goals
  1523  
  1524  - No mongo in business objects - database agnosticism.
  1525  - Remove layering violations which lead to suboptimal model design.
- Scalability via the ability to implement pub/sub infrastructure on top of the business model rather than the persistence model; no more sucking on the mongo firehose.
  1527  
  1528  ### Work Items
  1529  
  1530  1. Spike to refactor a subset of the domain model (e.g. machines). 
1. Define and use patterns (e.g. “named query”) to abstract out database access further (in the spike; see the sketch after this list).
  1532  1. Define and use patterns for mapping/transforming domain objects to persistence model.
  1533  1. If possible, define and implement integration with pub/sub for change notification.
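
A sketch of the layering, with hypothetical names: business code depends on a repository interface expressing named queries, and only the persistence implementation knows about mongo:

```
// Hypothetical domain/persistence split for machines.
package domain

// Machine is a pure domain entity: no bson tags, no mongo imports.
type Machine struct {
	ID     string
	Series string
	Life   string
}

// MachineRepository is the persistence boundary; named queries
// express business questions, not storage details.
type MachineRepository interface {
	MachinesNeedingUpgrade(targetVersion string) ([]Machine, error)
	Save(Machine) error
}

// A mongo-backed implementation would live in a separate persistence
// package and translate Machine to/from its document form, letting a
// pub/sub layer observe domain changes rather than raw mongo ops.
```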
  1534  
  1535  ## Juju Adoption Blockers
  1536  
  1537  [Slides with talking points](https://docs.google.com/a/canonical.com/presentation/d/1jcJ93Npuo60Iyy0BGSNap1kekQNxiZ7rDBJfuxAv_Go/edit#slide=id.ge4adadaf_1_645)
  1538  
  1539  ## Partnerships and Customer Engagement
  1540  
  1541  - Juju GUI has been a tremendous help.
  1542  	- Sales team enabler, to quickly and easily show Juju.
  1543  - Every customer/partner asks
  1544  	- Where can I get a list of all charms?
  1545  	- Where can I get a list of all available relations?
  1546  	- Where can I get a list of all available bundles?
  1547  	- Where can I get a list of all supported cloud providers?
  1548  	- What about HA?  What happens if the bootstrap node goes away?
  1549  		- We need to start demonstrating this, ASAP!
  1550  	- What if one of the connected services goes away?  What does Juju do?
  1551  	- So, great, I can use Juju to relate Nagios and monitor my service.  But what does Juju do with that information?  Can’t Juju tell if a service disappears?
  1552  	- Auto-scaling?  Built in scalability is great, but manually increasing units is only so valuable.
  1553  	- What do you mean, there aren’t charms available for 14.04 LTS yet?
  1554  	- *Yada yada yada* Docker *yada yada yada*?
  1555  - Our attempts to shift the burden of writing charms onto partners/customers have yielded minimal results.
  1556  - Pivotal/Altoros around CloudFoundry
  1557  	- CloudFoundry is so complicated, Pivotal developed their own custom Juju-like tool (BOSH) to deploy it, and their own “artifact” based alternative to traditional Debian/Ubuntu packaging.
  1558  	- CloudFoundry charms (and bundles) have proven a bit too complex for newbie/novice charmers at Altoros to develop, at the pace and quality we require.
  1559  
  1560  ## Juju 2.0 Config
  1561  
  1562  - Define providers and accounts as a first class citizen.
- Eventually remove environments.yaml in favor of the above account configuration and .jenv files.
- Change `juju bootstrap` to take an account and `--config=file`/`--option="foo=var"` for additional options.
  1565  - `juju.conf` needs
  1566  	- simplestreams source for provider definitions, defaulting to https://streams.canonical.com/juju/providers.
  1567  		- A new stream type “providers” containing the environment descriptions for known clouds (e.g. hpcloud has auth_url:xyz, type:OpenStack, regions-available: a,b,c, default-region:a).
  1568  		- Juju itself no longer includes the information inside the ‘juju’ binary, but depends on that information from elsewhere.
  1569  	- Providers section.
  1570  		- Locally define the data that would otherwise come from above.
  1571  	- Accounts section.
  1572  		- Each account references a single provider.
  1573  		- Local overrides for environment details (overriding defaults set in provider).
  1574  
  1575  ## Distributing juju-core in Ubuntu
  1576  
Landscape has a stable release exception for their client, not a micro release exception. We fulfil the rules for this even better than Landscape does, as we have basically no dependencies at all.
  1578  
  1579  We can split juju the client from jujud the server, though this isn’t terribly useful for us outside of making distro people happy.
  1580  
Landscape’s process has two reviews before code lands; we used to do this but changed, and it didn’t seem to drop quality on our end.
  1582  
Could raise an item at a tech board meeting to sort out stable release matters.
  1584  
Having to have separate source packages for client and server would be annoying and painful; could we have different policies for binary packages generated from the same source package?
  1586  
  1587  Dynamic linking gripes are not imminently going to be solved by anyone.
  1588  
Have a meeting with Foundations to resolve some unhappiness.
  1590  
  1591  ## Developer Documentation
  1592  
  1593  - https://juju.ubuntu.com/dev/ - Developer Documentation.
- There exists an automated process that pulls the files from the doc directory in the juju-core source tree, processes the markdown into HTML, and uploads it to the WordPress site.
  1595  - Minimal topics needed
  1596  	- Architecture overview
  1597  	- API overview
  1598  	- Writing new API calls
  1599  	- What is in state (our persistent store - horrible name, I know)?
  1600  	- How the mgo transactions work?
  1601  	- How to write tests?
  1602  		- Base suites
  1603  		- Environment isolation
  1604  		- Patch variables and environment
  1605  		- Using gocheck (filter and verbose)
  1606  		- Table based tests vs. simple tests
  1607  		- Test should be small and obviously correct
  1608  	- Developer environment setup
  1609  	- How to run the tests?
  1610  	- `juju test <filter> --no-log (plugin)`
  1611  - https://juju.ubuntu.com/install/ should say install juju-local
  1612  
  1613  ## Tools, where are they stored, sync-tools vs bootstrap --source
  1614  
  1615  - FindTools is called whenever tools are required, which searches all tools sources again.
  1616  - When tools are located in the search path, they are copied to env storage and accessed from there when needed.
- Find is only to be called once, at well-defined points: bootstrap and upgrade. The tools are fetched into env storage so that, e.g. during upgrade, tools are sourced from there.
  1618  - Need tools catalog separate from simplestreams for locating tools in env storage.
  1619  - Bootstrap and upgrade and sync-tools need --source.
  1620  
  1621  As is the case now, if --source is not specified, an implicit upload-tools will be done.
  1622  
  1623  ## Status - Summary vs Detailed
  1624  
Status is spammy even on smallish environments. It’s completely unusable on mid-sized and larger environments. Can we make it easier to read, or make another status that is more of a summary view?
  1626  
  1627  ### Work Items
  1628  
  1629  1. Identify items in status output that may break people’s scripts if changed or removed.
  1630  1. Add flags: 
  1631  	- `--verbose/-v`: total status, current output + HA + networking junk
	- `--summary`: human readable summary - not YAML (this is dependent on the mini-plugin below)
  1633  	- “`--interesting`”: items that aren’t “normal” (e.g. agent state != “Started”)
  1634  1. Write mini-plugin that takes human readable YAML and generates human readable output e.g. HTML.
  1635  1. Use watcher to monitor status instead of polling juju status cmd.
  1636  1. Extend filtering.
  1637  
  1638  ## Relation Config
  1639  
When adding a relation, we want to be able to specify configuration specific to that relation. In settings terms, this will be “service-relation-settings”. We need to be able to set config for either end of the relation. Settings data is stored for the relation as a whole.
  1641  
  1642  The relation config schema is defined in charm’s `metadata.yaml`. Separate config for each end of the relation.
  1643  
The config is specified at `add-relation` time via a `--config config.yaml` option.
  1645  
New Juju command `relation-get-config [-r foo]` to get config from the local side of the relation. Inside a hook we don’t need `-r`.
  1647  
New `juju set-relation config.yaml`, which will cause the `relation-config-changed` hook to run.
  1649  
  1650  ### Work Items
  1651  
1. New relation-config schema in charm `metadata.yaml`.
1. Ability to store relation settings in mongo.
1. Support for processing relation config in `add-relation`.
1. `relation-get-config` command.
1. `set-relation-config` command.
1. `relation-config-changed` hook.
  1658  
  1659  ## Bulk Cloud API
  1660  
The APIs we use to talk to cloud providers are too chatty, e.g. individual calls to start machines and open individual ports.
  1662  
  1663  When starting many instances, partition them into instances with same series/constraints/distribution group and ask provider to start each batch.
  1664  
  1665  ### Work Items
  1666  
1. Unfuck instance broker interfaces to allow bulk invocation (see the sketch after this list).
1. Rework provisioner.
1. Change instance data so that it is fully populated, and not just a wrapper around an instance id that causes more API calls to be required.
1. Audit providers to identify where bulk API calls are not used.
1. Start instances to return ids only, getting extra info in bulk as required.
1. Single shared instance state between environs (updated by a worker).
1. Refactor prechecker etc. to use cached environ state - reduce `New()` environ calls.
1. Stop using open/close ports and use iptables instead.
1. Use a single security group.
1. Use the firewaller interface in providers to allow Azure to be handled.
1. Drop firewall modes in the EC2 provider.
1. Support specifying port ranges, not individual ports (e.g. in charm metadata).
1. For hook tools - open ports on a network for a machine, not a unit.
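
A sketch of what a bulk-friendly broker interface might look like (all names hypothetical): instances sharing series/constraints/distribution group are started in one provider call, returning ids only:

```
// Hypothetical bulk instance broker: batches instead of one-by-one
// calls; ids only on return, full details fetched later in bulk.
package environs

type StartBatch struct {
	Series      string
	Constraints string
	DistGroup   string
	Count       int
}

type StartedInstance struct {
	ID string // full instance data is fetched later, in bulk
}

type InstanceBroker interface {
	// StartInstances launches each batch in a single provider call.
	StartInstances(batches []StartBatch) ([]StartedInstance, error)
	// StopInstances terminates instances in bulk.
	StopInstances(ids []string) error
}
```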
  1680  
  1681  ## Tools Placement
  1682  
- Allow storage of tools in the local environment.
  1684  - Providing a catalog of the tools in the local environment.
  1685  - Refactoring the current tools lookup to use the catalog.
  1686  - Provide tools import utility to get new tools into the environment.
  1687  - Upgrades to check tools catalog to ensure tools are available for all required series, arches etc.
  1688  - Same model as for charms in state.
  1689  
  1690  ## Juju Documentation
  1691  
**William**: Write documentation while designing the feature, and give it to Nick etc. before writing code. This is the word of god.
  1693  
  1694  **Nate**:  Use changelog file in juju-core repo to log features and bugfixes with merge proposals.
  1695  
  1696  **Nick & Jorge**:  we’re just a couple people, juju core is 20 people now.
  1697  
  1698  **Ian**: can’t require changelog per merge, since a single feature may be many many merges, which might have no user facing features.
  1699  
  1700  This must actually happen or Jorge has permission to kill Nate.
  1701  
  1702  Nate to get buy in from team leads.
  1703  
  1704  # Charm Config Schema
  1705  
Users find our limited set of config types (String, Bool, Int, Float) restrictive, and have to do things like pickle lists as base64. See [bug](https://bugs.launchpad.net/juju-core/+bug/1231526), which largely covers this.
  1707  
  1708  - Map existing YAML charm config descriptions into a JSON schema.
  1709  - Extend existing YAML config to something that can be mapped well to JSON schema.
  1710  - Currently have a config field in charm document.
  1711  - Create a schema document that charm links to.
  1712  - Upgrade step that takes existing config field and creates new document linked to charm.
  1713  - Add support in `juju set` for new format.
  1714  - Add flag to `juju get` to output new format.
  1715  
  1716  New types we want: enums, lists, maps (keys as strings, values as whatever).
  1717  
  1718  Open questions: how charms upgrade their own schema types - there’s existing pain here where for instance the OpenStack charms are stuck using “String” for a boolean value because they cannot safely upgrade type.
  1719  
Pyjuju had magic handling for schlurping files; there’s a feature request bug for a ‘File’ type.
  1721  
  1722  Note this work does not include constraint vocabularies.  See Ian Booth for that work.
  1723  
  1724  # Juju Solutions & QA
  1725  
  1726  This is very dependent on which charm you are looking at.  I assume there were particular things that came up in the Cloud Foundry work that need attention.   We have been building up test infrastructure quite quickly, which is one part of helping improve quality -- but the biggest thing is growing communities around particular charms.
  1727  
  1728  # Juju QA
  1729  
  1730  ## CABS Reporting
  1731  
  1732  The feature has stalled as goals and APIs churned.
  1733  
  1734  1. What are the goals of reporting?
  1735  1. What is the data format that cabs will provide for reporting?
  1736  1. How do we display the reports?
  1737  
  1738  ## Scorecard
  1739  
  1740  The scorecard is a progress report to measure our activity and correlate it to our successes and failures. Most of the work is done by hand. Though most of the information gathering can be automated, it was the lowest priority for the Juju QA team. How much time will we save if we automate some or all of the information gathering?
  1741  
  1742  Juju QA has scripted most of what it gathers for the score card. The data is entered by hand instead of added to tables and charts by an automated process. These are the kinds of data that the team knows how to gather:
  1743  
  1744  1. Bugs reported, changed, or fixed.
  1745  1. Branch commits.
  1746  1. Time from report, to start to release of bugs and commits.
  1747  1. Releases of milestones.
  1748  1. Downloaded installers and release tarballs (packagers and homebrew).
  1749  1. Installs of clients from PPAs.
  1750  1. Downloads of tools from public streams.
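
A minimal sketch of automating one of these feeds, assuming launchpadlib with anonymous access is sufficient; the project name, time window, and filter parameters are illustrative:

```python
# Sketch: count recently fixed bugs, one scorecard input among many.
from datetime import datetime, timedelta

from launchpadlib.launchpad import Launchpad

lp = Launchpad.login_anonymously("juju-scorecard", "production")
project = lp.projects["juju-core"]

# Bug tasks that reached a fixed state in the last two weeks.
since = datetime.utcnow() - timedelta(days=14)
tasks = project.searchTasks(status=["Fix Committed", "Fix Released"],
                            modified_since=since)
print("bugs fixed in the last 2 weeks:", len(list(tasks)))
```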
  1751  
  1752  ### Work Items
  1753  
  1754  1. GUI
  1755  	1. Bundles deployed
  1756  	1. Charms deployed
  1757  	1. Visits to jujucharms.com and juju.ubuntu.com
  1758  	1. Quick-start downloads
  1759  	1. Number of releases
  1760  	1. Number of bugs
  1761  	1. Number of bugs closed
  1762  1. Core
  1763  	1. Number of external contributors
  1764  	1. Number of fixes committed
  1765  	1. Number of running envs (the charmstore is queried every 90 min for new charms)
  1766  		- Do we know which env the charm query was for?
  1767  	1. Client installs (from ppa, cloud archive trusty)
  1768  	1. Number of tools downloaded (from containers and streams.c.c)
  1769  	1. Add anonymous stat collection to juju to learn more
  1770  1. Eco
  1771  	1. Number of canonical and non-canonical charm committers
  1772  	1. Number of people in #juju (and #juju-dev)
  1773  	1. Number of subscribers juju and juju-dev mailing lists
  1774  	1. Number of charms audited
  1775  	1. AskUbuntu Conversion (Questions Asked & Answered)
  1776  	1. Number of tests in charms
  1777  1. QA
  1778  	1. Metric
  1779  	1. Days to bug triage
  1780  	1. CI tests run per week
  1781  	1. Number of solutions tested
  1782  	1. Number of clouds solution tested on
  1783  	1. Number of juju core releases
  1784  
  1785  ## Charm Testing Reporting
  1786  
  1787  Charm test reporting has faced obstructions from several causes. There are two central issues: one, reliable delivery of the data to report; and two, completion of the reporting views.
  1788  
  1789  1. Charm testing data formats change without notice.
  1790  1. Charm testing uses unstable code that can break several times a day, preventing gathering and publication of data.
  1791  1. Charm testing leaves machines behind.
  1792  1. Charm testing can exceed resource limits in a cloud.
  1793  1. Charm testing doesn’t support multiple series.
  1794  1. Charm reports don’t show me a simple table of the clouds a charm runs on.
  1795  	1. Most charms don’t have tests -- can we have a simple test to get every charm listed?
  1796  	1. I don’t know the version of the charm.
  1797  	1. I don’t know the last version that passed all tests.
  1798  1. Charm details reports don’t show me the individual tests.
  1799  	1. I don’t know the series.
  1800  	1. I don’t know the version that last passed the individual test.
  1801  
  1802  ### Work Items
  1803  
  1804  1. Create a new Jenkins that uses the last known-good version of substrate dispatcher (lp:charmtester).
  1805  1. Staging charmworld or something will trigger a test of a branch and revision.
  1806  	1. Provide charmers with a script to test the MP/pull requests.
  1807  	1. Provide a way to poll Lp and Gh to automatically run the tests for the MP/PR.
  1808  	1. Provide a way to test tip of each promulgated charm.
  1809  1. Reporting needs to pick up the data from the new test runner/Jenkins.
  1810  1. Overview should list every charm tested.
  1811  	1. Does the charm have tests?
  1812  	1. A link to the specific charm results.
  1813  	1. Which clouds were tested and did the suite pass?
  1814  	1. What version was tested?
  1815  	1. What is the last known-good version to pass the tests for a substrate?
  1816  	1. What version passed on all substrates?
  1817  1. For any charm, I need to see specific charm results.
  1818  	1. Which substrates were tested?
  1819  	1. The individual tests run in each substrate; show the name of the test and pass/fail.
  1820  	1. Need a link to see the fail log located somewhere.
  1821  	1. What was the last version of the charm to pass the test?
  1822  1. Update substrate dispatcher or switch to bundle tester to gather richer data.
  1823  	1. Ensure `destroy-environment`.
  1824  	1. Capture and store JSON data instead of logs (a hypothetical record is sketched after this list).
  1825  1. We will get use cases for the charm test reports that will verify the report meets expectations.
  1826  1. Tests could state their needed resources and the test runner can look to see if they are available. The tests can be deferred until resources are available.
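
The richer data format is still open. Purely as a strawman, a per-run record like the following would carry what the overview and detail reports above ask for (every field name is an assumption):

```python
# Hypothetical per-run charm test record; all field names are illustrative.
result = {
    "charm": "mysql",
    "revision": 142,
    "series": "trusty",
    "substrate": "aws",
    "juju_version": "1.20.0",
    "bundle": None,                  # set when run as part of a bundle test
    "tests": [
        {"name": "10-deploy", "outcome": "pass", "duration_s": 312},
        {"name": "20-relate", "outcome": "fail", "duration_s": 88,
         "log_url": "http://example.com/mysql/142/20-relate.log"},
    ],
    "environment_destroyed": True,   # confirms destroy-environment ran
}
```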
  1827  
  1828  ## Charm testing with juju Core
  1829  
  1830  1. We test with stable juju and charm.
  1831  1. We could test with unstable.
  1832  	1. Only test the popular charms for each revision.
  1833  	1. Or only test charm with tests.
  1834  	1. Or test bundles, which have valid combinations.
  1835  1. Test all the charms occasionally.
  1836  1. Historically when charms break with new juju, it is the charm’s fault.
  1837  
  1838  ## Charm MP/Pull Gate on Charm Testing
  1839  
  1840  Charm merges could be gated on a successful test run against the supported clouds.
  1841  
  1842  - Allow charmers to manually request a test for a branch and revision.
  1843  - Maybe extend the script to poll for pull requests/merge proposals.
  1844  - Charm testing doesn’t support series testing yet.
  1845  
  1846  ### Testing
  1847  
  1848  1. Test MP or pull request.
  1849  1. Test merge and commit on pass.
  1850  1. Charm testing runs and is actually testing that juju or ubuntu still works for the charm.
  1851  
  1852  ## CI Charm and Bundle Testing
  1853  
  1854  Testing popular bundles with Juju unstable to ensure the charms and bundles continue to work.
  1855  
  1856  1. Notify the charm maintainer or the juju developers when a break will happen.
  1857  1. Can testing be automated to grow newly popular charms and bundles?
  1858  1. There are resource limits per cloud.
  1859  
  1860  ### Notes
  1861  
  1862  - Charm testing could be simplified to proof and unit tests.
  1863  - Bundle tests would test relations.
  1864  - Current tests don’t exercise failures or show error recovery.
  1865  - Ben suggests that amulet tests in charms could be moved to bundles.
  1866  - Charms are like libraries, bundles are like applications.
  1867  	- Bundles are known topologies that we can support and recommend.
  1868  	- Charm tests could pass but break other apps; the bundle level is where we want to test.
  1869  - Workloads are more like bundles, though some charms might not need to be in a relation, so a bundle of one.
  1870  - Config testing is valuable at the charm-level and bundle-level.
  1871  - Integration suites might work on a charm or a bundle.
  1872  	- Cloud-foundry tests only work with the bundle...running the suite for each charm means we construct the bundle multiple times and rerun tests.
  1873  - The charm author might write weak tests. Reviewers need to see this and respond. Bundles represent how users will use the charm, and that is what needs testing to verify utility and robustness.
  1874  - Bundle tester has a test pyramid (sketched after this list).
  1875  	- Proofing each charm.
  1876  	- Discovering unit testing in each charm.
  1877  	- Discovering integration tests and running them.
  1878  - Bundle testing has a known set of resources...which is needed when testing in a cloud.
  1879  - Bundle tests provide the requirements for any software’s own stress and function tests.
  1880  - Charm reports would use the rich JSON data.
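
A sketch of that pyramid as a pipeline; the test-discovery conventions here are assumptions, and bundletester’s real behaviour may differ:

```python
# Sketch of the bundle-test pyramid: proof every charm, then discover and
# run unit tests per charm, then run integration tests once per bundle.
import os
import subprocess

def proof(charm_dir):
    """`charm proof` lints a charm; non-zero exit means failure."""
    return subprocess.call(["charm", "proof", charm_dir]) == 0

def run_pyramid(charm_dirs, integration_cmd):
    results = {"proof": {}, "unit": {}, "integration": None}
    for d in charm_dirs:
        results["proof"][d] = proof(d)
        # Assumed convention: unit tests live under tests/unit in each charm.
        unit_dir = os.path.join(d, "tests", "unit")
        if os.path.isdir(unit_dir):
            results["unit"][d] = subprocess.call(
                ["python", "-m", "unittest", "discover", "-s", unit_dir]) == 0
    # Integration tests exercise relations, so they run at the bundle level.
    results["integration"] = subprocess.call(integration_cmd) == 0
    return results
```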
  1881  
  1882  ### Work Items
  1883  
  1884  1. Review BenS Bundle testing for integration into QA Jenkins workflow
  1885  	1. Get back to BenS with any questions.
  1886  1. Use cases to drive what reports need to show.
  1887  	1. What do the different stakeholders need to discover reading the reports?
  1888  	1. What actions will stakeholders take when reading the reports?
  1889  1. Do bundle tests poll for changes to bundles or the charms they use?
  1890  	1. The alternate would be to test on demand.
  1891  	1. Gated merges of MP/PR mean there is little value in testing on push.
  1892  
  1893  ## CI Ecosystem Tests
  1894  
  1895  We want to extend Juju devel testing to verify that crucial ecosystem tools operate with it. When there is an error, the Juju-QA team will investigate and inform one or both owners of the issue that needs resolution.
  1896  
  1897  The juju under test will be used with the other project’s test suite. A failure indicates Juju probably broke something, but maybe the other project was using juju in an unsupported way.
  1898  
  1899  Juju CI will provide a simple functional test to demonstrate an example case works.
  1900  
  1901  We want a prioritised list of tests to deliver.
  1902  
  1903  1. Juju GUI
  1904  1. Juju Quickstart
  1905  1. Azure juju GUI dashboard
  1906  1. jass.io
  1907  1. Juju Deployer
  1908  1. mojo
  1909  1. amulet
  1910  1. charm tools
  1911  1. charm helpers
  1912  1. charmworld
  1913  
  1914  ### Work Items
  1915  
  1916  1. Quickstart
  1917  	1. Quickstart relies on the CLI, the API, and config files. It waits for the GUI to come up in the env, then deploys bundles.
  1918  	1. Quickstart opens a browser to show the GUI.
  1919  	1. Testing
  1920  		1. Install the proposed juju.
  1921  		1. Run juju-quickstart bundle to a bootstrapped env.
  1922  			1. Tries to colocate the GUI on the bootstrap node when the provider is not local and the GUI charm has the same series as the bootstrap node.
  1923  			1. Otherwise GUI is in a different container.
  1924  			1. `juju status` will list the charms from the bundle.
  1925  		1. Rerun juju-quickstart bundle.
  1926  			1. Verify the same env is running with the same services (see the sketch after this list).
  1927  	1. GUI team need to write
  1928  		1. Functional tests.
  1929  		1. Allow the tests to be run on lxc.
  1930  1. Juju GUI charm
  1931  	1. “make test” will deploy the charm about 8 times.
  1932  		1. GUI is deployed on bootstrap node to make the test faster.
  1933  		1. If the provider is local, the GUI should be in a different container.
  1934  	1. The charm has tests that are run by `juju test`.
  1935  		1. The functional tests run the default juju.
  1936  		1. We can use the juju under test with the charm.
  1937  	1. An env variable is used to select the series for the charm.
  1938  	1. Test with a bundle implicitly tests deployer.
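
A minimal sketch of the rerun check, assuming `juju status --format json` output with a top-level `services` key; the bundle argument is a placeholder:

```python
# Sketch: rerun juju-quickstart and verify the set of services is unchanged.
import json
import subprocess

def deployed_services():
    out = subprocess.check_output(["juju", "status", "--format", "json"])
    return set(json.loads(out.decode()).get("services", {}))

before = deployed_services()
# Placeholder bundle id; a real test would use the bundle under test.
subprocess.check_call(["juju-quickstart", "bundle:mediawiki/single"])
after = deployed_services()
assert before == after, "rerun changed the set of deployed services"
```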
  1939  
  1940  ## CI Cloud and Provider Testing
  1941  
  1942  Juju CI tests deployments and upgrades from stable to release candidate. We might want additional tests.
  1943  
  1944  1. Canonistack tests are disabled.
  1945  	1. Swift fails; IS suspect misconfiguration or bad name (rt 69317).
  1946  	1. Canonistack has bad days where no one can deploy.
  1947  1. Restricted and closed networks?
  1948  	1. CI has a restricted network test that shows the documented sites and ports are correct, but it doesn’t verify tools retrieval.
  1949  	1. A closed network test would have proxies providing every documented requirement of Juju.
  1950  1. Constraints?
  1951  1. Placement?
  1952  1. `add-machine`, `add-unit`?
  1953  1. Health checks by series?
  1954  
  1955  ### Work Items
  1956  
  1957  1. Placement tests are required for AWS and OpenStack.
  1958  1. `add-machine` and `add-unit` can be functional tests.
  1959  1. Need nova console log when we cannot ssh in.
  1960  1. Constraints are mostly common across providers.
  1961  	1. The unique ones:
  1962  		1. Azure availability sets (together relationship)
  1963  		1. AWS/OpenStack availability zones (apart relationship)
  1964  		1. Security groups
  1965  		1. MaaS networks
  1966  
  1967  ## CI Compatibility Function Testing
  1968  
  1969  Juju CI has functional tests that verify a function works across multiple versions of juju, and that juju works with multiple versions of itself.
  1970  
  1971  1. Unstable to stable command line compatibility.
  1972  	1.  Verify deprecation, not obsolescence.
  1973  	1. Verify scripted arguments do not break after an upgrade.
  1974  1. 100% major.minor compatibility. Do stable micro releases work with every combination?
  1975  	1. This means keeping a pool of stable packages for CI.
  1976  	1. Encourages creating new minor stables instead of adding test combinations; but SRU discourages minor releases.
  1977  	1. CI is **blocked** because Juju doesn’t allow anyone to specify the juju version to bootstrap the env with, nor can agent-metadata-url be set more than once to control the version found.
  1978  
  1979  ### Work Items
  1980  
  1981  1. Juju bootstraps with the same version as the client.
  1982  1. Then juju upgrades/downgrades the other agents to the current version.
  1983  1. Ubuntu wants 100% compatibility between the client in trusty and all the servers that trusty has ever had.
  1984  	1. If trusty had juju 1.18.0, 1.18.1, 1.20.0, we need to show that clients work with all the servers.
  1985  1. We could parse the help and report bugs when commands or options disappear (see the help-diff sketch after this list). We need to see that commands and options are deprecated before they are removed.
  1986  	1. We want to remove the deprecated features from the help to keep docs clean, but that makes deprecations look like obsolescence.
  1987  1. Client to server is command line to API server.
  1988  	1. Stand up each server, then for each client check that they talk.
  1989  	1. We don’t need to repeat historic combinations.
  1990  		1. Test the new client with the old servers.
  1991  		1. Test the old clients with the new servers.
  1992  1. The tests could be status, upgrade, and destroy, but if we had an API compatibility check, we could quickly say the client and server are happy together.
  1993  1. Maybe split the juju package to have a juju-server and juju-client package. Trusty gets the new juju client package. The servers are in the clouds.
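
A sketch of the help-diff idea from the list above; the flag regex is deliberately naive and the binary paths are placeholders:

```python
# Sketch: diff the flags a subcommand advertises between two juju binaries,
# flagging options that disappeared without a deprecation period.
import re
import subprocess

def flags(juju_bin, subcommand):
    out = subprocess.check_output([juju_bin, "help", subcommand]).decode()
    return set(re.findall(r"(--[a-z][a-z0-9-]*)", out))

# Placeholder paths; CI would keep a pool of packaged stable clients.
old = flags("/usr/lib/juju-1.18.0/bin/juju", "bootstrap")
new = flags("/usr/lib/juju-1.20.0/bin/juju", "bootstrap")
for gone in sorted(old - new):
    print("option disappeared, file a bug:", gone)
```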
  1994  
  1995  ## CI Feature Function Testing
  1996  
  1997  Juju Command testing
  1998  
  1999  1. Backup and restore (in progress).
  2000  1. HA
  2001  1. Charm hooks, relations, expose, and upgrade-charm.
  2002  	1. Is the env set up for the hook?
  2003  	1. Do relations exchange info?
  2004  	1. Do expose/unexpose update ports?
  2005  	1. `upgrade-charm` downloads a charm and calls the upgrade hook.
  2006  1. ssh, scp, and run.
  2007  1. We claim `run` gets the same env as a charm...we can test that the charm and run have the same env.
  2008  1. set/get config and environment.
  2009  	1. Which options are not mutable?
  2010  
  2011  ### Work Items
  2012  
  2013  1. For every new feature we want to prepare a test that exercises it.
  2014  	1. Developers are interested in writing the tests with QA.
  2015  	1. Some tests may need to be run in several environments.
  2016  	1. Revise the docs about writing tests and send them to developers.
  2017  1. Add coverage for historic features.
  2018  	1. `add-machine` / `add-unit`
  2019  	1. set/unset/get of config and env
  2020  	1. ssh, scp, and run
  2021  	1. charm hooks, relations, and expose, unexpose, and upgrade-charm
  2022  	1. init
  2023  	1. `get-constraints`, `generate-config`
  2024  
  2025  ## CI LTS (and other series and archs) Coverage
  2026  
  2027  What is the right level of testing? Duplicate testing for each supported series may not be necessary. Unnecessary tests take time and limited cloud resources.
  2028  
  2029  1. Can we test each series as an isolated case from clouds and providers?
  2030  1. Must we duplicate every cloud-provider test to ensure juju on each series in each cloud works?
  2031  1. Local provider seems to need a test for each series and juju.
  2032  1. Unit tests pass on amd64.
  2033  	1. PPC64el is close to passing.
  2034  	1. i386 and arm64 are not making progress.
  2035  1. Switch to golang 1.2.
  2036  
  2037  ### Work Items
  2038  
  2039  1. The default test series will be trusty; precise is an exceptional case.
  2040  1. Golang will be 1.2.
  2041  	1. Golang 1.2 must be backported to precise and maybe saucy.
  2042  	1. If not, juju will have to abandon precise or only be golang 1.1.2 compatible.
  2043  1. Build juju on the real archs or cross compile to create tools.
  2044  	1. Build juju on trusty amd64.
  2045  	1. Build juju on precise amd64.
  2046  	1. Build juju on trusty i386.
  2047  	1. ppc64+trusty  will make gccgo-based juju.
  2048  	1. Need a machine to do arm64+trusty to make gccgo-based juju.
  2049  	1. Maybe CentOS.
  2050  	1. Maybe Win8 (agent for active server charm).
  2051  1. Remove the 386 unit tests; replace them with a 386 client test.
  2052  1. Add tests for precise (whereas we had special tests for trusty).
  2053  	1. Test a precise upgrade and deploy in one cloud.
  2054  1. Test each series+arch combination for local provider to confirm packaging and dependencies.
  2055  	1. precise+amd64 local
  2056  	1. trusty+amd64 local
  2057  	1. utopic+amd64 local
  2058  	1. trusty+ppc64 local
  2059  	1. trusty+arm64 local
  2060  1. Test client-server different series and arch to ensure the client’s series/arch does not influence the selection of tools.
  2061  	1. Utopic amd64 client bootstraps a trusty ppc64.
  2062  	1. We already test win juju client to juju precise amd64.
  2063  
  2064  ## CI MaaS and vMaaS
  2065  
  2066  Juju CI had MaaS access for 3 days. The tests ran successfully. How do we ensure juju always works with MaaS?
  2067  
  2068  1. CI wants 5 nodes.
  2069  1. CI wants the provider to be available at a moment's notice to run tests for new revisions, just like all clouds are always available.
  2070  1. CI probably does care if MaaS is in hardware or virtualised. No public clouds support vMaaS today.
  2071  
  2072  ### Work Items
  2073  
  2074  1. Ask Alexis, Mark R, and Robbie for MaaS hardware or access to a stable MaaS env.
  2075  
  2076  ## CI KVM
  2077  
  2078  Juju CI has local-provider KVM tests, but they cannot be run. Engineers have run them on their own machines.
  2079  
  2080  1. CI wants 3 containers.
  2081  1. CI needs root access on real hardware (hence developers run on their machines).
  2082  1. CI does care about hardware; no public clouds support KVM today?
  2083  
  2084  ### Work Items
  2085  
  2086  1. We can use one of the 3 PPC machines.
  2087  1. We need to setup a slave in the network.
  2088  	1. Ideally we can add a machine and deploy Jenkins slave to it.
  2089  	1. Or we standup a slave without juju.
  2090  	1. Or we change the scripts to copy the tests to the machine.
  2091  
  2092  ## Juju in OIL
  2093  
  2094  We think there may be interesting combinations to test. We know from bug reports that Juju didn’t support Havana’s multiple networks.
  2095  
  2096  1. We want to know if Juju fails with new versions of OpenStack parts.
  2097  1. We want to know if Juju fails with some combinations of OpenStack.
  2098  
  2099  ## Vagrant
  2100  
  2101  1. Run the virtual box image in a cloud.
  2102  	1. We care that the host’s mapping of dirs works with the image so that the charms are readable.
  2103  1. Exercise the local deployment.
  2104  	1. Deploy of local must work.
  2105  1. Failures might be
  2106  	1. Redirector of GUI failed.
  2107  		1. Packages in the image needed updating.
  2108  	1. lxc failed.
  2109  		1. Configuration of `env.yaml` might need changing.
  2110  		1. Command line deprecated or obsolete.
  2111  	1. When juju packaging deps change, the images need updating.
  2112  1. May need to communicate with Ben Howard to change the image.
  2113  1. Can CI pull images from a staging area to bless them?
  2114  1. Can we place the next juju into the virtual env to verify the next juju works?
  2115  
  2116  ## Bug Triage and Planning
  2117  
  2118  We have about 15 months of high bugs. Our planning cycles are 6 months. Though we are capable of fixing 400 bugs in this time, we know that 300 of the bugs are reported after planning. We, stakeholders, and customers need to know which bugs we intend to fix and those that will only be fixed by opportunity or assistance.
  2119  
  2120  1. Do we lower the priority of the 150 bugs?
  2121  	1. Do we make them medium? Medium bugs are not more likely to be fixed than low bugs...opportunity doesn’t discriminate by importance. We could say medium bugs are the first bugs to be re-triaged when we plan.
  2122  	1. Do we make them low? Low bugs obviously mean we don’t intend to fix the issue soon. Is it harder to re-triage all low bugs?
  2123  1. Do we create more milestones to organize work and show our intent? Can we plan work to be expedited instead of deferred?
  2124  	1. Target every bug we intend to address to a cycle milestone.
  2125  	1. Retarget some to major.minor milestones as we plan work.
  2126  	1. Retarget each to major.minor.micro milestones when branches merge.
  2127  1. Triaging every bug. Juju-GUI, deployer, charm-tools and a few others often have untriaged bugs that are a week old. Who is responsible for them? https://bugs.launchpad.net/juju-project/+bugs?field.status=New&orderby=targetname
  2128  
  2129  ### Work Items
  2130  
  2131  1. Want milestones that represent now, next stable, and cycle.
  2132  1. Now is the next release for the 2 week cycle.
  2133  	1. Teams target the bugs they want to fix in the cycle.
  2134  	1. We can see it burn down.
  2135  1. Next stable is all the bugs we think define a stable release.
  2136  	1. This doesn’t burn down because most bugs are retargeted. Some bugs will remain as they are the final bugs fixed to stable.
  2137  	1. 3 stable releases per 6-month cycle.
  2138  	1. Do we want a next next?
  2139  1. The cycle milestone is 3 to 5 months out and holds all the high bugs we want to fix.
  2140  	1. We define stable milestones by pulling from the horizon milestone.
  2141  	1. Can we ensure there is a maximum capacity for the milestone? If you add a bug, you must remove another.
  2142  1. Critical
  2143  	1. CI breaks. QA team will do first level of analysis.
  2144  	1. Regressions are critical, but they may be reclassified.
  2145  	1. Critical bugs need to be assigned.
  2146  1. Flaky tests are High bugs in the current milestone.
  2147  1. Alexis and stakeholders will drive some bugs to be added or moved forward.
  2148  1. We have 15 months of high bugs.
  2149  	1. To harden we need to know which high bugs need fixing.
  2150  	1. We want to retriage all the high bugs and make most of them medium?
  2151  		1. Review the medium bugs regularly to promote them to high for the upcoming cycle or demote them to low.
  2152  	1. We want 75 bugs to be high at any one time (1 page of high bugs).
  2153  
  2154  ## Documentation
  2155  
  2156  We want documentation written for the release notes before the release. We need greater collaboration to:
  2157  
  2158  1. Know which features are in a release.
  2159  1. Know how the features work from the developer notes.
  2160  1. Include the docs to the release notes.
  2161  1. Developers review the release notes for errors.
  2162  1. Adequately document features in advance of release where possible.
  2163  
  2164  We also need to discuss how versioning of the docs is going to work moving forward, and how we will manage and maintain separate versions of the docs, e.g. 1.18, 1.20, dev (unstable).
  2165  
  2166  ## MRE/SRU Juju into trusty
  2167  
  2168  We want the current Juju to always be in trusty. We don’t like the cloud-archive because the current juju isn’t really in Ubuntu.
  2169  
  2170  - Ubuntu wants guaranteed compatibility.
  2171  	- CI needs to ensure all versions of juju in a series work together.
  2172  - Landscape has an exception to keep current in all supported series.
  2173  	- Landscape only puts the client in supported series.
  2174  	- The server is in the clouds.
  2175  	- The client is stable, it changes slowly compared to the server.
  2176  	- The client works with many versions of the server, but tends to be used with the matching server.
  2177  - James Page suggests that juju be packaged with different names to permit co-installs, e.g. juju-1.20.0.
  2178  
  2179  ## Juju package delivers all the goodness
  2180  
  2181  1. apt-get install juju could provide juju-core, charm-tools, deployer.
  2182  
  2183  ## juju-qa projects
  2184  
  2185  1. Juju is moving to GitHub; Jerff and other Canonical machines can only talk to Launchpad.
  2186  1. The ci-cd-scripts2 must be on Launchpad.
  2187  1. We must split the test branch from the juju project.
  2188  1. We may want to split the release scripts from test scripts.
  2189  
  2190  # Juju Solutions
  2191  
  2192  ## Great Charm Audit of 2014
  2193  
  2194  We've been doing an audit over the last couple of months -- and will continue.   We've scaled up the Charmers team from 2 people 5 months ago, to 7 or 8 by Vegas, so we are adding a lot more firepower on this front -- but that's all still new.   I expect to see significant increase on our charming capacity for the next cycle. 
  2195  
  2196  ## Pivotal Cloud Foundry Charms
  2197  
  2198  Discussion points:
  2199  
  2200  1. The pivot from packages to artifacts and why.
  2201  	1. Tarball of binaries for a given release.
  2202  	1. +1 on proceeding for orchestrating artifacts post Bosh build.
  2203  1. Altoros, internal staffing, schedule.
  2204  1. CF Service Brokers.
  2205  1. Brief look at current status, juju canvas.
  2206  1. What is demo-able by ODS?
  2207  
  2208  ## IBM Workloads
  2209  
  2210  ## ARM Workloads
  2211  
  2212  ## CABS
  2213  
  2214  ## Amulet
  2215  
  2216  - We want to know which charms are following an interface exchange.
  2217  - When an interface is exchanged, this is the information that is passed.
  2218  - Then replay that (a capture/replay sketch follows this list).
  2219  - This boils down to: we need an interface specification.
  2220  - Mock up interface relations.
  2221  - Or figure out what the status is of the health check links.
  2222  - An opportunity to call the hook in integration suites.
  2223  - Could adopt some simplified version of Juju DB.
  2224  - They are talking about schema for next cycle.
  2225  - That probably isn’t the right answer.
  2226  - Someone would need to take over maintainership from Kapil.
  2227  - You need detailed knowledge of how Juju works.
  2231  - Build a quorum of what an interface looks like.
  2232  - This is the relation sentry in amulet.
  2233  - The problem with the relation sentry is the name is based on the
  2234  - Hacking around a problem that can be solved with tools in core.
  2235  - If core is not going to fix this, we need to hack round it.
  2236  - Bundle testing or Unit testing.  
  2237  - Is this portion of a deployment reusable by others?
  2238  - Depends on where we are going.
  2239  - 100% Certain bundle testing is the way of the future.
  2240  - Take some time writing a test and see how it would look.
  2241  - What is really needed?
  2242  - Do a single bundle test and see what that looks like.
  2243  - Looking at this with a fresh set of eyes may show us new aspects.
  2244  - Once we go through the review of CI and see if we can.
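
One concrete reading of the capture-and-replay idea above, purely as a sketch -- this is not Amulet’s actual relation sentry API:

```python
# Sketch: record the settings two charms exchange over a relation, then
# replay them against a charm under test. All names are illustrative.

class RelationRecorder:
    def __init__(self):
        self.exchanges = []                    # (unit, settings) pairs

    def record(self, unit, settings):
        self.exchanges.append((unit, dict(settings)))

    def replay(self, set_relation):
        """Feed recorded settings back through a relation-set callback."""
        for unit, settings in self.exchanges:
            set_relation(unit, settings)

# Capture a mysql interface exchange once, then replay it in a test:
rec = RelationRecorder()
rec.record("mysql/0", {"host": "10.0.0.5", "database": "wordpress",
                       "user": "wp", "password": "secret"})
rec.replay(lambda unit, s: print("relation-set from %s: %s" % (unit, s)))
```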
  2245  
  2246  ## Charm Tools
  2247  
  2248  ## CharmWorld Lib
  2249  
  2250  ## Charm Helpers
  2251  
  2252  - Folks interested: Chuck, Marco, Ben, Cory
  2253  - Break out contrib into charm helpers contrib.
  2254  - Define a way to deliver.
  2255  - Where do I get it?
  2256  - How do I use it?
  2257  - What libraries are available?
  2258  - Actions
  2259  	- Delivery via the install hook.
  2260  	- Document
  2261  	- Move as much as possible out of contrib to core.
  2262  - Thursday May 1
  2263  - Use doctest to ensure the documents are right (example after this list).
  2264  	- doctest does not scale up very well.
  2265  - Unit test docs before promotion to core
  2266  - Move the things from outside of contrib and core into core
  2267  - Use Wheel packaging; it is a blob format (make dist).
  2268  - Actually use and adhere to semantic versioning.
  2269  	- This may include changes to charm-helpers sync to get the right version. Fuzzy logic to find different versions.
  2270  - Chuck: investigate the Altoros charm template for charm helpers.
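
On the doctest point, a tiny example of how a documented helper stays verifiable; the helper itself is made up:

```python
# Sketch: a charm-helpers style function whose documentation doubles
# as a unit test via doctest. The helper is a made-up example.
def config_flags(options):
    """Render config options as CLI flags.

    >>> config_flags({"port": 8080, "debug": True})
    '--debug=True --port=8080'
    """
    return " ".join("--%s=%s" % (k, v) for k, v in sorted(options.items()))

if __name__ == "__main__":
    import doctest
    doctest.testmod()    # fails loudly if docs and code drift apart
```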
  2271  
  2272  ## Java Bundle
  2273  
  2274  ## HDP 2.0 Bundle 
  2275  
  2276  - Create 12 charms for the GA release of Apache Hadoop that Hortonworks supports.
  2277  	- http://hortonworks.com/hdp/
  2278  - Need to get communication from IBM on the porting of 12 components over to Power.
  2279  - Need to identify which HDP version is going to be the released version. 
  2280  - 3.0 will most likely be the next GA release.
  2281  - Need to support multi-language in the GUI.
  2282  - Next milestone:
  2283  	- Hadoop Summit demo.
  2284  
  2285  ## Big Data Roadmap
  2286  
  2287  - Optimizations
  2288  	- File system via Juju through storage feature.
  2289  	- Image-based: Hadoop-specific images.
  2290  - Conferences
  2291  	- Hadoop Summit (June)
  2292  	- Strata NY
  2293  - Demos
  2294  	- See how we can hook the Hadoop bundle into a charm framework bundle (e.g. Rails).
  2295  	- See how we can plug in multiple data sources.
  2296  		- Cancer, etc.
  2297  - Feature requests
  2298  	- Ensure that services that need different fault domains/availability sets can get them.
  2299  		- This may be resolved with tagging in MaaS.
  2300  			- Tag fault domain 1 and fault domain 2.
  2301  				- This is exposed to juju via the GUI.
  2302  	- Have the GUI/Landscape show which machines are in a given zone.
  2303  - Idea/need
  2304  	- We need to provide a means for Hadoop users to be able to put in their map-reduce java classes without having access to the admin portion of juju where hadoop is deployed.
  2305  		- The idea is to create a shim/relation/sub that provides user-level access so users can add in their map-reduce jobs.
  2306  
  2307  ## AmpLab Bundle
  2308  
  2309  ## Juju Actions in Bundles
  2310  
  2311  ## Charms in Git
  2312  
  2313  ## Charms Series Saga
  2314  
  2315  ## Fat Bundles and Caching Charms on Bootstrap Node
  2316  
  2317  ## Fat Charms in Closed Environments
  2318  
  2319  - Detect ports calling to the outside network.
  2320  
  2321  ## UA Charm Support Story
  2322  
  2323  - Support bundles, not charms.
  2324  - CTS validates the bundle relations and config.
  2325  - Has to have tests.
  2326  - Need bundles in the charm store to be marked as UA supportable.
  2327  
  2328  ## How to engage Joyent & Altoros on provider support
  2329  
  2330  ## Unstable Doc Branches & Markdown
  2331  
  2332  ## Gating Charm merge proposals on charm testing passing
  2333  
  2334  - Many useful relations.
  2335  - I expect this is very charm specific -- please feel free to list relations that we need.
  2336  
  2337  ## juju.ubuntu.com doc versioning
  2338  
  2339  - Marco, Jorge, Curtis, Matthew
  2340  - Branches will be versions in Git.
  2341  	- 1.18
  2342  		- en
  2343  		- fr
  2344  	- 1.20
  2345  		- en
  2346  		- fr
  2348  - How to generate docs for live publishing.
  2349  - Juju QA team will build the markdown-to-HTML conversion (a build-loop sketch follows this list).
  2350  	- In this conversion the Juju QA team will also incorporate the languages and the drop-down for versioning.
  2351  - Jorge to speak to the translations team on the best way forward.
  2352  - When committing to docs master the reviewer should also commit to unstable docs.
  2353  - Keep assets in a separate directory outside the versions and languages so we only have to update one place for assets.
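
A sketch of the build loop over versions and languages, assuming the layout above and the python-markdown package; paths are illustrative:

```python
# Sketch: render every version/language branch of the docs to HTML.
import os

import markdown   # python-markdown package

VERSIONS = ["1.18", "1.20", "dev"]
LANGUAGES = ["en", "fr"]

for version in VERSIONS:
    for lang in LANGUAGES:
        src = os.path.join("docs", version, lang)
        dst = os.path.join("htmldocs", version, lang)
        if not os.path.isdir(dst):
            os.makedirs(dst)
        for name in os.listdir(src):
            if not name.endswith(".md"):
                continue
            with open(os.path.join(src, name)) as f:
                html = markdown.markdown(f.read())
            with open(os.path.join(dst, name[:-3] + ".html"), "w") as f:
                f.write(html)
```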
  2354  
  2355  - Move author docs to a separate repository, but keep them in the nav for the live juju.ubuntu.com site.
  2356  	- The reason is that authoring docs should always be current, independent of the release. Charm authoring should work the same across all releases, so we should always show the latest.
  2357  		- The main idea is to de-couple the charm author docs from the user docs, since we always want to show the latest charm author docs; otherwise any update to the charm author docs would have to be made in every version branch.
  2358  	- We will need to update the juju contributor docs once we move the charm author section.
  2359  
  2360  ## Juju and OpenStack
  2361  
  2362  - Juju in keystone - Juju as a multi-tenant component registered in keystone.
  2363  - Juju in horizon - Juju gui and ui in horizon.
  2364  - Juju in heat - Juju / Deployer/bundle style exposed as dsl in heat.
  2365  
  2366  # Juju GUI 
  2367  
  2368  ## Juju in OpenStack Horizon - Juju GUI in horizon
  2369  
  2370  ### Issues to resolve
  2371  
  2372  - Embedding UI path? An OpenStack project or into an existing one.
  2373  - Embedding UI as far as framing/styling.
  2374  - Required timeframe, map out paths of resistance to make OpenStack release.
  2375  - The guiserver (python/tornado) running in that stack.
  2376  	- No bundles without deployer access.
  2377  		- Build deployer into core?
  2378  		- Build a full JS deployer?
  2379  	- No local charms file content.
  2380  
  2381  ## Juju in Azure - Juju GUI in Azure
  2382  
  2383  ### Issues to resolve
  2384  
  2385  - Embedding UI path? Hosted externally and referenced in? Need to meet specific Azure tooling requirements?
  2386  - Embedding UI as far as framing/styling with existing Azure UX.
  2387  - Additional required functionality.
  2388  	- List environments.
  2389  - Required timeframe, map out paths of resistance to make deliverables.
  2390  - The guiserver (python/tornado) running in that stack.
  2391  	- No bundles without deployer access.
  2392  		- Build deployer into core?
  2393  		- Build a full JS deployer?
  2394  	- No local charms file content.
  2395  
  2396  ## Juju UI networks support
  2397  
  2398  - Which types of networking are supported, and which will be supported in core this cycle? Which others are planned, to make sure the design scales/works?
  2399  - What does design have for UX of this so far?
  2400  - Provider differences, sandbox, etc.
  2401  - Make sure api exposure is complete enough in core to aid all UI team needs put forth by design.
  2402  	- Get anything not covered onto someone’s schedule.
  2403  
  2404  ## Juju UI Machine view 1.5
  2405  
  2406  Most of this is a sync with design and check on what we put into 1.0 vs the final desired product.
  2407  
  2408  - Deployed services inspector.
  2409  - Better search integration.
  2410  - Pre deployment config and visualization of bundles.
  2411  - Better local charms integration.
  2412  - Improved interactions (full drag/drop with the walkthrough/guide material).
  2413  
  2414  ## Juju UI Design Global Actions
  2415  
  2416  We’ve got a series of tasks on the list that require us to find a way to represent things across the entire environment. We need to sit down with design and look at a common pattern to use for these ‘global’ environment-wide tools, many of which mirror tasks at the service, machine, and unit level.
  2417  
  2418  ### Items to discuss
  2419  
  2420  - Design a home for global environment information.
  2421  - HA status/make HA.
  2422  - SSH Key management.
  2423  - Environment level debug-log.
  2424  - Environment level juju-run.
  2425  
  2426  ## In the trenches - customer feedback for GUI
  2427  
  2428  The GUI team would like to meet with ecosystems and others selling/deploying the GUI in the field and get feedback on things we can and should look at doing to make the GUI a better tool and product. The goal is to help prioritize and give us ideas of paper cuts we should schedule to fix during maintenance time in the next cycle. 
  2429  
  2430  ## Juju UI Product Priorities
  2431  
  2432  There’s a backlog of features to add to the GUI. We need a product team opinion on which to prioritize as we work around bigger tasks like Azure embedding. We won’t be able to get all done this cycle so we’d like feedback on those most useful to selling/marketing Juju.
  2433  
  2434  - Debug log
  2435  - HA representation controls
  2436  - Network support
  2437  - Juju Run
  2438  - Multiple Users
  2439  - Fat bundles
  2440  - juju-quickstart on OS X
  2441  - juju-quickstart MaaS support
  2442  - SSH Key management UI
  2443  
  2444  ## Core Process Improvements
  2445  
  2446  ### Documentation
  2447  
  2448  - Ian - use launchpad to track what bugs are where and which are fixed.
  2449  - Nate - an in-repo file is easier to keep track of, easier to verify during code reviews.
  2450  
  2451  ### Standups
  2452  
  2453  - Leads meet once a week.
  2454  - Standups are squad standups.
  2455  - William 1 on 1s with leads.
  2456  - Team Leads email about team status.
  2457  
  2458  ### Vetting Ideas on Juju-dev
  2459  
  2460  - Send user feature description to juju-dev before working on features.
  2461  
  2462  ### 2-Week Planning Cycle
  2463  
  2464  - Dev release every 2 weeks.
  2465  
  2466  ### Contributing to CI tests
  2467  
  2468  - We should do that.
  2469  
  2470  ### Move core to GitHub?
  2471  
  2472  Needs to be scheduled and prioritized.  Non-zero work to get it working (build bot, process, etc).
  2473   
  2474  - Code migration
  2475  - Code review
  2476  - Landing process
  2477  - Release process
  2478  - CI
  2479  - Documentation
  2480  - Private projects (ask Mark Ramm)
  2481  
  2482  ### Work Items
  2483  
  2484  1. Code migration
  2485  	1. Do it all in one big migration.
  2486  	1. Namespace will be juju/core.
  2487  	1. Factor out others later.
  2488  	1. Disable GitHub bugtracker.
  2489  1. Code review
  2490  	1. Aim to use native GitHub code review.
  2491  	1. Find out about diffs being able to be expanded (ok, done).
  2492  	1. Rebase before issuing pull request to allow single revision to be cherry picked (investigate to be sure).
  2493  1. Branch setup
  2494  	1. Single trunk branch protected by bot.
  2495  1. Landing process
  2496  	1. Check out Rick’s lander branch (juju Jenkins GitHub lander).
  2497  	1. Run GitHub Jenkins lander on Jenkins CI instance.
  2498  1. Documentation
  2499  	1. Document entire process.
  2500  1. CI
  2501  	1. Polling for new revisions.
  2502  	1. Building release tarball
  2503