# Event API v2

* Author: Nick Kubala
* Design Shepherd: Any volunteers? :)
* Date: 04/11/19
* Status: [Reviewed/Cancelled/Under implementation/Complete]

## Final decision

After multiple days invested in this direction, we still feel it is not worthwhile to pursue: introducing Go channels adds a lot of complexity and makes the design more prone to subtle bugs. The main argument for this direction is the Event API redesign that centralizes event management in the Runner. That is a valuable idea in itself, but at this point the current design is simpler to reason about than the new channel-based one. Moreover, if we want to extend the events sent by builders (e.g. progress bar events), we would have to send data representing those events through the Go channels anyway and convert it into events, or call the `event` package directly - at which point the complexity of "handling events from the builders" is going to be the same.

We are open to introducing a streaming/reactive design in Skaffold in the long run if we have a strong enough argument/feature set for it. For this redesign I don't see that it is worthwhile. I am closing this and the related PRs.

Thank you @nkubala and @tejal29 for all the hard work exploring this avenue in detail, in code, with testing and thoughtful conversations!

## Background

Currently there are two major pain points with the event API:

* calling code is clunky, ugly, and not very portable
* the event handler is racy and generally not concurrency-friendly

The second issue stems more from bad execution than from bad design. Some fixes for this have already been proposed and/or merged ([#1786](https://github.com/GoogleContainerTools/skaffold/pull/1786) and [#1801](https://github.com/GoogleContainerTools/skaffold/pull/1801)), so progress has already been made, but we can probably improve on this further.

The first issue is what I'll be focusing on in this proposal. @dgageot has already merged one improvement to the design [here](https://github.com/GoogleContainerTools/skaffold/pull/1829), but this only improves on an already flawed design. My goal for this redesign is to reduce, and potentially eliminate, the overhead the event API imposes on maintainers and contributors as they continue evolving the skaffold codebase.

___

## Design

Right now, events are handled directly through the main loop for each builder and deployer. The internal state is updated directly by the individual builder/deployer, and anyone adding a new one is responsible for ensuring that these events are passed to the event handler correctly. Examples:

```golang
if err != nil {
	event.BuildFailed(artifact.ImageName, err)
	return nil, fmt.Errorf("building [%s]: %w", artifact.ImageName, err)
}

event.BuildComplete(artifact.ImageName)
```

and

```golang
event.DeployInProgress()

manifests, err := k.readManifests(ctx)
if err != nil {
	event.DeployFailed(err)
	return fmt.Errorf("reading manifests: %w", err)
}

if len(manifests) == 0 {
	return nil
}
```

Even though the event handling functions have been reduced to one-line calls, the fact that they still have to be called manually from the build or deploy code is a lot of overhead for contributors, especially those considering adding a new builder or deployer.
Every build or deploy that skaffold performs always comes through the runner, so it follows that the runner could handle all of the eventing for builds and deploys, as long as it can retrieve the necessary information from the builder or deployer.

* For deployers, this is easy: we just need to know whether or not the deploy completed successfully, and if not, what error was encountered. This is already passed back to the runner.
* For builders, it's a little less straightforward. For each artifact, we'll need to know whether or not the build was successful, and if not, the error encountered. Currently, the Build() interface returns a single error back to the runner, containing information about the first unsuccessful build it saw during the build process. Because of this, if we were building two artifacts and one completed while the other failed, the runner has no way to know this, and assumes both failed to build.

To address this, **I propose that we change the Build() interface to return a list of build results, one for each artifact, with information about the original artifact skaffold attempted to build, and either the built artifact or the error the builder encountered.**

This will allow the runner to:

* know which artifacts were built successfully, and which failed
* log events for each artifact, complete with a detailed error message passed back from the builder in the event that the build didn't complete successfully

With this information, we could move all of the event handling code into a decorator that we wrap the builders and deployers in during the creation of the runner, similar to the way we handle timings currently (a rough sketch of such a decorator is included at the end of this document). The semantics for each state (Not Started, In Progress, Complete, Failed) and the events that transition between the states should be the same for builders and deployers, so we can share the logic for handling each of these events in a single location. This has the added bonus of not requiring any event-related code to be written by contributors if/when they add new builders/deployers, or make changes to existing ones.

## Implementation plan

Implementing this should be pretty straightforward:

1) **Rework the Build() interface**

The current method signature is

```golang
Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]Artifact, error)
```

This will be changed to return a list of build results, one for each artifact, along with the artifact information itself. The artifact information can be embedded in the build result:

```golang
type BuildResult struct {
	Target *latest.Artifact
	Result *build.Artifact
	Error  error
}
```

and the new method signature will be

```golang
Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]BuildResult, error)
```

2) **Consume the new build results in the runner**

The runner can then take control of all the eventing code. Deploy results are already returned by the Deploy() interface implementations, so with the Build results we have all the information we need. Using Build() as an example (Deploy() will look similar):

* For each artifact provided to Build(), a `BuildInProgress` event is triggered.
* Once the build is complete, for each returned result, trigger a `BuildFailed` event for results with errors, and a `BuildComplete` event for those without.
```golang
for _, a := range artifactsToBuild {
	event.BuildInProgress(a)
}

bRes, err := r.Build(ctx, out, tags, artifactsToBuild)
if err != nil {
	// A top-level failure not tied to any single artifact.
	return err
}

// Emit one terminal event per artifact, based on its individual result.
for _, res := range bRes {
	if res.Error != nil {
		event.BuildFailed(res.Target, res.Error)
	} else {
		event.BuildComplete(res.Target)
	}
}
```

The main drawback here is that for builds run in parallel, the statuses will not truly reflect reality until all builds have completed, but I don't believe users will see much of an effect from this.

___

## Integration test plan

The current integration tests around eventing and retrieving skaffold state should be sufficient to cover this change. Skaffold runs are triggered, the state is retrieved at various points during the run, and the tests verify that the correct results are retrieved.

Additional unit tests can be added as necessary to ensure the correct event handling methods are being called from the runner.
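
To make the decorator idea from the Design section more concrete, here is a rough sketch of what a runner-side eventing wrapper could look like. It is only an illustration, not a final implementation: the `withEventing` type and the embedded `Builder` interface are assumed names, and the `event` calls follow the proposed usage shown above rather than the signatures of today's `event` package.

```golang
// withEventing is a hypothetical decorator: it wraps any Builder and emits
// build events on its behalf, so individual builder implementations never
// call the event package themselves.
type withEventing struct {
	Builder // assumed: a builder exposing the reworked Build() signature
}

func (w withEventing) Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]BuildResult, error) {
	for _, a := range artifacts {
		event.BuildInProgress(a)
	}

	results, err := w.Builder.Build(ctx, out, tags, artifacts)
	if err != nil {
		// A top-level failure not tied to any single artifact.
		return nil, err
	}

	// Emit one terminal event per artifact, based on its individual result.
	for _, res := range results {
		if res.Error != nil {
			event.BuildFailed(res.Target, res.Error)
		} else {
			event.BuildComplete(res.Target)
		}
	}
	return results, nil
}
```

The runner would apply this wrapper to the chosen builder when it is created, in the same place the timing wrapper is applied today, so the eventing logic lives in exactly one spot.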
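
A unit test for the result-to-event mapping might then look roughly like the sketch below. The `postBuildEvents` helper and its function-valued event hooks are purely hypothetical, introduced only so the test can record which events would be emitted without touching the real `event` package.

```golang
// postBuildEvents captures the same mapping the wrapper above performs; the
// emitters are parameters so a test can record them instead of calling the
// real event package.
func postBuildEvents(results []BuildResult, failed func(*latest.Artifact, error), complete func(*latest.Artifact)) {
	for _, res := range results {
		if res.Error != nil {
			failed(res.Target, res.Error)
		} else {
			complete(res.Target)
		}
	}
}

func TestPostBuildEvents(t *testing.T) {
	results := []BuildResult{
		{Target: &latest.Artifact{ImageName: "ok"}, Result: &build.Artifact{ImageName: "ok"}},
		{Target: &latest.Artifact{ImageName: "bad"}, Error: errors.New("build failed")},
	}

	var completed, failed []string
	postBuildEvents(results,
		func(a *latest.Artifact, _ error) { failed = append(failed, a.ImageName) },
		func(a *latest.Artifact) { completed = append(completed, a.ImageName) },
	)

	if len(completed) != 1 || completed[0] != "ok" {
		t.Errorf("expected a BuildComplete event only for %q, got %v", "ok", completed)
	}
	if len(failed) != 1 || failed[0] != "bad" {
		t.Errorf("expected a BuildFailed event only for %q, got %v", "bad", failed)
	}
}
```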