github.com/GoogleContainerTools/skaffold/v2@v2.13.2/docs-v2/design_proposals/sync-improvements.md (about) 1 # Sync improvements 2 3 * Author(s): Cornelius Weig (@corneliusweig) 4 * Design Shepherd: Tejal Desai (@tejal29) 5 * Date: 03/20/2019 6 * Status: Reviewed 7 8 ## Background 9 10 Currently, skaffold config supports `artifact.sync` as a way to sync files 11 directly to pods. So far, artifact sync requires a specification of sync 12 patterns like 13 14 ```yaml 15 sync: 16 '*.js': app/ 17 ``` 18 19 This is error prone and unnecessarily hard to use because the destination can often be determined by the builder. For example a destination path is 20 already contained in the Dockerfile for docker build. (see #1166, #1581). 21 In addition, the syncing needs to handle special cases for globbing and often 22 requires a long list of sync patterns (#1807). 23 24 Furthermore, builders should be able to inform Skaffold about additional sync paths (#1704). 25 This will result in much faster deploy cycles, without users having to bother about these optimizations. 26 27 #### Problems with existing syntax 28 The current `sync` config syntax has the following problems: 29 30 - The flattening of the file structure at the destination is surprising. For example: 31 ```yaml 32 sync: 33 'src/**/*.js': dest/ 34 ``` 35 will put all `.js` files in the `dest` folder. 36 - The triple-star syntax is is quite implicit and a surprising extension of the standard globbing syntax. For example: 37 ```yaml 38 sync: 39 'src/b/c/***/*.py': dest/ 40 ``` 41 will recreate subfolders _below_ `src/b/c` at `dest/` and sync `.py` files. 42 The triple-star therefore triggers two actions which should not depend on each other: 43 - do not flatten the directory structure 44 - strip everything on the source path up to `***` 45 - The current syntax does not allow extension to further sync modes unless further magic strings are introduced. 46 47 The syntax should therefore be revised. 48 49 ## Design 50 51 Skaffold sync shall have three different modes: 52 53 1. _manual_: the user specifies both the source and destination. 54 3. _auto-inferred destination_: the user specifies the syncable files, and Skaffold infers their destination in the container, e.g based on Dockerfile 55 2. _smart_: builder recommended sync rules for files, e.g. jib specifies what to sync and provides both the source and dest of files to sync. The user does not have to specify anything. 56 57 The scope of this design proposal is the _direct_ and _inferred_ mode. 58 The smart sync mode is out of scope for this document but may be mentioned at times. 59 60 ### Goals 61 The sync mechanism in skaffold should be able to infer the destination location of local files, when the user opts-in to do so. 62 63 For example: given the following Dockerfile 64 65 ```dockerfile 66 FROM nginx 67 WORKDIR /var 68 COPY static *.html www/ 69 ADD nginx.conf . 70 ADD nginx.conf /etc/ 71 ``` 72 73 The sync locations should be inferred as 74 - `static/page1.html` -> `/var/www/page1.html` (! no static folder at destination !) 75 - `static/assets/style.css` -> `/var/www/assets/style.css` 76 - `index.html` -> `/var/www/index.html` 77 - `nginx.conf` -> `/var/nginx.conf`, `/etc/nginx.conf` 78 79 A fundamental change compared to the current sync mechanism is that one source path may by sync'ed to multiple destinations in the future. 80 81 ### Assumptions 82 83 Builders know where to put static files in the container. 84 This information needs to be pulled from the builders and made available in Skaffold. 85 Currently, we know how to do this for Docker artifacts (@r2d4 suggested this in #1166). 86 Jib transforms some input files and includes others as static resources. 87 The specific sync maps need to be detailed out but are straightforward. 88 Whether the bazel builder supports this, is unclear at the moment. 89 90 ### Interface changes 91 92 The builder implementations need to fulfill a new requirement: 93 They need to provide a default sync map for changed or deleted files. 94 For that, the `Builder` interface (pkg/skaffold/build/build.go) will receive the new method `SyncMap`. 95 96 ```go 97 type Builder interface { 98 Labels() map[string]string 99 100 Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]Artifact, error) 101 102 DependenciesForArtifact(ctx context.Context, artifact *latest.Artifact) ([]string, error) 103 104 // ADDED 105 SyncMap(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error) 106 } 107 ``` 108 109 `SyncMap` returns a map whose keys are paths of source files for the container. 110 These host paths will be relative to the workingdir of the respective artifact. 111 The map values list all absolute remote destination paths where the input files are placed in the container. 112 This must be a list, because a single input file may be copied to several locations in the container. 113 Note that this does not mean that every file will be sync'ed. 114 The intent to sync a file needs to be explicitly specified by user-provided sync rules (see config changes). 115 In the future, some builders may specify default sync rules, but this is out of scope for this document. 116 117 For example, given the above Dockerfile, `SyncMap` will return something like 118 ```go 119 map[string][]string{ 120 "static/page1.html": []string{"/var/www/page1.html"}, 121 "static/assets/style.css": []string{"/var/www/assets/style.css"}, 122 "index.html": []string{"/var/www/index.html"}, 123 "nginx.conf": []string{"/var/nginx.conf", "/etc/nginx.conf"}, 124 } 125 ``` 126 127 Unsupported builders will return a not-supported error. 128 129 ### Config changes 130 131 The `artifact.sync` config needs to support additional requirements: 132 133 - Support existing use-cases for backwards compatibility, i.e. 134 - Glob patterns on the host path 135 - Transplant a folder structure, ~~or flatten the folder structure~~[this behavior was a bug and will be removed] 136 - Offer a way to use destination inference. 137 - Provide a concise and unambiguous configuration syntax to avoid the problems outlined [above](#Background). 138 - Be open for extension, such as the `smart` sync mode. 139 140 #### Suggestion 141 The `artifact.sync` config will be upgraded as follows: 142 143 ```yaml 144 sync: 145 # Existing use-case: Copy all js under 'src' into 'dest', re-creating the directory structure below 'src' 146 # This corresponds to the current `'src/***/*.js': app/` 147 manual: # oneOf=sync 148 - src: 'src/**/*.js' # required 149 dest: app/ # required 150 strip: 'src/' # optional, default to "" 151 152 # New use-case: Copy all python files and infer their destination. 153 infer: # oneOf=sync 154 - '**/*.py' 155 156 # New use-case: smart sync mode for jib (out of scope) 157 smart: {} # oneOf=sync 158 ``` 159 160 In the `sync` config, exactly one of the modes `manual` or `infer` may be specified, but not both. 161 Both `manual` and `infer` contain a list of sync rules. 162 When determining the destination for a given file, all sync rules will be considered and not just the first match. 163 Thus, a single file may be copied to multiple destinations. 164 165 Error cases: 166 - It should be an error to specify `strip` with a prefix which which does not match the static prefix of the `from` field. 167 ```yaml 168 - src: 'src/**/*.js' 169 dest: dest/ 170 strip: src/sub # ERR 171 - src: 'src/**/*.js' 172 dest: dest/ 173 strip: app/ # ERR 174 ``` 175 176 Open questions about the configuration: 177 178 **\<Is the syntax clear enough?\>** 179 Is the syntax clear? 180 Is the configuration intuitive to use? 181 182 Resolution: YES 183 184 **\<Does it improve clarity to have the `inferTo` field?\>** 185 A different approach could be to fall back to inference, if the `to` field is left blank. 186 This would have the advantage that there needs to be no validation about having `to` and `inferTo` in the same rule, which should cause an error. 187 188 Resolution: Obsolete, the improved version excludes this error case. 189 190 **\<Error for `strip` and `flatten`\>** 191 Given the new sync config, users may try to combine various flags in unintended ways. 192 For example, should `strip` and `flatten` in the same rule always be an error? 193 My intuition would say, that `strip+flatten` is the same as `flatten`, but this should to be discussed. 194 195 Resolution: Obsolete. 196 197 **\<Do we need an `excludes` rule?\>** 198 The current config only suggests additive rules. For some applications it might be easier to also remove matched host paths from the syncable set. For example: 199 ```yaml 200 sync: 201 # include all files below src... 202 - src: 'src/**/*' 203 dest: app/ 204 # ...but do not consider js files for sync 205 - src: 'src/**/*.js' 206 exclude: true 207 ``` 208 Also see #1766. 209 210 Resolution: Descoped, open separate issue 211 212 213 #### Migration plan 214 An automatic schema upgrade shall be implemented. 215 The former default behavior to flatten directories at the destination will no longer be supported. 216 This is possible, because sync is still an alpha feature. 217 Users who are using incompatible sync patterns will receive a warning during upgrade. 218 219 ### Open Issues/Question 220 221 **\<Overlapping copies\>** 222 Overlapping copies can be caused by at least two ways: 223 224 1. The `COPY/ADD` instructions in the Dockerfile may copy to the same destination. For example: 225 ```dockerfile 226 COPY foo /foo 227 COPY bar/ / # bar contains a file called "foo" 228 ``` 229 2. Two sync rules may copy files to the same destination. For example: 230 ```yaml 231 sync: 232 - src: foo 233 dest: /baz 234 - src: bar 235 dest: /baz 236 ``` 237 238 Should such cases be treated as errors, warnings, or not be detected at all? 239 240 Resolution: No detection which is the current behavior. 241 242 **\<Transitive copies\>** 243 Multi-stage Dockerfiles can lead to quite contrived scenarios. For example: 244 ```dockerfile 245 FROM scratch 246 ADD foo baz 247 FROM scratch 248 COPY --from=0 baz bar 249 ``` 250 Ideally, the sync destination should be inferred as `foo -> bar`. 251 252 This is quite an edge case and can easily be circumvented. 253 However, it is quite complex to implement. 254 Should we bother about this special case? 255 256 Resolution: not now 257 258 **\<Other builders\>** 259 We don't know how to obtain a mapping of local paths to remote paths for other builders (jib, bazel). 260 This may even have to be implemented upstream. 261 Until we can support those builders, we to handle the case when a user tries to use inference with those builders. 262 263 - Option 1: bail out with an error 264 - Option 2: copy the file into the container with the destination set to the relative path wrt the context directory. 265 266 Resolution: Option 1, ideally during schema validation. 267 268 **\<Upgrade hint\>** 269 The path inference allows to simplify the sync specification considerably. 270 Besides, subdirectory syncing makes the sync functionality even feasible for large projects that need to sync many folders into the container. 271 Therefore, we could consider to advertise this new functionality when running Skaffold or when an old-style sync rule as found. 272 273 Resolution: Agree. We should support auto-fixing. 274 275 **\<SyncMap signature\>** 276 The new `SyncMap` interface provides some sync rules from the builder. 277 However, the suggested signature is `func() map[string]string` with the convention to not flatten directories. 278 To be more in line with the other sync functionality, it could also return `[]SyncRule`. 279 Although being clearer, this has the downside of introducing a dependency from the builder package to the pipeline config package. 280 If going that route, we should probably duplicate this struct in a proper place. 281 However, is it necessary at all, or is the current signature good enough? 282 283 Resolution: This is an implementation detail. 284 285 ## Implementation plan 286 287 1. Upgrade `artifact.sync` to the new sync rule structure and test schema validation. 288 Should already support multiple destination paths. (#1847) 289 2. Add inference logic for docker. (#2084) 290 3. Support support sync rules with inference. (former #1812, will become separate PR) 291 - includes integration tests 292 - includes doc update 293 - includes example showcase 294 4. (out of scope) Smart sync mode by allowing builders to specify default sync rules. 295 - Support smart sync for jib. 296 - Possibly support smart sync for Dockerfile when `ADD` or `COPY` are not followed by a `RUN` command. 297 5. (future) Add sync map logic for jib and examples. 298 6. (future) Add sync map logic for bazel and examples. 299 300 ## Integration test plan 301 302 - **implementation step 3** Change one of the existing sync examples so that it uses inference (e.g. #1826). 303 The test should change a local source file. Expect that the change will be reflected in the container. 304 305 - **implementation step 3** Set up automatic destination syncing and delete a local input file. 306 Expect that the input file is also deleted in the container. 307 _Update_: This expectation cannot be met, because a deleted file is no longer contained in the inferred syncmap. Thus file deletion with inferred sync mode must trigger a rebuild. 308 309 - **implementation step 4** Add a test case that that features builder plugin sync patterns. 310 311 ## Glossary 312 313 - _sync map_: specifies a mapping of source files to possibly multiple destinations in the container. It does not specify the intent to sync. 314 - _sync rule_: specifies the intent to sync files. For manual sync, it must also specify the destination of sync'ed files.