github.com/GoogleContainerTools/skaffold/v2@v2.13.2/docs-v2/design_proposals/sync-improvements.md (about)

     1  # Sync improvements
     2  
     3  * Author(s): Cornelius Weig (@corneliusweig)
     4  * Design Shepherd: Tejal Desai (@tejal29)
     5  * Date: 03/20/2019
     6  * Status: Reviewed
     7  
     8  ## Background
     9  
    10  Currently, skaffold config supports `artifact.sync` as a way to sync files
    11  directly to pods. So far, artifact sync requires a specification of sync
    12  patterns like
    13  
    14  ```yaml
    15  sync:
    16    '*.js': app/
    17  ```
    18  
    19  This is error prone and unnecessarily hard to use because the destination can often be determined by the builder. For example a destination path is
    20  already contained in the Dockerfile for docker build. (see #1166, #1581).
    21  In addition, the syncing needs to handle special cases for globbing and often
    22  requires a long list of sync patterns (#1807).
    23  
    24  Furthermore, builders should be able to inform Skaffold about additional sync paths (#1704).
    25  This will result in much faster deploy cycles, without users having to bother about these optimizations.
    26  
    27  #### Problems with existing syntax
    28  The current `sync` config syntax has the following problems:
    29  
    30  - The flattening of the file structure at the destination is surprising. For example:
    31    ```yaml
    32    sync:
    33      'src/**/*.js': dest/
    34    ```
    35    will put all `.js` files in the `dest` folder.
    36  - The triple-star syntax is is quite implicit and a surprising extension of the standard globbing syntax. For example:
    37    ```yaml
    38    sync:
    39      'src/b/c/***/*.py': dest/
    40    ```
    41    will recreate subfolders _below_ `src/b/c` at `dest/` and sync `.py` files.
    42    The triple-star therefore triggers two actions which should not depend on each other:
    43    - do not flatten the directory structure
    44    - strip everything on the source path up to `***`
    45  - The current syntax does not allow extension to further sync modes unless further magic strings are introduced.
    46    
    47  The syntax should therefore be revised.
    48  
    49  ## Design
    50  
    51  Skaffold sync shall have three different modes:
    52  
    53  1. _manual_: the user specifies both the source and destination.
    54  3. _auto-inferred destination_: the user specifies the syncable files, and Skaffold infers their destination in the container, e.g based on Dockerfile
    55  2. _smart_: builder recommended sync rules for files, e.g. jib specifies what to sync and provides both the source and dest of files to sync. The user does not have to specify anything.
    56  
    57  The scope of this design proposal is the _direct_ and _inferred_ mode.
    58  The smart sync mode is out of scope for this document but may be mentioned at times.
    59  
    60  ### Goals
    61  The sync mechanism in skaffold should be able to infer the destination location of local files, when the user opts-in to do so.
    62  
    63  For example: given the following Dockerfile
    64  
    65  ```dockerfile
    66  FROM nginx
    67  WORKDIR /var
    68  COPY static *.html www/
    69  ADD nginx.conf .
    70  ADD nginx.conf /etc/
    71  ```
    72  
    73  The sync locations should be inferred as
    74  - `static/page1.html` -> `/var/www/page1.html` (! no static folder at destination !)
    75  - `static/assets/style.css` -> `/var/www/assets/style.css`
    76  - `index.html` -> `/var/www/index.html`
    77  - `nginx.conf` -> `/var/nginx.conf`, `/etc/nginx.conf`
    78  
    79  A fundamental change compared to the current sync mechanism is that one source path may by sync'ed to multiple destinations in the future.
    80  
    81  ### Assumptions
    82  
    83  Builders know where to put static files in the container.
    84  This information needs to be pulled from the builders and made available in Skaffold.
    85  Currently, we know how to do this for Docker artifacts (@r2d4 suggested this in #1166).
    86  Jib transforms some input files and includes others as static resources.
    87  The specific sync maps need to be detailed out but are straightforward.
    88  Whether the bazel builder supports this, is unclear at the moment.
    89  
    90  ### Interface changes
    91  
    92  The builder implementations need to fulfill a new requirement:
    93  They need to provide a default sync map for changed or deleted files.
    94  For that, the `Builder` interface (pkg/skaffold/build/build.go) will receive the new method `SyncMap`.
    95  
    96  ```go
    97  type Builder interface {
    98     Labels() map[string]string
    99  
   100     Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]Artifact, error)
   101  
   102     DependenciesForArtifact(ctx context.Context, artifact *latest.Artifact) ([]string, error)
   103  
   104     // ADDED
   105     SyncMap(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error)
   106  }
   107  ```
   108  
   109  `SyncMap` returns a map whose keys are paths of source files for the container.
   110  These host paths will be relative to the workingdir of the respective artifact.
   111  The map values list all absolute remote destination paths where the input files are placed in the container.
   112  This must be a list, because a single input file may be copied to several locations in the container.
   113  Note that this does not mean that every file will be sync'ed.
   114  The intent to sync a file needs to be explicitly specified by user-provided sync rules (see config changes).
   115  In the future, some builders may specify default sync rules, but this is out of scope for this document.
   116  
   117  For example, given the above Dockerfile, `SyncMap` will return something like
   118  ```go
   119  map[string][]string{
   120    "static/page1.html": []string{"/var/www/page1.html"},
   121    "static/assets/style.css": []string{"/var/www/assets/style.css"},
   122    "index.html": []string{"/var/www/index.html"},
   123    "nginx.conf": []string{"/var/nginx.conf", "/etc/nginx.conf"},
   124  }
   125  ```
   126  
   127  Unsupported builders will return a not-supported error.
   128  
   129  ### Config changes
   130  
   131  The `artifact.sync` config needs to support additional requirements:
   132  
   133  - Support existing use-cases for backwards compatibility, i.e.
   134    - Glob patterns on the host path
   135    - Transplant a folder structure, ~~or flatten the folder structure~~[this behavior was a bug and will be removed]
   136  - Offer a way to use destination inference.
   137  - Provide a concise and unambiguous configuration syntax to avoid the problems outlined [above](#Background).
   138  - Be open for extension, such as the `smart` sync mode.
   139  
   140  #### Suggestion
   141  The `artifact.sync` config will be upgraded as follows:
   142  
   143  ```yaml
   144  sync:
   145    # Existing use-case: Copy all js under 'src' into 'dest', re-creating the directory structure below 'src'
   146    # This corresponds to the current `'src/***/*.js': app/`
   147    manual:         # oneOf=sync
   148    - src: 'src/**/*.js' # required
   149      dest: app/         # required
   150      strip: 'src/'      # optional, default to ""
   151  
   152    # New use-case: Copy all python files and infer their destination.
   153    infer:          # oneOf=sync
   154    - '**/*.py'
   155      
   156    # New use-case: smart sync mode for jib (out of scope)
   157    smart: {}       # oneOf=sync
   158  ```
   159  
   160  In the `sync` config, exactly one of the modes `manual` or `infer` may be specified, but not both.
   161  Both `manual` and `infer` contain a list of sync rules.
   162  When determining the destination for a given file, all sync rules will be considered and not just the first match.
   163  Thus, a single file may be copied to multiple destinations.
   164  
   165  Error cases:
   166  - It should be an error to specify `strip` with a prefix which which does not match the static prefix of the `from` field.
   167    ```yaml
   168    - src: 'src/**/*.js'
   169      dest: dest/
   170      strip: src/sub        # ERR
   171    - src: 'src/**/*.js'
   172      dest: dest/
   173      strip: app/           # ERR
   174    ```
   175  
   176  Open questions about the configuration:
   177  
   178  **\<Is the syntax clear enough?\>**
   179  Is the syntax clear?
   180  Is the configuration intuitive to use?
   181  
   182  Resolution:  YES
   183  
   184  **\<Does it improve clarity to have the `inferTo` field?\>**
   185  A different approach could be to fall back to inference, if the `to` field is left blank.
   186  This would have the advantage that there needs to be no validation about having `to` and `inferTo` in the same rule, which should cause an error.
   187  
   188  Resolution: Obsolete, the improved version excludes this error case.
   189  
   190  **\<Error for `strip` and `flatten`\>**
   191  Given the new sync config, users may try to combine various flags in unintended ways.
   192  For example, should `strip` and `flatten` in the same rule always be an error?
   193  My intuition would say, that `strip+flatten` is the same as `flatten`, but this should to be discussed.
   194  
   195  Resolution: Obsolete.
   196  
   197  **\<Do we need an `excludes` rule?\>**
   198  The current config only suggests additive rules. For some applications it might be easier to also remove matched host paths from the syncable set. For example:
   199  ```yaml
   200  sync:
   201  # include all files below src...
   202  - src: 'src/**/*'
   203    dest: app/
   204  # ...but do not consider js files for sync
   205  - src: 'src/**/*.js'
   206    exclude: true
   207  ```
   208  Also see #1766.
   209  
   210  Resolution: Descoped, open separate issue
   211  
   212  
   213  #### Migration plan
   214  An automatic schema upgrade shall be implemented.
   215  The former default behavior to flatten directories at the destination will no longer be supported.
   216  This is possible, because sync is still an alpha feature.
   217  Users who are using incompatible sync patterns will receive a warning during upgrade.
   218  
   219  ### Open Issues/Question
   220  
   221  **\<Overlapping copies\>**
   222  Overlapping copies can be caused by at least two ways:
   223  
   224  1. The `COPY/ADD` instructions in the Dockerfile may copy to the same destination. For example:
   225     ```dockerfile
   226     COPY foo /foo
   227     COPY bar/ /   # bar contains a file called "foo"
   228     ```
   229  2. Two sync rules may copy files to the same destination. For example:
   230     ```yaml
   231     sync:
   232     - src: foo
   233       dest: /baz
   234     - src: bar
   235       dest: /baz
   236     ```
   237  
   238  Should such cases be treated as errors, warnings, or not be detected at all?
   239  
   240  Resolution: No detection which is the current behavior.
   241  
   242  **\<Transitive copies\>**
   243  Multi-stage Dockerfiles can lead to quite contrived scenarios. For example:
   244  ```dockerfile
   245  FROM scratch
   246  ADD foo baz
   247  FROM scratch
   248  COPY --from=0 baz bar
   249  ```
   250  Ideally, the sync destination should be inferred as `foo -> bar`.
   251  
   252  This is quite an edge case and can easily be circumvented.
   253  However, it is quite complex to implement.
   254  Should we bother about this special case?
   255  
   256  Resolution: not now
   257  
   258  **\<Other builders\>**
   259  We don't know how to obtain a mapping of local paths to remote paths for other builders (jib, bazel).
   260  This may even have to be implemented upstream.
   261  Until we can support those builders, we to handle the case when a user tries to use inference with those builders.
   262  
   263  - Option 1: bail out with an error
   264  - Option 2: copy the file into the container with the destination set to the relative path wrt the context directory.
   265  
   266  Resolution: Option 1, ideally during schema validation.
   267  
   268  **\<Upgrade hint\>**
   269  The path inference allows to simplify the sync specification considerably.
   270  Besides, subdirectory syncing makes the sync functionality even feasible for large projects that need to sync many folders into the container.
   271  Therefore, we could consider to advertise this new functionality when running Skaffold or when an old-style sync rule as found.
   272  
   273  Resolution: Agree. We should support auto-fixing.
   274  
   275  **\<SyncMap signature\>**
   276  The new `SyncMap` interface provides some sync rules from the builder.
   277  However, the suggested signature is `func() map[string]string` with the convention to not flatten directories.
   278  To be more in line with the other sync functionality, it could also return `[]SyncRule`.
   279  Although being clearer, this has the downside of introducing a dependency from the builder package to the pipeline config package.
   280  If going that route, we should probably duplicate this struct in a proper place.
   281  However, is it necessary at all, or is the current signature good enough?
   282  
   283  Resolution: This is an implementation detail.
   284  
   285  ## Implementation plan
   286  
   287  1. Upgrade `artifact.sync` to the new sync rule structure and test schema validation.
   288     Should already support multiple destination paths. (#1847)
   289  2. Add inference logic for docker. (#2084)
   290  3. Support support sync rules with inference. (former #1812, will become separate PR)
   291     - includes integration tests
   292     - includes doc update
   293     - includes example showcase
   294  4. (out of scope) Smart sync mode by allowing builders to specify default sync rules.
   295     - Support smart sync for jib.
   296     - Possibly support smart sync for Dockerfile when `ADD` or `COPY` are not followed by a `RUN` command.
   297  5. (future) Add sync map logic for jib and examples.
   298  6. (future) Add sync map logic for bazel and examples.
   299  
   300  ## Integration test plan
   301  
   302  - **implementation step 3** Change one of the existing sync examples so that it uses inference (e.g. #1826).
   303    The test should change a local source file. Expect that the change will be reflected in the container.
   304  
   305  - **implementation step 3** Set up automatic destination syncing and delete a local input file.
   306    Expect that the input file is also deleted in the container.
   307    _Update_: This expectation cannot be met, because a deleted file is no longer contained in the inferred syncmap. Thus file deletion with inferred sync mode must trigger a rebuild.
   308  
   309  - **implementation step 4** Add a test case that that features builder plugin sync patterns.
   310  
   311  ## Glossary
   312  
   313  - _sync map_: specifies a mapping of source files to possibly multiple destinations in the container. It does not specify the intent to sync.
   314  - _sync rule_: specifies the intent to sync files. For manual sync, it must also specify the destination of sync'ed files.