github.com/wmuizelaar/kpt@v0.0.0-20221018115725-bd564717b2ed/docs/design-docs/03-pipeline-merge.md (about)

     1  # Pipeline Merge
     2  
     3  * Author(s): Phani Teja Marupaka, phanimarupaka
     4  * Approver: Sunil Arora, Mike Borozdin
     5  
     6  ## Why
     7  
     8  Currently, `kpt pkg update` doesn't merge the `pipeline` section in the Kptfile as expected. 
     9  The fact that the `pipeline` section is a non-associative list with no defined function identity makes 
    10  it very difficult to merge with upstream counterparts. Ordering of the functions is also important.
    11  This friction is forcing users to use `setters` and discouraging them from declaring other functions in the `pipeline` as 
    12  they will be deleted during the `kpt pkg update`. [Here](https://github.com/GoogleContainerTools/kpt/issues/2529) 
    13  is the example issue. Merging the pipeline correctly will reduce huge amounts of 
    14  friction in declaring new functions. This will encourage users to declare more functions 
    15  in the pipeline which in turn helps to **avoid excessive parameterization**.
    16  
    17  Consider the example of [Landing Zone](https://github.com/GoogleCloudPlatform/blueprints/tree/main/catalog) blueprints. 
    18  Parameters(setters) are the primary interface for the package. This is an anti-pattern for package-as-interface motivation 
    19  and one of the major blockers for not using other functions is the merge behavior 
    20  of the pipeline. If this problem is solved, LZ maintainers can rewrite the packages 
    21  with best practices aligned to the bigger goal of treating the package-as-interface, 
    22  discourage excessive use of setters and only use them as parameterization techniques of last resort.
    23  
    24  ## Design
    25  
    26  In order to solve this problem, we should merge the pipeline section of the Kptfile 
    27  in a custom manner based on the most common use-cases and the expected outputs in 
    28  such scenarios. There are no user interface changes. Users will invoke the `kpt pkg 
    29  update` command in the same way they do currently. This effort will improve the merged 
    30  output of the pipeline section.
    31  
    32  **Is this change backwards compatible**: We are not making any changes to the api, 
    33  we are only improving the merged output of the `pipeline` section. This change will 
    34  produce a different output of Kptfile when compared to the current version but this 
    35  is not a breaking change.
    36  
    37  ## User guide
    38  
    39  Here is what users can expect when they invoke the command, `kpt pkg update` on a package.
    40  
    41  #### Terminology
    42  
    43  **Original**: Source of the last fetched version of the package, represented by the upstreamLock section in Kptfile.
    44  
    45  **Updated upstream**: Declared source of the updated package, represented by the upstream section in Kptfile.
    46  
    47  **Local**: Local fork of the package on disk.
    48  
    49  Firstly, we need to define the identity of the function in order to uniquely identify 
    50  a function across three sources to perform a 3-way merge. We will introduce a 
    51  new field `name` for identifying functions and merge pipeline as associative list.
    52  The aim is to encourage users eventually to have `name` field specified for all functions
    53  similar to containers in deployment. But in the meanwhile, we will be using image name in 
    54  order to identify the function and make it an associative list for merging. The limitation
    55  of this approach is the multiple functions with same image can be declared in the pipeline.
    56  In that case we will fall back to current merge logic of merging pipeline as non-associative list.
    57  In order to have deterministic behavior, we will use image as merge key iff none of the 
    58  functions have `name` field specified and iff there are no duplicate image names declare in functions.
    59  
    60  So the algorithm to merge:
    61  
    62  1. If `name` field is specified in all the functions across all sources, merge 
    63  it as associative list using `name` field as merge key.
    64  2. If `name` field is not specified in all the functions across all sources,
    65  and if there is no duplicate declaration of image names for functions, use `image` value
    66  (excluding the version) as merge key and merge it as associative list.
    67  3. In all other cases fall back to the current merge behavior.
    68  
    69  Here is an example of the merging apply-setters function when new setters are added 
    70  upstream and existing setter values are updated locally. This is the most common 
    71  use case where kpt fails to merge them. Since `name` field is not specified, `image`
    72  value is used to identify and merge the function
    73  
    74  ```
    75  Original
    76  ```
    77  ```yaml
    78  pipeline:
    79    mutators:
    80      - image: gcr.io/kpt-fn/apply-setters:v0.1
    81        configMap:
    82          image: nginx
    83          tag: 1.0.1
    84  ```
    85  ```
    86  Updated upstream
    87  ```
    88  ```yaml
    89  pipeline:
    90    mutators:
    91      - image: gcr.io/kpt-fn/apply-setters:v0.1
    92        configMap:
    93          image: nginx
    94          tag: 1.0.1
    95          new-setter: new-setter-value // new setter is added
    96  ```
    97  ```
    98  Local
    99  ```
   100  
   101  ```yaml
   102  pipeline:
   103    mutators:
   104      - image: gcr.io/kpt-fn/apply-setters:v0.1
   105        configMap:
   106          image: nginx
   107          tag: 1.2.0 // value of tag is updated
   108  ```
   109  ```
   110  Current Output
   111  ```
   112  ```yaml
   113  pipeline:
   114    mutators:
   115      - image: gcr.io/kpt-fn/apply-setters:v0.1
   116        configMap:
   117          image: nginx
   118          tag: 1.0.1 // entire pipeline is overridden by upstream 
   119          new-setter: new-setter-value
   120  ```
   121  ```
   122  Expected Output
   123  ```
   124  ```yaml
   125  pipeline:
   126    mutators:
   127      - image: gcr.io/kpt-fn/apply-setters:v0.1
   128        configMap:
   129          image: nginx
   130          tag: 1.2.0 // updated tag is preserved
   131          new-setter: new-setter-value // new setter is pulled
   132  ```
   133  
   134  In the above scenario, the value of the setter tag is not updated upstream, so 
   135  the modified local value will be preserved. But in the following example, both 
   136  upstream and local values change. So, similar to merging resources, upstream 
   137  value wins if the same fields in both upstream and local are updated.
   138  
   139  ```
   140  Original
   141  ```
   142  ```yaml
   143  pipeline:
   144    mutators:
   145      - image: gcr.io/kpt-fn/set-labels:v0.1
   146        configPath: labels.yaml
   147  ```
   148  ```
   149  Updated upstream
   150  ```
   151  ```yaml
   152  pipeline:
   153    mutators:
   154      - image: gcr.io/kpt-fn/set-labels:v0.1
   155        configPath: labels-updated.yaml // upstream value changed
   156  ```
   157  ```
   158  Local
   159  ```
   160  ```yaml
   161  pipeline:
   162    mutators:
   163      - image: gcr.io/kpt-fn/set-labels:v0.1
   164        configPath: labels-local.yaml // local value changed
   165  ```
   166  ```
   167  Expected Output
   168  ```
   169  ```yaml
   170  pipeline:
   171    mutators:
   172      - image: gcr.io/kpt-fn/set-labels:v0.1
   173        configPath: labels-updated.yaml // upstream overrides local 
   174  ```
   175  
   176  Similarly, the upstream version wins if both upstream and local are updated, else local is preserved.
   177  
   178  ```
   179  Original
   180  ```
   181  ```yaml
   182  pipeline:
   183    mutators:
   184      - image: gcr.io/kpt-fn/set-annotations:v0.1
   185        configPath: annotations.yaml
   186  ```
   187  ```
   188  Updated upstream
   189  ```
   190  ```yaml
   191  pipeline:
   192    mutators:
   193      - image: gcr.io/kpt-fn/set-annotations:v0.2.0
   194        configPath: annotations.yaml
   195  ```
   196  ```
   197  Local
   198  ```
   199  ```yaml
   200  pipeline:
   201    mutators:
   202      - image: gcr.io/kpt-fn/set-annotations:v0.1.1
   203        configPath: annotations.yaml
   204  ```
   205  ```
   206  Expected Output
   207  ```
   208  ```yaml
   209  pipeline:
   210    mutators:
   211      - image: gcr.io/kpt-fn/set-annotations:v0.2.0
   212        configPath: annotations.yaml
   213  ```
   214  
   215  #### More examples with expected output
   216  
   217  Newly added upstream function.
   218  
   219  ```
   220  Original
   221  ```
   222  ```yaml
   223  pipeline:
   224    mutators:
   225      - image: gcr.io/kpt-fn/apply-setters:v0.1
   226        configPath: setters.yaml
   227  ```
   228  ```
   229  Updated upstream
   230  ```
   231  ```yaml
   232  pipeline:
   233    mutators:
   234      - image: gcr.io/kpt-fn/apply-setters:v0.1
   235        configPath: setters.yaml
   236      - image: gcr.io/kpt-fn/generate-folders:v0.1
   237  ```
   238  ```
   239  Local
   240  ```
   241  ```yaml
   242  pipeline:
   243    mutators:
   244      - image: gcr.io/kpt-fn/apply-setters:v0.1
   245        configPath: setters.yaml
   246      - image: gcr.io/kpt-fn/set-namespace:v0.1
   247        configMap:
   248          namespace: foo
   249  ```
   250  ```
   251  Expected output
   252  ```
   253  ```yaml
   254  pipeline:
   255    mutators:
   256      - image: gcr.io/kpt-fn/apply-setters:v0.1
   257        configPath: setters.yaml
   258      - image: gcr.io/kpt-fn/set-namespace:v0.1
   259        configMap:
   260          namespace: foo
   261      - image: gcr.io/kpt-fn/generate-folders:v0.1
   262  ```
   263  
   264  If a function is deleted upstream and not changed on the local, it will be deleted on local.
   265  
   266  ```
   267  Original
   268  ```
   269  ```yaml
   270  pipeline:
   271    mutators:
   272      - image: gcr.io/kpt-fn/apply-setters:v0.1
   273        configPath: setters.yaml
   274      - image: gcr.io/kpt-fn/generate-folders:v0.1
   275  ```
   276  ```
   277  Updated upstream
   278  ```
   279  ```yaml
   280  pipeline:
   281    mutators:
   282      - image: gcr.io/kpt-fn/apply-setters:v0.1
   283        configPath: setters.yaml
   284  ```
   285  ```
   286  Local
   287  ```
   288  ```yaml
   289  pipeline:
   290    mutators:
   291      - image: gcr.io/kpt-fn/apply-setters:v0.1
   292        configPath: setters.yaml
   293      - image: gcr.io/kpt-fn/generate-folders:v0.1
   294      - image: gcr.io/kpt-fn/set-namespace:v0.1
   295        configMap:
   296          namespace: foo
   297  ```
   298  ```
   299  Expected output
   300  ```
   301  ```yaml
   302  pipeline:
   303    mutators:
   304      - image: gcr.io/kpt-fn/apply-setters:v0.1
   305        configPath: setters.yaml
   306      - image: gcr.io/kpt-fn/set-namespace:v0.1
   307        configMap:
   308          namespace: foo
   309  ```
   310  
   311  Same function declared multiple times: If the same function is declared multiple 
   312  times and `name` field is not specified, fall back to the current merge behavior
   313  
   314  ```
   315  Original
   316  ```
   317  ```yaml
   318  pipeline:
   319    mutators:
   320      - image: gcr.io/kpt-fn/search-replace:v0.1
   321        configMap:
   322          by-value: foo
   323          put-value: bar
   324      - image: gcr.io/kpt-fn/search-replace:v0.1
   325        configMap:
   326          by-value: abc
   327          put-comment: ${some-setter-name}
   328  ```
   329  ```
   330  Updated upstream
   331  ```
   332  ```yaml
   333  pipeline:
   334    mutators:
   335      - image: gcr.io/kpt-fn/search-replace:v0.1
   336        configMap:
   337          by-value: foo
   338          put-value: bar-new
   339      - image: gcr.io/kpt-fn/search-replace:v0.1
   340        configMap:
   341          by-value: abc
   342          put-comment: ${updated-setter-name}
   343  ```
   344  ```
   345  Local
   346  ```
   347  ```yaml
   348  pipeline:
   349    mutators:
   350      - image: gcr.io/kpt-fn/generate-folders:v0.1
   351      - image: gcr.io/kpt-fn/search-replace:v0.1
   352        configMap:
   353          by-value: foo
   354          put-value: bar
   355      - image: gcr.io/kpt-fn/set-labels:v0.1
   356        configMap:
   357          app: db
   358      - image: gcr.io/kpt-fn/search-replace:v0.1
   359        configMap:
   360          by-value: abc
   361          put-comment: ${some-setter-name}
   362      - image: gcr.io/kpt-fn/search-replace:v0.1
   363        configMap:
   364          by-value: YOUR_TEAM
   365          put-value: my-team
   366  ```
   367  ```
   368  Expected output
   369  ```
   370  ```yaml
   371  pipeline:
   372    mutators:
   373    - image: gcr.io/kpt-fn/search-replace:v0.1
   374      configMap:
   375        by-value: foo
   376        put-value: bar-new
   377    - image: gcr.io/kpt-fn/search-replace:v0.1
   378      configMap:
   379        by-value: abc
   380        put-comment: ${updated-setter-name}
   381  ```
   382  
   383  In the following scenario, only one function has the `name` field specified. All the
   384  other functions doesn't have name specified, hence we fall back to current merge logic.
   385  
   386  ```
   387  Original
   388  ```
   389  ```yaml
   390  pipeline:
   391    mutators:
   392      - image: gcr.io/kpt-fn/search-replace:v0.1
   393        configMap:
   394          by-value: foo
   395          put-value: bar
   396      - image: gcr.io/kpt-fn/search-replace:v0.1
   397        configMap:
   398          by-value: abc
   399          put-comment: ${some-setter-name}
   400  ```
   401  ```
   402  Updated upstream
   403  ```
   404  ```yaml
   405  pipeline:
   406    mutators:
   407      - image: gcr.io/kpt-fn/search-replace:v0.1
   408        configMap:
   409          by-value: foo
   410          put-value: bar-new
   411      - image: gcr.io/kpt-fn/search-replace:v0.1
   412        configMap:
   413          by-value: abc
   414          put-comment: ${updated-setter-name}
   415  ```
   416  ```
   417  Local
   418  ```
   419  ```yaml
   420  pipeline:
   421    mutators:
   422      - image: gcr.io/kpt-fn/search-replace:v0.1
   423        name: my-new-function
   424        configMap:
   425          by-value: YOUR_TEAM
   426          put-value: my-team
   427      - image: gcr.io/kpt-fn/generate-folders:v0.1
   428      - image: gcr.io/kpt-fn/search-replace:v0.1
   429        configMap:
   430          by-value: foo
   431          put-value: bar
   432      - image: gcr.io/kpt-fn/set-labels:v0.1
   433        configMap:
   434          app: db
   435      - image: gcr.io/kpt-fn/search-replace:v0.1
   436        configMap:
   437          by-value: abc
   438          put-comment: ${some-setter-name}
   439  ```
   440  ```
   441  Expected output
   442  ```
   443  ```yaml
   444  pipeline:
   445    mutators:
   446    - image: gcr.io/kpt-fn/search-replace:v0.1
   447      configMap:
   448        by-value: foo
   449        put-value: bar-new
   450    - image: gcr.io/kpt-fn/search-replace:v0.1
   451      configMap:
   452        by-value: abc
   453        put-comment: ${updated-setter-name}
   454  ```
   455  
   456  Here is the ideal scenario which leads to deterministic merge behavior using `name`
   457  field across all sources
   458  
   459  ```
   460  Original
   461  ```
   462  ```yaml
   463  pipeline:
   464    mutators:
   465      - image: gcr.io/kpt-fn/search-replace:v0.1
   466        name: sr1
   467        configMap:
   468          by-value: foo
   469          put-value: bar
   470      - image: gcr.io/kpt-fn/search-replace:v0.1
   471        name: sr2
   472        configMap:
   473          by-value: abc
   474          put-comment: ${some-setter-name}
   475  ```
   476  ```
   477  Updated upstream
   478  ```
   479  ```yaml
   480  pipeline:
   481    mutators:
   482      - image: gcr.io/kpt-fn/search-replace:v0.1
   483        name: sr1
   484        configMap:
   485          by-value: foo
   486          put-value: bar-new
   487      - image: gcr.io/kpt-fn/search-replace:v0.1
   488        name: sr2
   489        configMap:
   490          by-value: abc
   491          put-comment: ${updated-setter-name}
   492  ```
   493  ```
   494  Local
   495  ```
   496  ```yaml
   497  pipeline:
   498    mutators:
   499      - image: gcr.io/kpt-fn/search-replace:v0.1
   500        name: my-new-function
   501        configMap:
   502          by-value: YOUR_TEAM
   503          put-value: my-team
   504      - image: gcr.io/kpt-fn/generate-folders:v0.1
   505        name: gf1
   506      - image: gcr.io/kpt-fn/search-replace:v0.1
   507        name: sr1
   508        configMap:
   509          by-value: foo
   510          put-value: bar
   511      - image: gcr.io/kpt-fn/set-labels:v0.1
   512        name: sl1
   513        configMap:
   514          app: db
   515      - image: gcr.io/kpt-fn/search-replace:v0.1
   516        name: sr2
   517        configMap:
   518          by-value: abc
   519          put-comment: ${some-setter-name}
   520  ```
   521  ```
   522  Expected output
   523  ```
   524  ```yaml
   525  pipeline:
   526    mutators:
   527    - image: gcr.io/kpt-fn/search-replace:v0.1
   528      name: my-new-function
   529      configMap:
   530        by-value: YOUR_TEAM
   531        put-value: my-team
   532    - image: gcr.io/kpt-fn/generate-folders:v0.1
   533      name: gf1
   534    - image: gcr.io/kpt-fn/search-replace:v0.1
   535      name: sr1
   536      configMap:
   537        by-value: foo
   538        put-value: bar-new
   539    - image: gcr.io/kpt-fn/set-labels:v0.1
   540      name: sl1
   541      configMap:
   542        app: db
   543    - image: gcr.io/kpt-fn/search-replace:v0.1
   544      name: sr2
   545      configMap:
   546        by-value: abc
   547        put-comment: ${updated-setter-name}
   548  ```
   549  
   550  Merging selectors is difficult as there is no identity. If both upstream and
   551  local selectors for a given function diverge, the entire section of selectors
   552  from upstream will override the selectors on local for that function.
   553  
   554  ```
   555  Origin
   556  ```
   557  ```yaml
   558  pipeline:
   559    mutators:
   560      - image: gcr.io/kpt-fn/ensure-name-substring:v0.1
   561        selectors:
   562          - kind: Deployment
   563            name: wordpress
   564          - kind: Service
   565            name: wordpress
   566  ```
   567  ```
   568  Updated upstream
   569  ```
   570  ```yaml
   571  pipeline:
   572    mutators:
   573      - image: gcr.io/kpt-fn/ensure-name-substring:v0.1
   574        selectors:
   575          - kind: Deployment
   576            name: wordpress
   577          - kind: Service
   578            name: wordpress
   579          - kind: Foo
   580            name: wordpress
   581  ```
   582  ```
   583  Local
   584  ```
   585  ```yaml
   586  pipeline:
   587    mutators:
   588      - image: gcr.io/kpt-fn/ensure-name-substring:v0.1
   589        selectors:
   590          - kind: Deployment
   591            name: my-wordpress
   592          - kind: Service
   593            name: my-wordpress
   594          - namespace: my-space
   595  ```
   596  ```
   597  Expected output
   598  ```
   599  ```yaml
   600  pipeline:
   601    mutators:
   602      - image: gcr.io/kpt-fn/apply-setters:v0.1
   603        configPath: setters.yaml
   604      - image: gcr.io/kpt-fn/set-namespace:v0.1
   605        configMap:
   606          namespace: foo
   607      - image: gcr.io/kpt-fn/generate-folders:v0.1
   608  ```
   609  
   610  ## Open issues
   611  
   612   https://github.com/GoogleContainerTools/kpt/issues/2529
   613  
   614  ## Alternatives Considered
   615  
   616  For identifying the function, we can add the function version to the primary key
   617  (in addition to the function name). But it is highly likely that 
   618  changing the function version means updating the function as opposed to adding a new function.
   619  
   620  Why should upstream win in case of conflicts ? Is this what the user always expects?
   621  - Not necessarily. User expectations might be different in different scenarios for resolving conflicts. 
   622  But since we already went down the path of upstream-wins strategy in case of conflicts for merging resources, 
   623  we are going down that route to maintain consistency.
   624  - There is an open [issue](https://github.com/GoogleContainerTools/kpt/issues/1437) to support 
   625  multiple conflict resolution strategies and provide interactive update behavior to resolve 
   626  conflicts which is out of scope for this doc and will be addressed soon.