# Package Orchestration

* Author(s): Martin Maly, @martinmaly
* Approver: @mortent

## Why

Customers who want to take advantage of the benefits of
[Configuration as Data](./06-config-as-data.md) can do so today using the
[kpt](https://kpt.dev) CLI and the kpt function ecosystem, including the
[functions catalog](https://catalog.kpt.dev/). Package authoring is possible
using a variety of editors with [YAML](https://yaml.org/) support. That said, a
delightful WYSIWYG package authoring UI that supports the broader package
lifecycle, including package authoring with *guardrails*, approval workflow,
package deployment, and more, is not yet available.

The *Package Orchestration* service is part of the implementation of the
Configuration as Data approach, and enables building the delightful UI
experience supporting the configuration lifecycle.

## Core Concepts

This section briefly describes core concepts of package orchestration:

***Package***: A package is a collection of related configuration files
containing configuration of [KRM][krm] **resources**. Specifically,
configuration packages are [kpt packages](https://kpt.dev/).

***Repository***: Repositories store packages or [functions][], for example
[git][] or [OCI](#oci) repositories. Functions may be associated with
repositories to enforce constraints or invariants on packages (guardrails).
([more details](#repositories))

Packages are sequentially ***versioned***; multiple versions of the same package
may exist in a repository. ([more details](#package-versioning))

A package may have a link (URL) to an ***upstream package*** (a specific
version) from which it was cloned. ([more details](#package-relationships))

A package may be in one of several lifecycle stages:

* ***Draft*** - package is being created or edited.
  The package contents can be modified but the package is not ready to be used
  (i.e. deployed)
* ***Proposed*** - author of the package proposed that the package be published
* ***Published*** - the changes to the package have been approved and the
  package is ready to be used. Published packages can be deployed or cloned

***Function*** (specifically, [KRM functions][krm functions]) can be applied to
packages to mutate or validate resources within them. Functions can be applied
to a package to create a specific package mutation while editing a package
draft, added to the package's Kptfile [pipeline][], or associated with a
repository to be applied to all packages on changes.
([more details](#functions))

A repository can be designated as a ***deployment repository***. *Published*
packages in a deployment repository are considered deployment-ready.
([more details](#deployment))

<!-- Reference links -->
[krm]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/resource-management.md
[functions]: https://kpt.dev/book/02-concepts/03-functions
[krm functions]: https://github.com/kubernetes-sigs/kustomize/blob/master/cmd/config/docs/api-conventions/functions-spec.md
[pipeline]: https://kpt.dev/book/04-using-functions/01-declarative-function-execution
[Config Sync]: https://cloud.google.com/anthos-config-management/docs/config-sync-overview
[kpt]: https://kpt.dev/
[git]: https://git-scm.org/
[optimistic-concurrency]: https://en.wikipedia.org/wiki/Optimistic_concurrency_control
[apiserver]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/
[representation]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#differing-representations
[crds]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/

## Core Components of Configuration as Data Implementation

The Core implementation of Configuration as Data, *CaD Core*, is a set of
components and APIs which collectively enable:

* Registration of repositories (Git, OCI) containing kpt packages or functions,
  and discovery of packages and functions
* Porcelain package lifecycle, including authoring, versioning, deletion,
  creation and mutations of a package draft, process of proposing the package
  draft, and publishing of the approved package.
* Package lifecycle operations such as:
  * assisted or automated rollout of package upgrade when a new version
    of the upstream package becomes available
  * rollback of a package to a previous version
* Deployment of packages from deployment repositories and observability of their
  deployment status.
* Permission model that allows role-based access control

### High-Level Architecture

At the high level, the Core CaD functionality comprises:

* a generic (i.e. not task-specific) package orchestration service implementing
  * package repository management
  * package discovery, authoring and lifecycle management
* [kpt][] - a Git-native, schema-aware, extensible client-side tool for
  managing KRM packages
* a GitOps-based deployment mechanism (for example [Config Sync][]), which
  distributes and deploys configuration, and provides observability of the
  status of deployed resources
* a task-specific UI supporting repository management, package discovery,
  authoring, and lifecycle

![CaD Core Architecture](./CaD%20Core%20Architecture.svg)

## CaD Concepts Elaborated

Concepts briefly introduced above are elaborated in more detail in this section.

### Repositories

[kpt][] and [Config Sync][] currently integrate with [git][] repositories, and
there is an existing design to add [OCI support](./02-oci-support.md) to kpt.
Initially, the Package Orchestration service will prioritize integration with
[git][], and support for additional repository types may be added in the future
as required.

Requirements applicable to all repositories include: ability to store packages,
their versions, and sufficient metadata associated with package to capture:

* package dependency relationships (upstream - downstream)
* package lifecycle state (draft, proposed, published)
* package purpose (base package)
* (optionally) even customer-defined attributes

At repository registration, customers must be able to specify details needed to
store packages in appropriate locations in the repository. For example,
registration of a Git repository must accept a branch and a directory.

Repositories may have associated guardrails - mutation and validation functions
that ensure and enforce requirements of all packages in the repository,
including gating promotion of a package to a *published* lifecycle stage.

_Note_: A user role with sufficient permissions can register a package or
function repository, including repositories containing functions authored by
the customer, or other providers. Since the functions in the registered
repositories become discoverable, customers must be aware of the implications of
registering function repositories and trust the contents thereof.

### Package Versioning

Packages are sequentially versioned. The important requirements are:

* ability to compare any 2 versions of a package to be either "newer than",
  equal, or "older than" relationship
* ability to support automatic assignment of versions
* ability to support [optimistic concurrency][optimistic-concurrency] of package
  changes via version numbers
* simple model which easily supports automation

We plan to use a simple integer sequence to represent package versions.
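
The versioning requirements can be illustrated with a small sketch. This is not
Porch code: the `PackageStore` class, its `save`, `get` and `latest` methods, and
the `ConflictError` exception are hypothetical names introduced here, and Python
is used only for brevity. The sketch assumes versions are plain integers starting
at 1, so comparison is ordinary integer comparison, and a writer holding a stale
version is rejected (optimistic concurrency).

```python
class ConflictError(Exception):
    """Raised when a writer's expected version is stale."""


class PackageStore:
    """Hypothetical in-memory store keeping every version of each package."""

    def __init__(self):
        # package name -> list of contents; list index 0 holds version 1
        self._versions = {}

    def latest(self, name):
        """Return the latest version number, or 0 if the package does not exist."""
        return len(self._versions.get(name, []))

    def save(self, name, contents, expected_version):
        """Store new contents only if nobody saved since `expected_version`."""
        current = self.latest(name)
        if expected_version != current:
            raise ConflictError(f"{name}: expected v{expected_version}, found v{current}")
        self._versions.setdefault(name, []).append(contents)
        return current + 1  # the next version is assigned automatically

    def get(self, name, version):
        """Return the contents of one specific version."""
        return self._versions[name][version - 1]


store = PackageStore()
v1 = store.save("network", {"replicas": 1}, expected_version=0)
v2 = store.save("network", {"replicas": 2}, expected_version=v1)

# A second writer that still believes v1 is current must be rejected.
try:
    store.save("network", {"replicas": 3}, expected_version=v1)
    conflict = False
except ConflictError:
    conflict = True
```

Because versions are integers, "newer than" is simply `v2 > v1`, and automation
never has to parse or negotiate a version scheme.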

### Package Relationships

Kpt packages support the concept of ***upstream***. When a package is cloned
from another, the new package (called ***downstream*** package) maintains an
upstream link to the specific version of the package from which it was cloned.
If a new version of the upstream package becomes available, the upstream link
can be used to [update](https://kpt.dev/book/03-packages/05-updating-a-package)
the downstream package.

### Deployment

The deployment mechanism is responsible for deploying configuration packages
from a repository and affecting the live state. Because the configuration
is stored in standard repositories (Git, and in the future OCI), the deployment
component is pluggable. By default, [Config Sync][] is the deployment mechanism
used by CaD Core implementation but others can be used as well.

Here we highlight some key attributes of the deployment mechanism and its
integration within the CaD Core:

* _Published_ packages in a deployment repository are considered ready to be
  deployed
* Config Sync supports deploying individual packages and whole repositories.
  For Git specifically that translates to a requirement to be able to specify
  repository, branch/tag/ref, and directory when instructing Config Sync to
  deploy a package.
* _Draft_ packages need to be identified in such a way that Config Sync can
  easily avoid deploying them.
* Config Sync needs to be able to pin to specific versions of deployable
  packages in order to orchestrate rollouts and rollbacks. This means it must
  be possible to GET a specific version of a package.
* Config Sync needs to be able to discover when new versions are available for
  deployment.

### Functions

Functions, specifically [KRM functions][krm functions], are used in the CaD core
to manipulate resources within packages.

* Similar to packages, functions are stored in repositories. Some repositories
  (such as OCI) are more suitable for storing functions than others (such as
  Git).
* Function discovery will be aided by metadata associated with the function
  by which the function can advertise which resources it acts on, whether the
  function is idempotent or not, whether it is a mutator or validator, etc.
* Function repositories can be registered; subsequently, users can discover
  functions from the registered repositories and use them as follows:

A function can be:

* applied imperatively to a package draft to perform a specific mutation of the
  package's resources or meta-resources (`Kptfile` etc.)
* registered in the package's `Kptfile` function pipeline as a *mutator* or
  *validator* in order to be automatically run as part of package rendering
* registered at the repository level as a *mutator* or *validator*. Such a
  function then applies to all packages in the repository and is evaluated
  whenever a change to a package in the repository occurs.

## Package Orchestration - Porch

Having established the context of the CaD Core components and the overall
architecture, the remainder of the document will focus on **Porch** - the
Package Orchestration service.

To reiterate the role of the Package Orchestration service among the CaD Core
components, it covers:

* [Repository Management](#repository-management)
* [Package Discovery](#package-discovery)
* [Package Authoring](#package-authoring) and Lifecycle

In the following sections we'll expand on each of these areas. The term
_client_ used in these sections can be either a person interacting with the UI
such as a web application or a command-line tool, or an automated agent or
process.

### Repository Management

The repository management functionality of Package Orchestration service enables
the client to:

* register, unregister, update registration of repositories, and discover
  registered repositories. Git repository integration will be available first,
  with OCI and possibly more delivered in the subsequent releases.
* manage repository-wide upstream/downstream relationships, i.e. designate
  default upstream repository from which packages will be cloned.
* annotate repository with metadata such as whether repository contains
  deployment ready packages or not; metadata can be application or customer
  specific
* define and enforce package invariants (guardrails) at the repository level, by
  registering mutator and/or validator functions with the repository; those
  registered functions will be applied to packages in the repository to enforce
  invariants

### Package Discovery

The package discovery functionality of Package Orchestration service enables
the client to:

* browse packages in a repository
* discover configuration packages in registered repositories and sort/filter
  based on the repository containing the package, package metadata, version,
  package lifecycle stage (draft, proposed, published)
* retrieve resources and metadata of an individual package, including latest
  version or any specific version or draft of a package, for the purpose of
  introspection of a single package or for comparison of contents of multiple
  versions of a package, or related packages
* enumerate _upstream_ packages available for creating (cloning) a _downstream_
  package
* identify downstream packages that need to be upgraded after a change is made
  to an upstream package
* identify all deployment-ready packages in a deployment repository that are
  ready to be synced to a deployment target by Config Sync
* identify new
  versions of packages in a deployment repository that can be
  rolled out to a deployment target by Config Sync
* discover functions in registered repositories based on filtering criteria
  including containing repository, applicability of a function to a specific
  package or specific resource type(s), function metadata (mutator/validator),
  idempotency (function is idempotent/not), etc.

### Package Authoring

The package authoring and lifecycle functionality of the Package Orchestration
service enables the client to:

* Create a package _draft_ via one of the following means:
  * an empty draft 'from scratch' (equivalent to
    [kpt pkg init](https://kpt.dev/reference/cli/pkg/init/))
  * clone of an upstream package (equivalent to
    [kpt pkg get](https://kpt.dev/reference/cli/pkg/get/)) from either a
    registered upstream repository or from another accessible, unregistered,
    repository
  * edit an existing package (similar to the CLI command(s)
    [kpt fn source](https://kpt.dev/reference/cli/fn/source/) or
    [kpt pkg pull](https://github.com/GoogleContainerTools/kpt/issues/2557))
  * roll back / restore a package to any of its previous versions
    ([kpt pkg pull](https://github.com/GoogleContainerTools/kpt/issues/2557)
    of a previous version)
* Apply changes to a package _draft_. In general, mutations include
  adding/modifying/deleting any part of the package's contents. Some specific
  examples include:
  * add/change/delete package metadata (i.e.
    some properties in the `Kptfile`)
  * add/change/delete resources in the package
  * add function mutators/validators to the package's [pipeline][]
  * invoke a function imperatively on the package draft to perform a desired
    mutation
  * add/change/delete sub-package
  * retrieve the contents of the package for arbitrary client-side mutations
    (equivalent to [kpt fn source](https://kpt.dev/reference/cli/fn/source/))
  * update/replace the package contents with new contents, for example results
    of client-side mutations by a UI (equivalent to
    [kpt fn sink](https://kpt.dev/reference/cli/fn/sink/))
* Rebase a package onto another upstream base package
  ([detail](https://github.com/GoogleContainerTools/kpt/issues/2548)) or onto
  a newer version of the same package (to aid with conflict resolution during
  the process of publishing a draft package)
* Get feedback during package authoring, and assistance in recovery from:
  * merge conflicts, invalid package changes, guardrail violations
  * compliance of the drafted package with repository-wide invariants and
    guardrails
* Propose that a _draft_ package be _published_.
* Apply arbitrary decision criteria, and by a manual or automated action,
  approve (or reject) the proposal of a _draft_ package to be _published_.
* Perform bulk operations such as:
  * Assisted/automated update (upgrade, rollback) of groups of packages matching
    specific criteria (e.g. base package has new version or specific base
    package version has a vulnerability and should be rolled back)
  * Proposed change validation (pre-validating change that adds a validator
    function to a base package or a repository)
* Delete an existing package.

#### Authoring & Latency

An important goal of the Package Orchestration service is to support building
of task-specific UIs.
In order to deliver a low-latency user experience acceptable for UI
interactions, the innermost authoring loop (depicted below) will require:

* high performance access to the package store (load/save package) with caching
* low latency execution of mutations and transformations on the package contents
* low latency [KRM function][krm functions] evaluation and package rendering
  (evaluation of package's function pipelines)

![Inner Loop](./CaD%20Inner%20Loop.svg)

#### Authoring & Access Control

A client can assign actors (persons, service accounts) to roles that determine
which operations they are allowed to perform in order to satisfy requirements
of the basic roles. For example, only permitted roles can:

* manipulate repository registration, enforcement of repository-wide
  invariants and guardrails
* create a draft of a package and propose the draft be published
* approve (or reject) the proposal to publish a draft package
* clone a package from a specific upstream repository
* perform bulk operations such as rollout upgrade of downstream packages,
  including rollouts across multiple downstream repositories
* etc.

### Porch Architecture

The Package Orchestration service, **Porch**, is designed to be hosted in a
[Kubernetes](https://kubernetes.io/) cluster.

The overall architecture is shown below, and also includes existing components
(k8s apiserver and Config Sync).

![Porch Architecture](./Porch%20Architecture.svg)

In addition to satisfying requirements highlighted above, the focus of the
architecture was to:

* establish clear components and interfaces
* support a low-latency package authoring experience required by the UIs

The Porch components are:

#### Porch Server

The Porch server is implemented as a [Kubernetes extension API server][apiserver].
The benefits of using a Kubernetes extension API server are:

* well-defined and familiar API style
* availability of generated clients
* integration with existing Kubernetes ecosystem and tools such as `kubectl`
  CLI, [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/)
* avoids requirement to open another network port to access a separate endpoint
  running inside k8s cluster (this is a distinct advantage over gRPC which we
  considered as an alternative approach)

Resources implemented by Porch include:

* `PackageRevision` - represents the _metadata_ of the configuration package
  revision stored in a _package_ repository.
* `PackageRevisionResources` - represents the _contents_ of the package revision
* `Function` - represents a [KRM function][krm functions] discovered in
  a registered _function_ repository.

Note that each configuration package revision is represented by a _pair_ of
resources which each present a different view (or [representation][]) of the
same underlying package revision.

Repository registration is supported by a `Repository` [custom resource][crds].

**Porch server** itself comprises several key components, including:

* The *Porch aggregated apiserver* which implements the integration into the
  main Kubernetes apiserver, and directly serves API requests for the
  `PackageRevision`, `PackageRevisionResources` and `Function` resources.
* Package orchestration *engine* which implements the package lifecycle
  operations, and package mutation workflows
* *CaD Library* which implements specific package manipulation algorithms such
  as package rendering (evaluation of package's function *pipeline*),
  initialization of a new package, etc. The CaD Library is shared with `kpt`
  where it likewise provides the core package manipulation algorithms.
* *Package cache* which enables both local caching, as well as abstract
  manipulation of packages and their contents irrespective of the underlying
  storage mechanism (Git, or OCI)
* *Repository adapters* for Git and OCI which implement the specific logic of
  interacting with those types of package repositories.
* *Function runtime* which implements support for evaluating
  [kpt functions][functions] and a multi-tier cache of functions to support
  low latency function evaluation

#### Function Runner

**Function runner** is a separate service responsible for evaluating
[kpt functions][functions]. Function runner exposes a [gRPC](https://grpc.io/)
endpoint which enables evaluating a kpt function on the provided configuration
package.

The gRPC technology was chosen for the function runner service because the
[requirements](#grpc-api) that informed the choice of KRM API for the Package
Orchestration service do not apply. The function runner is an internal
microservice, an implementation detail not exposed to external callers. This
makes gRPC perfectly suitable.

The function runner also maintains a cache of functions to support low latency
function evaluation.
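
To make the unit of work concrete: per the KRM functions specification, a
function receives a `ResourceList` (with `items` and an optional
`functionConfig`) and returns a `ResourceList`. The sketch below is a Python
stand-in, not the function runner's gRPC API, which this document does not
specify. The names `set_labels` and `FunctionRunner`, the single-level
dictionary cache, and the image reference `example.com/fn/set-labels:v1` are
all assumptions made for illustration, and resources are plain dictionaries
rather than parsed YAML.

```python
import copy


def set_labels(resource_list):
    """Mutator: add the labels from functionConfig.data to every item."""
    labels = resource_list.get("functionConfig", {}).get("data", {})
    out = copy.deepcopy(resource_list)  # leave the caller's input untouched
    for item in out["items"]:
        item.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    return out


class FunctionRunner:
    """Hypothetical runner that resolves each function once and reuses it."""

    def __init__(self, loader):
        self._loader = loader  # maps an image reference to a callable
        self._cache = {}
        self.loads = 0  # counts cache misses, for illustration only

    def evaluate(self, image, resource_list):
        if image not in self._cache:
            self._cache[image] = self._loader(image)
            self.loads += 1
        return self._cache[image](resource_list)


# Made-up image reference standing in for a function container image.
registry = {"example.com/fn/set-labels:v1": set_labels}
runner = FunctionRunner(loader=registry.__getitem__)

package = {
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "functionConfig": {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "label-config"},
        "data": {"env": "dev"},
    },
    "items": [{"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "cm"}}],
}

first = runner.evaluate("example.com/fn/set-labels:v1", package)
second = runner.evaluate("example.com/fn/set-labels:v1", first)
```

The second call is served from the cache, and because this particular mutator
is idempotent, evaluating it again leaves the package unchanged.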

#### CaD Library

The [kpt](https://kpt.dev/) CLI already implements foundational package
manipulation algorithms in order to provide the command line user experience,
including:

* [kpt pkg init](https://kpt.dev/reference/cli/pkg/init/) - create an empty,
  valid, KRM package
* [kpt pkg get](https://kpt.dev/reference/cli/pkg/get/) - create a downstream
  package by cloning an upstream package; set up the upstream reference of the
  downstream package
* [kpt pkg update](https://kpt.dev/reference/cli/pkg/update/) - update the
  downstream package with changes from a new version of upstream, 3-way merge
* [kpt fn eval](https://kpt.dev/reference/cli/fn/eval/) - evaluate a kpt
  function on a package
* [kpt fn render](https://kpt.dev/reference/cli/fn/render/) - render the package
  by executing the function pipeline of the package and its nested packages
* [kpt fn source](https://kpt.dev/reference/cli/fn/source/) and
  [kpt fn sink](https://kpt.dev/reference/cli/fn/sink/) - read a package from
  local disk as a `ResourceList` and write a package represented as a
  `ResourceList` to local disk

The same set of primitives forms the foundational building blocks of the package
orchestration service. Further, the package orchestration service combines these
primitives into higher-level operations (for example, the package orchestrator
renders packages automatically on changes; future versions will support bulk
operations such as upgrade of multiple packages, etc).
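
As an illustration of how primitives compose into a higher-level operation, the
sketch below mimics rendering: mutators run in declaration order, then
validators check the result, as in a `Kptfile` pipeline. All names here
(`render`, `set_namespace`, `require_namespace`) are hypothetical, resources
are plain dictionaries, and nested packages are ignored. It is a simplified
model, not the algorithm implemented in kpt or the CaD Library.

```python
def set_namespace(namespace):
    """Build a mutator that sets metadata.namespace on every resource."""
    def mutate(resources):
        return [
            {**r, "metadata": {**r.get("metadata", {}), "namespace": namespace}}
            for r in resources
        ]
    return mutate


def require_namespace(resources):
    """Validator: return one error message per resource lacking a namespace."""
    return [
        f"{r.get('kind')}/{r.get('metadata', {}).get('name')}: missing namespace"
        for r in resources
        if not r.get("metadata", {}).get("namespace")
    ]


def render(resources, mutators, validators):
    """Apply mutators in order, then collect validator errors on the result."""
    for mutate in mutators:
        resources = mutate(resources)
    errors = [msg for validate in validators for msg in validate(resources)]
    return resources, errors


package = [{"kind": "ConfigMap", "metadata": {"name": "settings"}}]

# With the mutator in the pipeline the validator passes.
rendered, errors = render(package, [set_namespace("prod")], [require_namespace])

# Without it, the same validator reports a guardrail violation.
_, unrendered_errors = render(package, [], [require_namespace])
```

An orchestrator that re-runs `render` after every change can surface validator
errors to the client immediately, which is the feedback loop the authoring
sections describe.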

The implementation of the package manipulation primitives in kpt was refactored
(with initial refactoring completed, and more to be performed as needed) in
order to:

* create a reusable CaD library, usable by both kpt CLI and Package
  Orchestration service
* create abstractions for dependencies which differ between CLI and Porch,
  most notable are dependency on Docker for function evaluation, and dependency
  on the local file system for package rendering.

Over time, the CaD Library will provide the package manipulation primitives:

* create a valid empty package (init)
* update package upstream pointers (get)
* perform 3-way merge (update)
* render - core package rendering algorithm using a pluggable function evaluator
  to support:
  * function evaluation via Docker (used by kpt CLI)
  * function evaluation via an RPC to a service or appropriate function sandbox
  * high-performance evaluation of trusted, built-in, functions without sandbox
* heal configuration (restore comments after lossy transformation)

and both kpt CLI and Porch will consume the library. This approach will allow
leveraging the investment already made into the high quality package
manipulation primitives, and enable functional parity between the kpt CLI and
the Package Orchestration service.

## User Guide

Find the Porch User Guide in a dedicated [document](../../site/guides/porch-user-guide.md).

## Open Issues/Questions

### Deployment Rollouts & Orchestration

__Not Yet Resolved__

Cross-cluster rollouts and orchestration of deployment activity. For example,
package deployed by Config Sync in cluster A, and only on success, the same
(or a different) package deployed by Config Sync in cluster B.

## Alternatives Considered

### gRPC API

We considered the use of [gRPC](https://grpc.io/) for the Porch API.
The primary advantages of
implementing Porch as an extension Kubernetes apiserver are:

* customers won't have to open another port to their Kubernetes cluster and can
  reuse their existing infrastructure
* customers can likewise reuse existing, familiar, Kubernetes tooling ecosystem