# Configuration as Data

* Author(s): Martin Maly, @martinmaly
* Approver: @bgrant0607

## Why

This document provides background context for Package Orchestration, which is
further elaborated in a dedicated [document](07-package-orchestration.md).

## Configuration as Data

*Configuration as Data* is an approach to managing configuration (including
configuration of infrastructure, policy, services, applications, etc.) that:

* makes configuration data the source of truth, stored separately from the live
  state
* uses a uniform, serializable data model to represent configuration
* separates code that acts on the configuration from the data and from packages
  / bundles of the data
* abstracts configuration file structure and storage from operations that act
  upon the configuration data; clients manipulating configuration data don’t
  need to directly interact with storage (git, container images)

![CaD Overview](./CaD%20Overview.svg)

## Key Principles

A system based on CaD *should* observe the following key principles:

* secrets should be stored separately, in a secret-focused storage system
  ([example](https://cloud.google.com/secret-manager))
* stores a versioned history of configuration changes, as change sets to
  bundles of related configuration data
* relies on uniformity and consistency of the configuration format, including
  type metadata, to enable pattern-based operations on the configuration data,
  along the lines of duck typing
* separates schemas for the configuration data from the data, and relies on
  schema information for strongly typed operations and to disambiguate data
  structures and other variations within the model
* decouples abstractions of configuration from collections of configuration data
* represents abstractions of configuration generators as data with schemas, like
  other configuration data
* finds, filters / queries / selects, and/or validates configuration data that
  can be operated on by given code (functions)
* finds and/or filters / queries / selects code (functions) that can operate on
  resource types contained within a body of configuration data
* *actuation* (reconciliation of configuration data with live state) is separate
  from transformation of configuration data, and is driven by the declarative
  data model
* transformations, particularly value propagation, are preferable to wholesale
  configuration generation except when the expansion is dramatic (say, >10x)
* transformation input generation should usually be decoupled from propagation
* deployment context inputs should be taken from well-defined “provider context”
  objects
* identifiers and references should be declarative
* live state should be linked back to sources of truth (configuration)
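
The duck-typing, value-propagation, and provider-context principles above can be illustrated with a minimal Python sketch. It assumes resources are plain dictionaries already parsed from KRM YAML, and it uses a hypothetical provider-context ConfigMap named `provider-context` (not a kpt convention). Because the transformation relies only on fields every KRM object shares (`apiVersion`, `kind`, `metadata`), it works on any resource type without kind-specific code.

```python
from typing import Any, Dict, Iterable, List, Optional

Resource = Dict[str, Any]

# Hypothetical name of the "provider context" object; not a kpt convention.
PROVIDER_CONTEXT_NAME = "provider-context"


def select(items: Iterable[Resource], *, api_version: Optional[str] = None,
           kind: Optional[str] = None, name: Optional[str] = None) -> List[Resource]:
    """Filter resources by the type metadata every KRM object carries."""
    return [
        r for r in items
        if (api_version is None or r.get("apiVersion") == api_version)
        and (kind is None or r.get("kind") == kind)
        and (name is None or r.get("metadata", {}).get("name") == name)
    ]


def propagate_context(items: List[Resource], key: str, label: str) -> List[Resource]:
    """Copy one value from the provider-context ConfigMap into a label on
    every other resource. Only `metadata` is touched, so any kind works."""
    contexts = select(items, api_version="v1", kind="ConfigMap",
                      name=PROVIDER_CONTEXT_NAME)
    if len(contexts) != 1:
        raise ValueError(f"expected exactly one {PROVIDER_CONTEXT_NAME} ConfigMap")
    context = contexts[0]
    value = context["data"][key]
    for r in items:
        if r is context:
            continue  # leave the context object itself unchanged
        labels = r.setdefault("metadata", {}).setdefault("labels", {})
        labels[label] = value
    return items


package = [
    {"apiVersion": "v1", "kind": "ConfigMap",
     "metadata": {"name": "provider-context"}, "data": {"environment": "prod"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "web"}},
]
propagate_context(package, key="environment", label="env")
```

Keeping the input (the context object) separate from the propagation logic mirrors the principle that transformation input generation should be decoupled from propagation: a different tool could produce `provider-context` without changing `propagate_context`.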

## KRM CaD

Our implementation of the Configuration as Data approach
([kpt](https://kpt.dev),
[Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview),
and [Package Orchestration](https://github.com/GoogleContainerTools/kpt/tree/main/porch))
builds on the foundation of the
[Kubernetes Resource Model](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
(KRM).

**Note**: Even though KRM is not a requirement of Config as Data (just as
Python, Go templates, or Jinja are not specifically requirements for
[IaC](https://en.wikipedia.org/wiki/Infrastructure_as_code)), choosing another
foundational config representation format would necessitate implementing
adapters for all types of infrastructure and applications configured, including
Kubernetes, CRDs, GCP resources, and more. Likewise, choosing another
configuration format would require redesigning a number of the configuration
management mechanisms that have already been designed for KRM, such as 3-way
merge, structural merge patch, schema descriptions, resource metadata,
references, status conventions, etc.
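
To make the 3-way merge mechanism mentioned above more concrete, here is a simplified Python sketch over nested dictionaries (a parsed KRM resource). It is not kpt's merge implementation: real KRM merging also handles lists, including associative lists keyed by fields such as `name`, and schema-driven merge strategies. Here `base` is the original upstream version, `upstream` is the new upstream version, and `local` is the downstream copy; on a true conflict this sketch keeps the local value and records the path.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel distinguishing "field absent" from None


def merge3(base: Any, local: Any, upstream: Any,
           path: str = "") -> Tuple[Any, List[str]]:
    """Return (merged, conflicts) for one value in a 3-way merge.

    Absent values are represented by _MISSING; a merged result of _MISSING
    means the field is dropped from the parent mapping.
    """
    if local == upstream:
        return local, []          # both sides agree (or neither changed)
    if local == base:
        return upstream, []       # only upstream changed: take it
    if upstream == base:
        return local, []          # only local changed: keep it
    if isinstance(local, dict) and isinstance(upstream, dict):
        base_map = base if isinstance(base, dict) else {}
        merged: Dict[str, Any] = {}
        conflicts: List[str] = []
        for key in sorted(set(base_map) | set(local) | set(upstream)):
            value, sub = merge3(base_map.get(key, _MISSING),
                                local.get(key, _MISSING),
                                upstream.get(key, _MISSING),
                                f"{path}.{key}")
            conflicts.extend(sub)
            if value is not _MISSING:
                merged[key] = value
        return merged, conflicts
    return local, [path or "."]   # both changed differently: keep local, flag it


base = {"metadata": {"name": "web"}, "spec": {"replicas": 1, "image": "nginx:1.21"}}
local = {"metadata": {"name": "web", "labels": {"team": "a"}},
         "spec": {"replicas": 3, "image": "nginx:1.21"}}
upstream = {"metadata": {"name": "web"}, "spec": {"replicas": 1, "image": "nginx:1.23"}}
merged, conflicts = merge3(base, local, upstream)
```

In this example the downstream label and replica count survive while the upstream image bump is picked up, which is the behavior a package update aims for.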

**KRM CaD** is therefore a specific approach to implementing *Configuration as
Data* that:

* uses [KRM](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
  as the configuration serialization data model
* uses [Kptfile](https://kpt.dev/reference/schema/kptfile/) to store package
  metadata
* uses [ResourceList](https://kpt.dev/reference/schema/resource-list/) as a
  serialized package wire-format
* uses a function `ResourceList → ResultList` (`kpt` function) as the
  foundational, composable unit of package-manipulation code (note that other
  forms of code can manipulate packages as well, e.g. UIs and custom algorithms
  not necessarily packaged and used as kpt functions)
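
The function-based model can be sketched in Python as well. The sketch assumes a `ResourceList` already parsed into a dictionary with `items` and `functionConfig` fields (YAML parsing is omitted), and it reports diagnostics in a `results` field on the returned object, modelled on the KRM function specification; treat the exact shape of each result entry as an assumption. The functions `set_labels` and `require_name` and the helper `run_pipeline` are hypothetical illustrations, not kpt's API.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

ResourceList = Dict[str, Any]
KrmFunction = Callable[[ResourceList], ResourceList]


def set_labels(rl: ResourceList) -> ResourceList:
    """Mutator: add every key/value from functionConfig.data as a label."""
    out = copy.deepcopy(rl)  # never mutate the caller's input
    labels = out.get("functionConfig", {}).get("data", {})
    for item in out.get("items", []):
        item.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    return out


def require_name(rl: ResourceList) -> ResourceList:
    """Validator: report (but do not fix) items without metadata.name."""
    out = copy.deepcopy(rl)
    results = out.setdefault("results", [])
    for item in out.get("items", []):
        if not item.get("metadata", {}).get("name"):
            results.append({"message": f"{item.get('kind')} has no name",
                            "severity": "error"})
    return out


def run_pipeline(items: List[Dict[str, Any]],
                 steps: List[Tuple[KrmFunction, Dict[str, Any]]]) -> ResourceList:
    """Compose functions: each output ResourceList feeds the next step,
    with that step's own functionConfig swapped in."""
    rl: ResourceList = {"apiVersion": "config.kubernetes.io/v1",
                        "kind": "ResourceList", "items": items, "results": []}
    for fn, config in steps:
        rl = fn({**rl, "functionConfig": config})
    return rl


items = [
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "v1", "kind": "Service", "metadata": {}},
]
labels_config = {"apiVersion": "v1", "kind": "ConfigMap", "data": {"app": "web"}}
result = run_pipeline(items, [(set_labels, labels_config), (require_name, {})])
```

Since every step has the same input and output type, functions written independently (and potentially in different languages) can be chained in any order, which is what makes the function the composable unit.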

and provides the following basic functionality:

* load a serialized package (as `ResourceList`) from a package repository (for
  example, one or more of: local disk, Git repository, OCI registry, Cloud
  Storage, etc.)
* save a serialized package (as `ResourceList`) to a package repository
* evaluate a function on a serialized package (`ResourceList`)
* [render](https://kpt.dev/book/04-using-functions/01-declarative-function-execution)
  a package (evaluate functions declared within the package itself)
* create a new (empty) package
* fork (or clone) an existing package from one package repository (called
  upstream) to another (called downstream)
* delete a package from a repository
* associate a version with the package; guarantee immutability of packages with
  an assigned version
* incorporate changes from the new version of an upstream package into a new
  version of a downstream package
* revert to a prior version of a package
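
A toy, in-memory model of several of these operations (create, load, save, publishing an immutable version, fork, delete, and revert) may help show how they relate. The `PackageRepository` class and `fork` helper are hypothetical and are not the kpt or Porch API; a real repository would be backed by Git, OCI, or similar storage, and a real fork would also record upstream information in the Kptfile.

```python
import copy
from typing import Any, Dict, List

Resources = List[Dict[str, Any]]


class PackageRepository:
    """Hypothetical in-memory package repository (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._drafts: Dict[str, Resources] = {}
        self._versions: Dict[str, Dict[str, Resources]] = {}

    def create(self, package: str) -> None:
        self._drafts[package] = []            # a new, empty package
        self._versions.setdefault(package, {})

    def load(self, package: str, version: str = "") -> Resources:
        """Load the draft, or a published version if one is named."""
        src = self._versions[package][version] if version else self._drafts[package]
        return copy.deepcopy(src)

    def save(self, package: str, resources: Resources) -> None:
        self._drafts[package] = copy.deepcopy(resources)

    def publish(self, package: str, version: str) -> None:
        """Associate a version with the current draft; versions are immutable."""
        if version in self._versions[package]:
            raise ValueError(f"{package}@{version} already exists")
        self._versions[package][version] = copy.deepcopy(self._drafts[package])

    def delete(self, package: str) -> None:
        del self._drafts[package]
        del self._versions[package]

    def revert(self, package: str, version: str) -> None:
        """Replace the draft with the contents of a prior version."""
        self._drafts[package] = self.load(package, version)


def fork(upstream: PackageRepository, package: str, version: str,
         downstream: PackageRepository, new_name: str) -> None:
    """Clone a published upstream version into a new downstream package."""
    downstream.create(new_name)
    downstream.save(new_name, upstream.load(package, version))


blueprints = PackageRepository("blueprints")
blueprints.create("web")
blueprints.save("web", [{"kind": "Deployment", "metadata": {"name": "web"}}])
blueprints.publish("web", "v1")

deployments = PackageRepository("deployments")
fork(blueprints, "web", "v1", deployments, "web-prod")
draft = deployments.load("web-prod")
draft[0]["metadata"]["labels"] = {"env": "prod"}
deployments.save("web-prod", draft)
deployments.publish("web-prod", "v1")
```

Copying on every load and save is what keeps published versions immutable in this sketch: edits to a loaded draft never reach the stored version until they are explicitly saved and published under a new version.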

## Value

The Config as Data approach enables some key capabilities that other
configuration management approaches provide only to a lesser extent, or not at
all.

The *CaD* approach enables:

* simplified authoring of configuration using a variety of methods and sources
* WYSIWYG interaction with configuration using a simple data serialization
  format rather than a code-like format
* layering of interoperable interface surfaces (notably GUI) over declarative
  configuration mechanisms rather than forcing choices between exclusive
  alternatives (exclusively UI/CLI, IaC initially followed by exclusively
  UI/CLI, or exclusively IaC)
* the ability to apply UX techniques to simplify configuration authoring and
  viewing
* compared to imperative tools (e.g., UI, CLI) that directly modify the live
  state via APIs, CaD enables versioning, undo, audits of configuration history,
  review/approval, pre-deployment preview, validation, safety checks,
  constraint-based policy enforcement, and disaster recovery
* bulk changes to configuration data in their sources of truth
* injection of configuration to address horizontal concerns
* merging of multiple sources of truth
* state export to reusable blueprints without manual templatization
* cooperative editing of configuration by humans and automation, such as for
  security remediation (which is usually implemented against live-state APIs)
* reusability of configuration transformation code across multiple bodies of
  configuration data containing the same resource types, amortizing the effort
  of writing, testing, and documenting the code
* combination of independent configuration transformations
* implementation of config transformations using the languages of choice,
  including both programming and scripting approaches
* reducing the frequency of changes to existing transformation code
* separation of roles between developer and non-developer configuration users
* defragmenting the configuration transformation ecosystem
* admission control and invariant enforcement on sources of truth
* maintaining variants of configuration blueprints without one-size-fits-all
  full struct-constructor-style parameterization and without manually
  constructing and maintaining patches
* drift detection and remediation for most of the desired state via continuous
  reconciliation using apply and/or for specific attributes via targeted
  mutation of the sources of truth
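
The drift detection item above can be sketched as comparing the desired configuration with the observed live state, restricted to the fields the configuration actually specifies, since the live object also carries fields the cluster sets itself (such as `status`). The `find_drift` helper is hypothetical and deliberately simple: lists are compared as whole values rather than element by element.

```python
from typing import Any, List


def find_drift(desired: Any, live: Any, path: str = "") -> List[str]:
    """Return dotted paths where live state differs from desired config.

    Only fields present in `desired` are compared, so fields added by the
    cluster (e.g. status or defaulted values) are not reported as drift.
    """
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return [path or "."]
        drift: List[str] = []
        for key, value in desired.items():
            drift.extend(find_drift(value, live.get(key), f"{path}.{key}"))
        return drift
    return [] if desired == live else [path or "."]


desired = {"spec": {"replicas": 3, "template": {"image": "nginx:1.23"}}}
live = {"spec": {"replicas": 5, "template": {"image": "nginx:1.23"}},
        "status": {"readyReplicas": 5}}
print(find_drift(desired, live))  # ['.spec.replicas']
```

Remediation would then either re-apply the desired state (continuous reconciliation) or, for a specific attribute like `.spec.replicas`, update the source of truth to match an intentional live change.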

## Related Articles

For more information about Configuration as Data and the Kubernetes Resource
Model, visit the following links:

* [Rationale for kpt](https://kpt.dev/guides/rationale)
* [Understanding Configuration as Data](https://cloud.google.com/blog/products/containers-kubernetes/understanding-configuration-as-data-in-kubernetes)
  blog post
* [Kubernetes Resource Model](https://cloud.google.com/blog/topics/developers-practitioners/build-platform-krm-part-1-whats-platform)
  blog post series