# Configuration as Data

* Author(s): Martin Maly, @martinmaly
* Approver: @bgrant0607

## Why

This document provides background context for Package Orchestration, which is
further elaborated in a dedicated [document](07-package-orchestration.md).

## Configuration as Data

*Configuration as Data* is an approach to management of configuration (incl.
configuration of infrastructure, policy, services, applications, etc.) which:

* makes configuration data the source of truth, stored separately from the live
  state
* uses a uniform, serializable data model to represent configuration
* separates code that acts on the configuration from the data and from packages
  / bundles of the data
* abstracts configuration file structure and storage from operations that act
  upon the configuration data; clients manipulating configuration data don’t
  need to directly interact with storage (git, container images)

*Figure: CaD Overview*

## Key Principles

A system based on CaD *should* observe the following key principles:

* secrets should be stored separately, in a secret-focused storage system
  ([example](https://cloud.google.com/secret-manager))
* stores a versioned history of configuration changes by change sets to bundles
  of related configuration data
* relies on uniformity and consistency of the configuration format, including
  type metadata, to enable pattern-based operations on the configuration data,
  along the lines of duck typing
* separates schemas for the configuration data from the data, and relies on
  schema information for strongly typed operations and to disambiguate data
  structures and other variations within the model
* decouples abstractions of configuration from collections of
  configuration data
* represents abstractions of configuration generators as data with schemas, like
  other configuration data
* finds, filters / queries / selects, and/or validates configuration data that
  can be operated on by given code (functions)
* finds and/or filters / queries / selects code (functions) that can operate on
  resource types contained within a body of configuration data
* *actuation* (reconciliation of configuration data with live state) is separate
  from transformation of configuration data, and is driven by the declarative
  data model
* transformations, particularly value propagation, are preferable to wholesale
  configuration generation except when the expansion is dramatic (say, >10x)
* transformation input generation should usually be decoupled from propagation
* deployment context inputs should be taken from well defined “provider context”
  objects
* identifiers and references should be declarative
* live state should be linked back to sources of truth (configuration)

## KRM CaD

Our implementation of the Configuration as Data approach
([kpt](https://kpt.dev),
[Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview),
and [Package Orchestration](https://github.com/GoogleContainerTools/kpt/tree/main/porch))
builds on the foundation of the
[Kubernetes Resource Model](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
(KRM).

**Note**: Even though KRM is not a requirement of Config as Data (just like
Python or Go templates or Jinja are not specifically requirements for
[IaC](https://en.wikipedia.org/wiki/Infrastructure_as_code)), the choice of
another foundational config representation format would necessitate
implementing adapters for all types of infrastructure and applications
configured, including Kubernetes, CRDs, GCP resources and more.
Likewise, choice
of another configuration format would require redesign of a number of the
configuration management mechanisms that have already been designed for KRM,
such as 3-way merge, structural merge patch, schema descriptions, resource
metadata, references, status conventions, etc.

**KRM CaD** is therefore a specific approach to implementing *Configuration as
Data* which:

* uses [KRM](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
  as the configuration serialization data model
* uses [Kptfile](https://kpt.dev/reference/schema/kptfile/) to store package
  metadata
* uses [ResourceList](https://kpt.dev/reference/schema/resource-list/) as a
  serialized package wire-format
* uses a function `ResourceList → ResultList` (`kpt` function) as the
  foundational, composable unit of package-manipulation code (note that other
  forms of code can manipulate packages as well, e.g. UIs or custom algorithms
  not necessarily packaged and used as kpt functions)

and provides the following basic functionality:

* load a serialized package from a repository (as `ResourceList`) (examples of
  repository may be one or more of: local HDD, Git repository, OCI, Cloud
  Storage, etc.)
* save a serialized package (as `ResourceList`) to a package repository
* evaluate a function on a serialized package (`ResourceList`)
* [render](https://kpt.dev/book/04-using-functions/01-declarative-function-execution)
  a package (evaluate functions declared within the package itself)
* create a new (empty) package
* fork (or clone) an existing package from one package repository (called
  upstream) to another (called downstream)
* delete a package from a repository
* associate a version with the package; guarantee immutability of packages with
  an assigned version
* incorporate changes from the new version of an upstream package into a new
  version of a downstream package
* revert to a prior version of a package

## Value

The Config as Data approach provides key benefits that other configuration
management approaches offer only to a lesser extent or not at all.

The *CaD* approach enables:

* simplified authoring of configuration using a variety of methods and sources
* WYSIWYG interaction with configuration using a simple data serialization
  format rather than a code-like format
* layering of interoperable interface surfaces (notably GUI) over declarative
  configuration mechanisms rather than forcing choices between exclusive
  alternatives (exclusively UI/CLI or IaC initially followed by exclusively
  UI/CLI or exclusively IaC)
* the ability to apply UX techniques to simplify configuration authoring and
  viewing
* compared to imperative tools (e.g., UI, CLI) that directly modify the live
  state via APIs, CaD enables versioning, undo, audits of configuration history,
  review/approval, pre-deployment preview, validation, safety checks,
  constraint-based policy enforcement, and disaster recovery
* bulk changes to configuration data in their sources of truth
* injection of configuration to address horizontal concerns
* merging of multiple sources of truth
* state export to reusable blueprints without manual templatization
* cooperative editing of configuration by humans and automation, such as for
  security remediation (which is usually implemented against live-state APIs)
* reusability of configuration transformation code across multiple bodies of
  configuration data containing the same resource types, amortizing the effort
  of writing, testing, documenting the code
* combination of independent configuration transformations
* implementation of config transformations using the languages of choice,
  including both programming and scripting approaches
* reducing the frequency of changes to existing transformation code
* separation of roles between developer and non-developer configuration users
* defragmenting the configuration transformation ecosystem
* admission control and invariant enforcement on sources of truth
* maintaining variants of configuration blueprints without one-size-fits-all
  full struct-constructor-style parameterization and without manually
  constructing and maintaining patches
* drift detection and remediation for most of the desired state via continuous
  reconciliation using apply and/or for specific attributes via targeted
  mutation of the sources of truth

## Related Articles

For more information about Configuration as Data and Kubernetes Resource Model,
visit the following links:

* [Rationale for kpt](https://kpt.dev/guides/rationale)
* [Understanding Configuration as Data](https://cloud.google.com/blog/products/containers-kubernetes/understanding-configuration-as-data-in-kubernetes)
  blog post
* [Kubernetes Resource Model](https://cloud.google.com/blog/topics/developers-practitioners/build-platform-krm-part-1-whats-platform)
  blog post series
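
## Illustrative Sketch

The KRM CaD section names a function over a `ResourceList` as the composable
unit of package-manipulation code. The following is a minimal Python sketch of
that idea under stated assumptions, not kpt's implementation. The package is
modeled as a plain dictionary shaped like a `ResourceList` (`items`,
`functionConfig`, `results`). The names `set_labels`, `render`, and the sample
`package` are hypothetical and introduced only for illustration. Returning an
updated `ResourceList` that carries a `results` field is likewise an assumption
of the sketch.

```python
import copy
from functools import reduce

# Hypothetical package, shaped like a serialized ResourceList.
package = {
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "functionConfig": {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "data": {"env": "prod"},
    },
    "items": [
        {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": "web", "labels": {"app": "web"}},
        }
    ],
}


def set_labels(resource_list):
    """Hypothetical function: copy functionConfig.data onto every item's labels."""
    out = copy.deepcopy(resource_list)  # leave the input data untouched
    labels = out.get("functionConfig", {}).get("data", {})
    results = list(out.get("results", []))
    for item in out.get("items", []):
        meta = item.setdefault("metadata", {})
        meta.setdefault("labels", {}).update(labels)
        results.append({
            "severity": "info",
            "message": f"labeled {item.get('kind')}/{meta.get('name')}",
        })
    out["results"] = results
    return out


def render(resource_list, functions):
    """Apply functions in order; each takes and returns a ResourceList-shaped dict."""
    return reduce(lambda rl, fn: fn(rl), functions, resource_list)


rendered = render(package, [set_labels])
print(rendered["items"][0]["metadata"]["labels"])
```

Because `set_labels` only reads and returns data, the same code can be reused on
any body of configuration whose resources carry `metadata`, and independent
functions of this shape can be chained through `render`.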