- Feature Name: Encryption at rest
- Status: in-progress
- Start Date: 2017-11-01
- Authors: Marc Berhault
- RFC PR: [#19785](https://github.com/cockroachdb/cockroach/pull/19785)
- Cockroach Issue: [#19783](https://github.com/cockroachdb/cockroach/issues/19783)


Table of Contents
=================

* [Summary](#summary)
* [Motivation](#motivation)
* [Related resources](#related-resources)
* [Out of scope](#out-of-scope)
* [Security analysis](#security-analysis)
    * [Attack profiles](#attack-profiles)
    * [Assumptions](#assumptions)
    * [Considerations](#considerations)
* [Guide-level explanation](#guide-level-explanation)
    * [Terminology](#terminology)
    * [User-level explanation](#user-level-explanation)
        * [Configuration recommendations](#configuration-recommendations)
        * [Store keys](#store-keys)
        * [Data keys](#data-keys)
        * [User control of encryption](#user-control-of-encryption)
    * [Contributor impact](#contributor-impact)
* [Reference-level explanation](#reference-level-explanation)
    * [Detailed design](#detailed-design)
        * [Store version](#store-version)
        * [Switching Env](#switching-env)
        * [COCKROACHDB_REGISTRY](#cockroachdb_registry)
        * [Encrypted Env](#encrypted-env)
        * [Key levels](#key-levels)
        * [Key status](#key-status)
        * [Store keys files](#store-keys-files)
        * [Key Manager](#key-manager)
        * [Rotating store keys](#rotating-store-keys)
        * [Data keys file format](#data-keys-file-format)
        * [Generating data keys](#generating-data-keys)
        * [Rotating data keys](#rotating-data-keys)
        * [Reporting encryption status](#reporting-encryption-status)
        * [Other uses of local disk](#other-uses-of-local-disk)
        * [Enterprise enforcement](#enterprise-enforcement)
    * [Drawbacks](#drawbacks)
        * [Directs us towards rocksdb-level encryption](#directs-us-towards-rocksdb-level-encryption)
        * [Lack of correctness testing of rocksdb encryption layer](#lack-of-correctness-testing-of-rocksdb-encryption-layer)
        * [Complexity of configuration and monitoring](#complexity-of-configuration-and-monitoring)
        * [No strong license enforcement](#no-strong-license-enforcement)
        * [Non-live rocksdb files will rot](#non-live-rocksdb-files-will-rot)
        * [CCL code location](#ccl-code-location)
    * [Rationale and Alternatives](#rationale-and-alternatives)
        * [Filesystem encryption](#filesystem-encryption)
        * [Fine-grained encryption](#fine-grained-encryption)
        * [Single level of keys](#single-level-of-keys)
        * [Relationship between store and data keys](#relationship-between-store-and-data-keys)
        * [Directly using the data prefix format](#directly-using-the-data-prefix-format)
* [Future improvements](#future-improvements)
    * [v1.0: a.k.a. MVP](#v10-aka-mvp)
    * [Possible future additions](#possible-future-additions)

# Summary

This feature is Enterprise.

We propose to add support for encryption at rest on cockroach nodes, with
encryption being done at the rocksdb layer for each file.

We provide CTR-mode AES encryption for all files written through rocksdb.

Keys are split into user-provided store keys and dynamically-generated data keys.
Store keys are used to encrypt the data keys. Data keys are used to encrypt the actual data.
Store keys can be rotated at the user's discretion. Data keys can be rotated automatically
on a regular schedule, relying on rocksdb churn to re-encrypt data.

Plaintext files go through the regular rocksdb interface to the filesystem. Encrypted files
go through an intermediate layer responsible for all encryption tasks.

Data can be transitioned from plaintext to encrypted and back with status being reported
continuously.

# Motivation

Encryption is desired for security reasons (prevent access from other users on the same
machine, prevent data leak through drive theft/disposal) as well as regulatory reasons
(GDPR, HIPAA, PCI DSS).

Encryption at rest is necessary when other methods of encryption are either not desirable,
or not sufficient (eg: filesystem-level encryption cannot be used if DBAs do not have
access to filesystem encryption utilities).

# Related resources

* [Crypto++](https://www.cryptopp.com/)
* [overview of block cipher modes](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Common_modes)
* [rocksdb PR adding env_encryption](https://github.com/facebook/rocksdb/pull/2424)
* [SEI Cert C coding standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard)

# Out of scope

The following are not in scope but should not be hindered by implementation of this RFC:
* encryption of non-rocksdb data (eg: log files)
* integration with external key storage systems such as Vault, AWS KMS, KeyWhiz
* auditing of key usage and encryption status
* integration with HSM (hardware security module) or TPM (Trusted Platform Module)
* FIPS-140-2 compliance

See [Possible future additions](#possible-future-additions) for more currently-out-of-scope features.

The following are unrelated to encryption-at-rest as currently proposed:
* encrypted backup (should be supported regardless of encryption-at-rest status)
* fine-granularity encryption (that cannot use zone configs to select encrypted replicas)
* restricting data processing on encrypted nodes (requires planning/gateway coordination)

# Security analysis

Caveat: this is not a thorough security analysis of the proposed solution, let alone its implementation.

This section should be expanded and studied carefully before this RFC is approved.

## Attack profiles

The goal of this feature is to block two attack vectors:

### Access to raw disk offline

An attacker can gain access to the disk after it has been removed from the system (eg: node decommission).
At-rest encryption should make all data on the disk useless if the following are true:
* none of the store keys are available or previously compromised
* none of the data went through a phase where either store or data encryption was `plaintext`

### Access to a running system by unprivileged user

Unprivileged users (eg: non root) should not be able to extract cockroach data even if they have access to the
raw rocksdb files.
This will still not guard against:
* privileged users (with access to store keys or memory)
* data that was at some point stored as `plaintext`

## Assumptions

Some of the assumptions here can be verified by runtime checks, but others must be satisfied by the user (see
[Configuration Recommendation](#configuration-recommendations)).

### No privileged access

We assume attackers do not have privileged access on a running system. Specifically:
* store keys cannot be read
* cockroach memory cannot be directly accessed
* command line flags cannot be modified

### No write access by attackers

A big assumption in this document is that attackers do not have write access to the raw files while
we are operating: we trust the integrity of the store and data key files as well as all data written on disk.

This includes the case of an attacker removing a disk, modifying it, and re-inserting it into the cluster.

A potential future improvement is to use authenticated encryption to verify the integrity of files on disk.
This would add complexity and cost to filesystem-level operations in rocksdb as we would need to read entire
files to compute authentication tags.

However, integrity checking can be cheaply used on the data keys file.

## Considerations

### Random number generator

We need to generate random values for a few things:
* data keys
* nonce/counter for each file

Crypto++ provides [OS_GenerateRandomBlock](https://www.cryptopp.com/wiki/RandomNumberGenerator#OS_Entropy)
which can operate in blocking (using `/dev/random`) or non-blocking (using `/dev/urandom`) mode.
We would prefer to use better entropy for data keys, but `/dev/random` is notoriously slow especially
when just starting rocksdb with very little disk/network utilization.

Generating data keys (other than the first one, or when changing encryption ciphers) can be done
in the background so we may be able to use the higher entropy `/dev/random`.
Nonces may be safe to keep generating using the lower-entropy `/dev/urandom`.

More research must be done into the use of `/dev/random` in multi-user environments. For example, is it possible
for an attacker to consume `/dev/random` for long enough that key generation is effectively disabled?

### IV makeup and reuse prevention

An important consideration in AES-CTR is making sure we never reuse the same IV for a given key.

The IV has a size of `AES::BlockSize`, or 128 bits. It is made of two parts:
* nonce: 96 bits, randomly generated for each file
* counter: 32 bits, incremented for each block in the file

This imposes two limits:
* maximum file size: `2^32 128-bit blocks == 64GiB`
* probability of nonce re-use after `2^32` files is `2^-32`

These limits should be sufficient for our needs.

### Safety of symmetric key hashes

Given a reasonably safe hashing algorithm, exposing the hash of the store keys should not be an issue.

Indeed, finding collisions in `sha256` is not currently easier than cracking `aes128`. Should better collision
methods be found, this is still not the key itself.

### Memory safety

We need to provide safety for the keys while held in memory.
At the C++ level, we can control two aspects:
* don't swap to disk: using `mlock` (`man mlock(2)`) on memory holding keys, preventing paging out to disk
* don't core dump: using `madvise` with `MADV_DONTDUMP` (see `man madvise(2)` on Linux) to exclude pages from core dumps.

There is no equivalent in Go so the current approach is to avoid loading keys in Go.
This can become problematic if we want to reuse the keys to encrypt log files written in Go.
No good answer presents itself.

# Guide-level explanation

## Terminology

Terminology used in this RFC:
* **data key**: a.k.a. Data-encryption-key. Used to encrypt the actual on-disk data. These are generated automatically.
* **store key**: a.k.a. Key-encryption-key. Used to encrypt the set of data keys. Provided by the user.
* **active key**: the key being used to encrypt new data.
* **key rotation**: encrypting data with a new key. Rotation starts when the new key is provided and ends when no data encrypted with the old key remains.
* **plaintext**: unencrypted data.
* **Env**: rocksdb terminology for the layer between rocksdb and the filesystem.
* **Switching Env**: our new Env that can switch between plaintext and encrypted envs.

## User-level explanation

Encryption-at-rest is an optional feature that can be enabled on a per-store basis.

In order to enable encryption on a given store, the user needs two things:
* an enterprise license
* one or more store key(s)

Enabling encryption increases the store version, making downgrade to a binary before encryption impossible.

### Configuration recommendations

We identify a few configuration requirements for users to safely use encryption at rest.

**TODO**: this will need to be fleshed out when writing the docs.

* restricted access to store keys (ideally, only the cockroach user, and read-only access)
* store keys and cockroach data must not be on the same filesystem/disk (including temporary working directories)
* restricted access to all cockroach data
* disable swap
* don't enable core dumps
* reasonable key generation/rotation
* monitoring
* ideally, the store keys are not stored on the machine (use something like `keywhiz`)

### Store keys

The store key is a symmetric key provided by the user. It has the following properties:
* unique for each store
* available only to the cockroach process on the node
* not stored on the same disk as the cockroach data

Store keys are stored in raw format in files (one file per key).
eg: to generate a 128-bit key: `openssl rand 16 > store.key`

Specifying store keys is done through the `--enterprise-encryption` flag. There are two key fields in this flag:
* `key`: path to the active store key, or `plain` for plaintext (default).
* `old_key`: path to the previous store key, or `plain` for plaintext (default).

When a new `key` is specified, we must tell cockroach what the previous active key was through `old_key`.

### Data keys

Data keys are automatically generated by cockroach. They are stored in the data directory and
encrypted with the active store key. Data keys are used to encrypt the actual files inside the data
directory.

This two-level approach allows easy rotation of store keys and provides safer encryption of large amounts of
data. To rotate the store key, all we need to do is re-encrypt the file containing the data keys, leaving
the bulk of the data as is.

Data keys are generated and rotated by cockroach.
There are two parameters controlling how data keys behave:
* encryption cipher: the cipher in use for data encryption. The cipher is currently `AES CTR` with the same key
  size as the store key.
* rotation period: the time before a new key is generated and used. Default value: 1 week. This can be set through a flag.

### User control of encryption

#### Recommended production configuration

The need for encryption entails a few recommended changes in production configuration:
* disable swap/core dumps: we want to avoid any data hitting disk unencrypted, this includes memory being swapped out.
* run on architectures that support the [AES-NI instruction set](https://en.wikipedia.org/wiki/AES_instruction_set).
* have a separate area (encrypted or in-memory partition, fuse-filesystem, etc...) to store the store-level keys.

#### Flag changes for the cockroach binary

We add a new flag for CCL binaries. It must be specified for each store we wish encrypted:
```
--enterprise-encryption=path=<path to store>,key=<path to key file>,old_key=<path to old key>,rotation_period=<duration>
```

The individual fields are:
* `path`: the path to the data directory of the corresponding store. This must match the path specified in `--store`
* `key`: the path to the current encryption key, or `plaintext` if we wish to use plaintext. default: `plaintext`
* `old_key`: the path to the previous encryption key. Only needed if data was already encrypted.
* `rotation_period`: how often data keys should be rotated. default: `1 week`

The flag can be specified multiple times, once for each store.

The encryption flags can specify different encryption states for different stores (eg: one encrypted one plain,
different rotation periods).
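
To make the field semantics concrete, here is a minimal Python sketch of how one `--enterprise-encryption` value could be split into its fields. It is an illustration only, not the cockroach flag parser: the names `StoreEncryptionSpec`, `parse_encryption_flag` and `parse_all` are hypothetical, the defaults are copied from the field list above, and `rotation_period` is kept as an opaque string because this RFC does not specify the duration syntax.

```python
from dataclasses import dataclass

# Assumed defaults, copied from the field descriptions in this section.
DEFAULT_KEY = "plaintext"
DEFAULT_ROTATION_PERIOD = "1 week"
KNOWN_FIELDS = {"path", "key", "old_key", "rotation_period"}


@dataclass
class StoreEncryptionSpec:
    """Hypothetical container for the fields of one flag occurrence."""
    path: str
    key: str = DEFAULT_KEY
    old_key: str = DEFAULT_KEY
    rotation_period: str = DEFAULT_ROTATION_PERIOD


def parse_encryption_flag(value: str) -> StoreEncryptionSpec:
    """Parse one flag value of the form name1=value1,name2=value2,..."""
    fields = {}
    for item in value.split(","):
        name, sep, field_value = item.partition("=")
        if not sep or not field_value:
            raise ValueError(f"malformed field: {item!r}")
        if name not in KNOWN_FIELDS:
            raise ValueError(f"unknown field: {name!r}")
        if name in fields:
            raise ValueError(f"duplicate field: {name!r}")
        fields[name] = field_value
    if "path" not in fields:
        raise ValueError("the path field is required")
    return StoreEncryptionSpec(**fields)


def parse_all(values: list[str]) -> dict[str, StoreEncryptionSpec]:
    """The flag may be repeated, once per store; index the specs by store path."""
    specs = {}
    for value in values:
        spec = parse_encryption_flag(value)
        if spec.path in specs:
            raise ValueError(f"store {spec.path!r} specified more than once")
        specs[spec.path] = spec
    return specs
```

Indexing by `path` mirrors the requirement that each flag occurrence matches exactly one `--store` path, so a store listed twice is rejected rather than silently overridden.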

#### Enabling encryption on a store

Turning on encryption for a new store or a store currently in plaintext involves the following:

```
# Ensure your key file exists and has valid key data (correct size)
# For example, to generate a key for AES-128:
$ openssl rand 16 > /path/to/cockroach.key
# Specify the enterprise-encryption:
$ cockroach start <regular options> \
    --store=/mnt/data \
    --enterprise-encryption=path=/mnt/data,key=/path/to/cockroach.key
```

The node will generate a 128 bit data key, encrypt the list of data keys with the store key, and use AES128
encryption for all new files.

Examine the logs or node debug pages to see that encryption is now enabled and see its progress.

#### Rotating the store key

Given the previous configuration, we can generate a new store key. We must pass the previous key.

```
# Create a new 128 bit key.
$ openssl rand 16 > /path/to/cockroach.new.key
# Tell cockroach about the new key, and pass the old key (/path/to/cockroach.key)
$ cockroach start <regular options> \
    --store=/mnt/data \
    --enterprise-encryption=path=/mnt/data,key=/path/to/cockroach.new.key,old_key=/path/to/cockroach.key
```

Examine the logs or node debug pages to see that the new key is now in use.
It is now safe to delete the old key file.

#### Disabling encryption

We can switch an encrypted store back to plaintext. This is done by using the special value `plaintext` in the
`key` field of the encryption flag. We need to specify the previous encryption key.

```
# Instead of a key file, use "plaintext" as the argument.
# Pass the old key to allow decrypting existing data.
$ cockroach start <regular options> \
    --store=/mnt/data \
    --enterprise-encryption=path=/mnt/data,key=plain,old_key=/path/to/cockroach.new.key
```

Examine the logs or node debug pages to see that the store encryption status is now plaintext. It is now safe to delete the old key file.

Examine logs and debug pages to see progress of data encryption. This may take some time.

## Contributor impact

The biggest impact of this change on contributors is the fact that all data on a given store must be encrypted.

There are three main categories:
* using the store rocksdb instance: encryption is done automatically
* using a separate rocksdb instance: encryption settings **must** be given to the new instance. Care must be taken to ensure that users know not to place store keys on the same disks as the rocksdb directory
* using anything other than rocksdb: logs (written at the Go level) are marked out of scope for this document. However, any raw data written to disk should use the same encryption settings as the store

# Reference-level explanation

## Detailed design

### Store version

We introduce a new [store version](https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/engine/version.go#L27) to mark switching to stores supporting encryption.

Stores are currently using `versionBeta20160331`. If no encryption flags are specified, we remain at this
version until a "reasonable" time (one or two minor stable releases) has passed.

Specifying the `--enterprise-encryption` flag increases the version to `versionSwitchingEnv`. Downgrades to
binaries that do not support this version are not possible.

### Switching Env

Rocksdb performs filesystem-level operations through an [`Env`](https://github.com/facebook/rocksdb/blob/master/include/rocksdb/env.h).

This layer can be used to provide different behavior for a number of reasons.
For example:
* posix support: the default `Env`
* in-memory support: for testing or in-memory databases
* hdfs: for HDFS-backed rocksdb instances
* encryption: for file-level encryption with encryption settings stored in a 4KB data prefix
* wrapper: can override specific methods, the rest are passed through to a `base env`

We leverage the `Env` layer to implement the following behavior:
* stores at `versionBeta20160331` continue to use the default `Env`
* stores at `versionSwitchingEnv` use the switching env
* plaintext files under version `versionSwitchingEnv` use a default `Env`
* encrypted files under version `versionSwitchingEnv` use an `EncryptedEnv`

```
versionBeta20160331: DefaultEnv

versionSwitchingEnv: SwitchingEnv: Encrypted? no  -----> DefaultEnv
                                              yes -----> EncryptedEnv
```

The state of a file (plaintext or encrypted) is stored in a file registry. This records the list of all
encrypted files by filename and is persisted to disk in a file named `COCKROACHDB_REGISTRY`.

For every file being operated on, the switching env must look up its existing encryption state in the registry or the
desired encryption state for new files. If the file is plaintext, pass the operation down to the `DefaultEnv`.
If the file is encrypted, pass the operation down to the `EncryptedEnv`. For a new file, we must successfully
persist its state in the registry before proceeding with the operation.

Most `SwitchingEnv` methods will perform something like the following:
```
OpOnFile(filename)
  // Determine whether the file uses encryption (existing files) or encryption is desired (new files)
  if !registry.HasFile(filename)
    useEncryption = lookup desired encryption (from --enterprise-encryption flag)
    add filename to registry
    persist registry to disk. Error out on failure.
  else
    useEncryption = get file encryption state from registry

  // Perform the operation through the appropriate Env.
  if useEncryption
    EncryptedEnv->OpOnFile(filename)
  else
    DefaultEnv->OpOnFile(filename)
```

The registry may accumulate non-existent entries if writes fail after addition or removal fails after deletes.
It will also gather entries that are never deleted by rocksdb (eg: archives). We can clean these up
by adding a periodic [garbage collection](#garbage-collection-of-registry-entries).

### COCKROACHDB_REGISTRY

The registry is a new file containing encryption status information for files written through rocksdb.
This is similar to rocksdb's `MANIFEST`. We intentionally do not call it manifest to avoid confusion.

It is stored in the base rocksdb directory for the store and written using a `write/close/rename` method.
It is always operated on through the `DefaultEnv`.

Encrypted files are always present in the registry. Plaintext files are not registered as we cannot guarantee
their presence when operating on an existing store.

`Env` operations on files will use the registry in different ways:
* existing file: look up its encryption state in the registry, assume plaintext if missing
* existing file if it exists, otherwise new file: look up its encryption state in the registry. If missing, stat the file through the `DefaultEnv`. If it does not exist, see "create a new file"
* create a new file: look up the desired encryption state. If encrypted, persist it in the registry

The registry is a serialized protocol buffer:
```
enum EncryptionRegistryVersion {
  // The only version so far.
  Base = 0;
}

message EncryptionRegistry {
  // version is currently always Base.
  EncryptionRegistryVersion version = 1;
  repeated EncryptedFile files = 2;
}

enum EncryptionType {
  // No encryption applied, not used for the registry.
  Plaintext = 0;
  // AES in counter mode.
  AES_CTR = 1;
}

message EncryptedFile {
  string filename = 1;
  // The type of encryption applied.
  EncryptionType type = 2;

  // Encryption fields. This may move to a separate AES-CTR message.
  // ID (hash) of the key in use, if any.
  optional bytes key_id = 3;
  // Initialization vector, of size 96 bits (12 bytes) for AES.
  optional bytes nonce = 4;
  // Counter, allowing 2^32 blocks per file, so 64GiB.
  optional uint32 counter = 5;
}
```

The registry contains all information needed to find the encryption key used for a given file and encrypt/decrypt it.

### Encrypted Env

Rocksdb has an `EncryptedEnv` introduced in [PR 2424](https://github.com/facebook/rocksdb/pull/2424).
It adds a 4KiB data block at the beginning of each file with a nonce and possible encrypted extra information.

We opt to use a slightly modified (mostly simplified) version of this encrypted env because:
* `EncryptedEnv` does not support multiple keys
* the data prefix is not needed, all encryption fields can be stored in the registry

We will use a modified version of the existing `EncryptedEnv` without data prefix.

The encrypted env uses a `CipherStream` for each file, with the cipher stream containing the necessary
information to perform encryption and decryption (cipher algorithm, key, nonce, and counter).

It also holds a reference to a key manager which can provide the active key and any older keys held.
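
As an illustration of the nonce and counter held by the cipher stream, the following Python sketch derives the 128-bit IV for the AES block that contains a given byte offset. It is a sketch under stated assumptions rather than the rocksdb or cockroach implementation: the function name `iv_for_offset` is hypothetical, and the layout (nonce first, counter as a big-endian 32-bit integer, wrapping modulo `2^32`) is assumed because this RFC only fixes the sizes of the two parts.

```python
AES_BLOCK_SIZE = 16        # bytes, i.e. 128 bits
NONCE_SIZE = 12            # bytes, i.e. 96 bits
COUNTER_MODULUS = 1 << 32  # the counter is 32 bits wide
# 2^32 blocks of 16 bytes each: the 64GiB per-file limit.
MAX_FILE_SIZE = COUNTER_MODULUS * AES_BLOCK_SIZE


def iv_for_offset(nonce: bytes, initial_counter: int, offset: int) -> bytes:
    """Return the IV for the AES block containing byte `offset` of a file.

    `nonce` and `initial_counter` stand for the per-file values stored in
    the registry entry. The byte layout is an assumption of this sketch.
    """
    if len(nonce) != NONCE_SIZE:
        raise ValueError("nonce must be 96 bits (12 bytes)")
    if not 0 <= initial_counter < COUNTER_MODULUS:
        raise ValueError("counter must fit in 32 bits")
    if not 0 <= offset < MAX_FILE_SIZE:
        raise ValueError("offset is beyond the 64GiB per-file limit")
    block_index = offset // AES_BLOCK_SIZE
    # Wrapping is safe within one file: fewer than 2^32 distinct block
    # indexes map to distinct counter values.
    counter = (initial_counter + block_index) % COUNTER_MODULUS
    return nonce + counter.to_bytes(4, "big")
```

Because the IV depends only on the registry fields and the offset, random-access reads can decrypt any block without reading the bytes before it, which is the property that makes CTR mode a fit for rocksdb's access pattern.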

Two instances of the encrypted env are in use:
* store encryption env: uses store keys, used to manipulate the data keys file
* data encryption env: uses data keys, used to manipulate all other files

### Key levels

We introduce two levels of encryption with their corresponding keys:
* data keys:
  * used to encrypt the data itself
  * automatically generated and rotated
  * stored in the `COCKROACHDB_DATA_KEYS` file
  * encrypted using the store keys, or plaintext when encryption is disabled
* store keys:
  * used to encrypt the list of data keys
  * provided by the user
  * should be stored on a separate disk
  * should only be accessible to the cockroach process

### Key status

We have three distinct statuses for keys:
* active: key is being used for all new data
* in-use: key is still needed to read some data but is not being used for new data
* inactive: there is no remaining data encrypted with this key

### Store keys files

Store keys consist of exactly two keys: the active key, and the previous key.

They are stored in separate files containing the raw key data (no encoding).

Specifying the keys in use is done through the encryption flag fields:
* `key`: path to the active key, or `plaintext` for plaintext. If not specified, `plaintext` is the default.
* `old_key`: path to the previous key, or `plaintext` for plaintext. If not specified, `plaintext` is the default.

The size of the raw key in the file dictates the cipher variant to use. Keys can be 16, 24, or 32 bytes long
corresponding to AES-128, AES-192, AES-256 respectively.

Key files are opened in read-only mode by cockroach.

### Key Manager

The key manager is responsible for holding all keys used in encryption.
It is used by the encrypted env and provides the following interfaces:
* `GetActiveKey`: returns the currently active key
* `GetKey(key hash)`: returns the key matching the key hash, if any

We identify two types of key managers:

#### Store Key Manager

The store key manager holds the current and previous store keys as specified through the `--enterprise-encryption`
flag.

Since the keys are externally provided, there is no concept of key rotation.

#### Data Key Manager

The data key manager holds the dynamically-generated data keys.

Keys are persisted to the `COCKROACHDB_DATA_KEYS` file using the `write/close/rename` method and encrypted
through an encrypted env using the store key manager.

The manager periodically generates a new data key (see [Rotating data keys](#rotating-data-keys)), keeps
the previously-active key in the list of existing keys, and marks the new key as active.

Keys must be successfully persisted to the `COCKROACHDB_DATA_KEYS` file before use.

### Rotating store keys

Rotating the store keys consists of specifying:
* `key` points to a new key file, or `plaintext` to switch to plaintext.
* `old_key` points to the key file previously used.

Upon starting (or other signal), cockroach decrypts the data keys file and re-encrypts it with the new key.
If rotation is done through a flag (as opposed to other signal), this is done before starting rocksdb.

An ID is computed for each key by taking the hash (`sha-256`) of the raw key. This key ID is stored in plaintext
to indicate which store key is used to decode the data keys file.

Any change in the active store key (actual key, key size) triggers a data key rotation.

### Data keys file format

The data keys file is an encoded protocol buffer:

```
message DataKeysRegistry {
  // Ordering does not matter.
  repeated DataKey data_keys = 1;
  repeated StoreKey store_keys = 2;
}

// EncryptionType is shared with the registry EncryptionType.
enum EncryptionType {
  // No encryption applied.
  Plaintext = 0;
  // AES in counter mode.
  AES_CTR = 1;
}

// Information about the store key, but not the key itself.
message StoreKey {
  // The ID (hash) of this key.
  optional bytes key_id = 1;
  // Whether this is the active (latest) key.
  optional bool active = 2;
  // First time this key was seen (in seconds since epoch).
  optional int32 creation_time = 3;
}

// Actual data keys and related information.
message DataKey {
  // The ID (hash) of this key.
  optional bytes key_id = 1;
  // Whether this is the active (latest) key.
  optional bool active = 2;
  // EncryptionType is the type of encryption (aka: cipher) used with this key.
  EncryptionType encryption_type = 3;
  // Creation time is the time at which the key was created (in seconds since epoch).
  optional int32 creation_time = 4;
  // Key is the raw key.
  optional bytes key = 5;
  // Was exposed is true if we ever wrote the data keys file in plaintext.
  optional bool was_exposed = 6;
  // ID of the active store key at creation time.
  optional bytes creator_store_key_id = 7;
}
```

The `store_keys` field is needed to keep track of store key ages and statuses. We only need to keep the
active key but may keep previous keys for history. It does **not** store the actual key, only the key hash.

The `data_keys` field contains all in-use (data encrypted with those keys is still live) keys and all information
needed to determine ciphers, ages, related store keys, etc...

`was_exposed` indicates whether the key was ever written to disk as plaintext (encryption was disabled at the
store level). This will be surfaced in encryption status reports. From a security standpoint, data encrypted
by an exposed key is as bad as `plaintext`.

`creator_store_key_id` is the ID of the active store key when this key was created. This enables two things:
* check the active data key's `creator_store_key_id` against the active store key. Mismatch triggers rotation
* force re-encryption of all files encrypted up to some store key

### Generating data keys

To generate a new data key, we look up the following:
* current active key
* current timestamp
* desired cipher (eg: `AES128`)
* current store key ID

If the cipher is other than `plaintext`, we generate a key of the desired length using the pseudorandom `CryptoPP::OS_GenerateRandomBlock(blocking=false)` (see [Random number generator](#random-number-generator) for alternatives).

We then generate the following new key entry:
* **key_id**: the hash (`sha256`) of the raw key
* **creation_time**: current time
* **encryption_type**: as specified
* **key**: raw key data
* **creator_store_key_id**: the ID of the active store key
* **was_exposed**: true if the current store encryption type is `plaintext`

### Rotating data keys

Rotation is the act of using a new key as the active encryption key. This can be due to:
* a new cipher is desired (including turning encryption on and off)
* a different key size is desired
* the store key was rotated
* rotation is needed (time based, amount of data/number of files using the current key)

When a new key has been generated (see above), we build a temporary list of data keys (using the existing
data keys and the new key).
If the current store key encryption type is `plaintext`, set `was_exposed = true` for all data keys.

We write the file with encryption to `COCKROACHDB_DATA_KEYS`. Upon successful write, we trigger a data key file reload.

We use a `write/close/rename` method to ensure correct file contents.
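
The generation and rotation steps above can be sketched in Python as follows. This is a simplified illustration with loud assumptions: all function names are hypothetical, `os.urandom` stands in for the Crypto++ generator, the key list is serialized as JSON instead of the protocol buffer defined earlier, and the encryption of the file with the store key is deliberately omitted, so the payload written here is NOT protected. Only the entry fields, the `was_exposed` rule and the `write/close/rename` sequence are taken from this RFC.

```python
import hashlib
import json
import os
import time


def new_data_key(key_size: int, store_key_id: str, store_is_plaintext: bool) -> dict:
    """Build a key entry with the fields listed under "Generating data keys"."""
    raw_key = os.urandom(key_size)  # assumption: stand-in for the Crypto++ RNG
    return {
        "key_id": hashlib.sha256(raw_key).hexdigest(),
        "active": True,
        "encryption_type": "AES_CTR",
        "creation_time": int(time.time()),
        "key": raw_key.hex(),
        "was_exposed": store_is_plaintext,
        "creator_store_key_id": store_key_id,
    }


def rotate(existing: list[dict], new_key: dict, store_is_plaintext: bool) -> list[dict]:
    """Return the temporary key list: old keys demoted, the new key active."""
    rotated = []
    for entry in existing:
        entry = dict(entry, active=False)  # copy, do not mutate the caller's entry
        if store_is_plaintext:
            entry["was_exposed"] = True
        rotated.append(entry)
    rotated.append(new_key)
    return rotated


def write_close_rename(path: str, payload: bytes) -> None:
    """Write to a temporary file, flush and close it, then rename over `path`."""
    tmp_path = path + ".tmp"  # hypothetical temporary name
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, path)


def persist(path: str, keys: list[dict]) -> None:
    """Serialize the key list and persist it (store-key encryption omitted)."""
    write_close_rename(path, json.dumps(keys).encode())
```

A reader of the file sees either the previous complete key list or the new complete key list, never a partial write, which is why a new key can be required to be persisted before it is used.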

Key generation is done inline at startup (we may as well wait for the new key before proceeding), but in the
background for automated changes while the system is already running.

### Reporting encryption status

We need to report basic information about the current status of encryption.

At the very least, we should have:
* log entries
* debug page entries per store

With the following information:
* user-requested encryption settings
* active store key ID and cipher
* active data key ID and cipher
* fraction of live data per key ID and cipher

We can report the following encryption status:
* `plaintext`: plaintext data
* `AES-<size>`: encrypted with AES (one entry for each key size)
* `AES-<size> EXPOSED`: encrypted, but data key was exposed at some point

Active key IDs and ciphers are known at all times. We need to log them when they change
(indicating successful key rotation) and propagate the information to the Go layer.

Fraction of data encoded is a bit trickier. We need to:
1. find all files in use
1. look up their encryption status in the registry (key ID and cipher)
1. determine file sizes
1. log a summary
1. report back to the Go layer

We can find the list of all in-use files the same way rocksdb's backup does, by calling:
* `rocksdb::GetLiveFiles`: retrieve the list of all files in the database
* `rocksdb::GetSortedWalFiles`: retrieve the sorted list of all WAL files

### Other uses of local disk

**Note: logs encryption is currently [Out of scope](#out-of-scope)**

All existing uses of local disk to process data must apply the desired encryption status.

Data tied to a specific store should use the store's rocksdb instance for encryption.
Data not necessarily tied to a store should be encrypted if any of the stores on the node is encrypted.

We identify some existing uses of local disk:
TODO(mberhault, mjibson, dan): make sure we don't miss anything.

1. temporary work space for dist SQL: written through a temporary instance of rocksdb. This data does not need
to be used by another rocksdb instance and does not survive node restart. We propose to use dynamically-generated
keys to encrypt the temporary rocksdb instance.
1. sideloading for restore. Local SSTables are generated using an in-memory rocksdb instance then written in Go
to local disk. We must change this to either be written directly by rocksdb, or move encryption to Go. The former
is probably preferable.

In addition to making sure we cover all existing use cases, we should:
1. document that any other directories must **NOT** reside on the same disk as any keys used
1. reduce the number of entry points into rocksdb to make it harder to miss encryption setup

### Enterprise enforcement

Gating at-rest-encryption on the presence of a valid enterprise license is problematic because
we have no contact with the cluster when deciding to use encryption.

For now, we propose a reactive approach to license enforcement. When any node in the cluster uses encryption
(determined through node metrics) but we do not have a valid license:
* display a large warning on the admin UI
* log large messages on each encrypted node (perhaps periodically)
* look into "advise" or "motd" type functionality in SQL. This is rumored to be unreliable.

The overall idea is that the cluster is not negatively impacted by the lack of an enterprise license.
See [Enterprise feature gating](#enterprise-feature-gating) for possible alternatives.

Actual code for changes proposed here will be broken into CCL and non-CCL code:
* non-CCL: switching env, modified encrypted env
* CCL: key manager(s), ciphers

## Drawbacks

Implementing encryption-at-rest as proposed has a few drawbacks (in no particular order):

### Directs us towards rocksdb-level encryption

While rocksdb-level encryption does not force us to keep encryption-at-rest at this level,
it strongly discourages us from implementing it elsewhere.

This means that more fine-grained encryption (eg: per column) will need to fit within this
model or will require encryption in a completely different part of the system.

### Lack of correctness testing of rocksdb encryption layer

The rocksdb `env_encryption` functionality is barely tested and has no known open-source uses.
This raises serious concerns about the correctness of the proposed approach.

We can improve testing of this functionality at the rocksdb level as well as within cockroach.
A testing plan must be developed and implemented to provide some assurances of correctness.

### Complexity of configuration and monitoring

Proper use of encryption-at-rest requires a reasonable amount of user education, including:
* proper configuration of the system (see [Configuration recommendations](#configuration-recommendations))
* proper monitoring of encryption status

A lot of this falls onto proper documentation and admin UI components, but some are choices made here
(flag specification, logged information, surfaced encryption status).

### No strong license enforcement

The current proposal takes a reactive approach to license enforcement: we show warnings in multiple places
if encryption was enabled without an enterprise license.

This is unlike our other enterprise features which simply cannot be used without a license.

There is some discussion of possible ways to solve this in
[Enterprise feature gating](#enterprise-feature-gating), but this is left as future improvements.

### Non-live rocksdb files will rot

Any files not included in rocksdb's "Live files" will still be encrypted. However, because they are not rewritten,
they will become inaccessible as soon as the key is rotated out and GCed.

While we do not currently make use of backups, we have in the past and may again.

### CCL code location

The enterprise-related functionality should live in CCL directories as much as possible (`pkg/ccl` for Go code,
`c-deps/libroach/ccl` for C++ code).

However, a lot of integration is needed. Some (but far from all) examples include:
* new flag on the `start` command
* additional fields on the `StoreSpec`
* changes to store version logic
* different objects (`Env`) for `DBImpl` construction
* encryption status reporting in node debug pages

This makes hook-based integration of CCL functionality tricky.

Making less code CCL would simplify this. But enterprise enforcement must be taken into account.

## Rationale and Alternatives

There are a few alternatives available in the major aspects of this design as well as in
specific areas. We address them all here (in no particular order):

### Filesystem encryption

This is [Out of scope](#out-of-scope)

Filesystem encryption can be used without requiring coordination with cockroach or rocksdb.
While this may be an option in some environments, DBAs do not always have sufficient
privileges to use this or may not be willing to.

Filesystem encryption can still be used with cockroach independently of at-rest-encryption.
This can be a reasonable solution for non-enterprise users.

Should we choose this alternative, this entire RFC can be ignored.

### Fine-grained encryption

This is [Out of scope](#out-of-scope)

The solution proposed here allows encryption to be enabled or not for individual rocksdb instances.
This may not be sufficient for fine-grained encryption.

Database and table-level encryption can be accomplished by integrating store encryption status with
zone configs, allowing the placement of certain databases/tables on encrypted disks. This approach is
rather heavy-handed and may not be suitable for all cases of database/table-level encryption.

However, this may not be sufficient for more fine-grained encryption (eg: per column).
It's not clear how encryption for individual keys/values would work.

### Single level of keys

**We have settled on a two-level key structure**

The current choice of two key levels (store keys vs data keys) is debatable:

Advantages:
* rotating store keys is cheap: re-encrypt the list of data keys. Users can deprecate old keys quickly.
* a third-party system could provide us with other types of keys and not impact data encryption

Negated advantage:
* if the store key is compromised, we still need to re-encrypt all data quickly; the second key level does not help

Cons:
* more complicated logic (we have two sets of keys to worry about)
* encryption status is harder to understand for users

We could instead use a single level of keys where the user-provided keys are directly used to encrypt the data.
This would simplify the logic and reporting (and user understanding). This would however make rotation slower
and potentially make integration with third-party services more difficult. User-provided keys would have to be
available until no data uses them.

### Relationship between store and data keys

**We have settled on tied cipher/key-size specification.
This can be changed easily.**

The current proposal uses the same cipher and key size for store and data keys.

Pros:
* more user friendly: only have to specify one cipher
* less chance of mistake when switching encryption on/off

Cons:
* it's not possible to specify a different cipher for store keys

### Directly using the data prefix format

The previous version of this RFC proposed using the `rocksdb::EncryptedEnv` for all files, with encryption state
(plaintext or encrypted) and encryption fields stored in the 4KiB data prefix.

The main issues of that solution are:
* cannot switch existing stores to the data prefix format, requiring new stores for encryption support
* overhead of the encrypted env for plaintext files
* lack of support for multiple keys in the existing data prefix format, requiring heavy modification

## Future improvements

We break down future improvements into multiple categories:
* v1.0: may not be done as part of the initial implementation, but must be done for the first
stable release.
* future: possible additions to come after first stable release.

The features are listed in no particular order.

### v1.0: a.k.a. MVP

#### Instruction set support

Crypto++ can determine support for SSE2 and AES-NI at runtime and fall back to software implementation when
not supported.

There are a few things we can do:
* ensure our builds properly enable instruction-set detection
* surface a warning when running in software mode
* properly document instruction set requirements for optimal performance

#### Forcing re-encryption

We need to find a way to force re-encryption when we want to remove an old key.
While rocksdb regularly creates new files, we may need to force rewrite for less-frequently
updated files. Other files (such as `MANIFEST`, `OPTIONS`, `CURRENT`, `IDENTITY`, etc...)
may need a different method to rewrite.

Compaction (of the entire key space, or specific ranges determined through live file metadata) may provide
the bulk of the needed functionality.
However, some files (especially with no updates) will not be rewritten.

Some possible solutions to investigate:
* there is rumor of being able to mark sstables as "dirty"
* patches to rocksdb to force rotation even if nothing has changed (may be the safest)
* "poking" at the files to add changes (may be impossible to do properly)
* level of indirection in the encryption layer while a file is being rewritten

Part of forcing re-encryption includes:
* when to do it automatically (eg: age-based. maybe after half the active key lifetime)
* how to do it manually (user requests quick re-encryption)
* specifying what to re-encrypt (eg: all data keys up to ID 5)

#### Garbage collection of old data keys

We would prefer not to keep old data keys forever, but we need to be certain that a key is no longer in use
before deleting it. How feasible this is depends on the accuracy of our encryption status reporting.

If we choose to ignore non-live files, garbage collection should be reasonably safe.

#### Garbage collection of registry entries

All encrypted files are stored in the registry. Live rocksdb files will automatically be removed as they are
deleted, but any other files will remain forever if not deleted through rocksdb.

We may want to periodically stat all files in our registry and delete the entries for nonexistent files.

#### Performance impact

The performance impact needs to be measured for a variety of workloads and for all supported ciphers.
This is needed to provide some guidance to users.

Guidance on key rotation period would also be helpful. This is dependent on the rocksdb churn, so will depend
on the specific workload.
We may want to add metrics about data churn to our encryption status reporting.

#### Propagating encrypted status

We may want to automatically mark a store as "encrypted" and make this status available to zone configuration,
allowing database/table placement to specify encryption status.

When to mark a store as "encrypted" is not clear. For example: can we mark it as encrypted just because encryption
is enabled, or should we wait until encryption usage is at 100%?

If we use the existing store attributes for this marker, we may need to add the concept of "reserved" attributes.

#### Encryption-related metrics

We can export high-level metrics about at-rest-encryption through Prometheus.
This can include:
* encryption status (enabled/disabled/not-possible-on-this-store)
* amount of encrypted data per key ID
* amount of data per cipher (or plaintext)
* age of in-use keys

#### Reloading store keys

The current proposal only reloads store keys at node start time.
We can avoid restarts by triggering a refresh of the store key file when receiving a signal (eg: `SIGHUP`) or other
conditions (periodic refresh, admin UI endpoint, filesystem polling, etc...).

#### Tooling

At the very least, we want `cockroach debug` tools to continue working correctly with encrypted files.

We should examine which rocksdb-provided tools may need modification as well, possibly involving patches
to rocksdb.

### Possible future additions

#### Safe file deletion

We may want to delete old files in a less recoverable way (some filesystems allow un-delete).
On SSDs, a single overwrite pass may be sufficient. We do not propose to handle safe deletion
on hard drives.

#### Support for additional block ciphers

Crypto++ supports multiple block ciphers. It should be reasonably easy to add support for
other ciphers.

#### Authenticated encryption for data integrity

We can switch to authenticated encryption (eg: Galois Counter Mode, or others) to allow integrity
verification of files on disk.

Implementing authenticated encryption would require additional changes to the raw storage format
to store the final authentication tag.

#### Add sanity checks

We could perform a few checks to ensure data security, such as:
* detect if keys are on the same disk as the store
* detect if keys have loose permissions
* detect if swap is enabled

#### Enterprise feature gating

The current proposal does not gate encryption on a valid license due to the fact that we cannot check the license
when initialising the node.

A possible solution to explore is detection when the node joins a cluster. eg:
* always allow store encryption
* when a node joins, communicate its encryption status and refuse the join if no enterprise license exists
* on bootstrap, an encrypted store will only allow SQL operations on the system tables (to set the license)
* the license can be passed through `init`

This would still cause issues when removing the license (or errors loading/validating the license).

Less drastic actions may be possible.