github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20171220_encryption_at_rest.md

     1  - Feature Name: Encryption at rest
     2  - Status: in-progress
     3  - Start Date: 2017-11-01
     4  - Authors: Marc Berhault
     5  - RFC PR: [#19785](https://github.com/cockroachdb/cockroach/pull/19785)
     6  - Cockroach Issue: [#19783](https://github.com/cockroachdb/cockroach/issues/19783)
     7  
     8  
     9  Table of Contents
    10  =================
    11  
    12     * [Summary](#summary)
    13     * [Motivation](#motivation)
    14     * [Related resources](#related-resources)
    15     * [Out of scope](#out-of-scope)
    16     * [Security analysis](#security-analysis)
    17        * [Attack profiles](#attack-profiles)
    18        * [Assumptions](#assumptions)
    19        * [Considerations](#considerations)
    20     * [Guide-level explanation](#guide-level-explanation)
    21        * [Terminology](#terminology)
    22        * [User-level explanation](#user-level-explanation)
    23           * [Configuration recommendations](#configuration-recommendations)
    24           * [Store keys](#store-keys)
    25           * [Data keys](#data-keys)
    26           * [User control of encryption](#user-control-of-encryption)
    27        * [Contributor impact](#contributor-impact)
    28     * [Reference-level explanation](#reference-level-explanation)
    29        * [Detailed design](#detailed-design)
    30           * [Store version](#store-version)
    31           * [Switching Env](#switching-env)
    32           * [COCKROACHDB_REGISTRY](#cockroachdb_registry)
    33           * [Encrypted Env](#encrypted-env)
    34           * [Key levels](#key-levels)
    35           * [Key status](#key-status)
    36           * [Store keys files](#store-keys-files)
    37           * [Key Manager](#key-manager)
    38           * [Rotating store keys](#rotating-store-keys)
    39           * [Data keys file format](#data-keys-file-format)
    40           * [Generating data keys](#generating-data-keys)
    41           * [Rotating data keys](#rotating-data-keys)
    42           * [Reporting encryption status](#reporting-encryption-status)
    43           * [Other uses of local disk](#other-uses-of-local-disk)
    44           * [Enterprise enforcement](#enterprise-enforcement)
    45        * [Drawbacks](#drawbacks)
    46           * [Directs us towards rocksdb-level encryption](#directs-us-towards-rocksdb-level-encryption)
    47           * [Lack of correctness testing of rocksdb encryption layer](#lack-of-correctness-testing-of-rocksdb-encryption-layer)
    48           * [Complexity of configuration and monitoring](#complexity-of-configuration-and-monitoring)
    49           * [No strong license enforcement](#no-strong-license-enforcement)
    50           * [Non-live rocksdb files will rot](#non-live-rocksdb-files-will-rot)
    51           * [CCL code location](#ccl-code-location)
    52        * [Rationale and Alternatives](#rationale-and-alternatives)
    53           * [Filesystem encryption](#filesystem-encryption)
    54           * [Fine-grained encryption](#fine-grained-encryption)
    55           * [Single level of keys](#single-level-of-keys)
    56           * [Relationship between store and data keys](#relationship-between-store-and-data-keys)
    57           * [Directly using the data prefix format](#directly-using-the-data-prefix-format)
    58        * [Future improvements](#future-improvements)
    59           * [v1.0: a.k.a. MVP](#v10-aka-mvp)
    60           * [Possible future additions](#possible-future-additions)
    61  
    62  # Summary
    63  
    64  This feature is Enterprise.
    65  
    66  We propose to add support for encryption at rest on cockroach nodes, with
    67  encryption being done at the rocksdb layer for each file.
    68  
    69  We provide CTR-mode AES encryption for all files written through rocksdb.
    70  
    71  Keys are split into user-provided store keys and dynamically-generated data keys.
    72  Store keys are used to encrypt the data keys. Data keys are used to encrypt the actual data.
    73  Store keys can be rotated at the user's discretion. Data keys can be rotated automatically
    74  on a regular schedule, relying on rocksdb churn to re-encrypt data.
    75  
    76  Plaintext files go through the regular rocksdb interface to the filesystem. Encrypted files
    77  go through an intermediate layer responsible for all encryption tasks.
    78  
    79  Data can be transitioned from plaintext to encrypted and back with status being reported
    80  continuously.
    81  
    82  # Motivation
    83  
    84  Encryption is desired for security reasons (prevent access from other users on the same
    85  machine, prevent data leak through drive theft/disposal) as well as regulatory reasons
    86  (GDPR, HIPAA, PCI DSS).
    87  
    88  Encryption at rest is necessary when other methods of encryption are either not desirable,
    89  or not sufficient (eg: filesystem-level encryption cannot be used if DBAs do not have
    90  access to filesystem encryption utilities).
    91  
    92  # Related resources
    93  
    94  * [Crypto++](https://www.cryptopp.com/)
    95  * [overview of block cipher modes](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Common_modes)
    96  * [rocksdb PR adding env_encryption](https://github.com/facebook/rocksdb/pull/2424)
    97  * [SEI Cert C coding standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard)
    98  
    99  # Out of scope
   100  
   101  The following are not in scope but should not be hindered by implementation of this RFC:
   102  * encryption of non-rocksdb data (eg: log files)
   103  * integration with external key storage systems such as Vault, AWS KMS, KeyWhiz
   104  * auditing of key usage and encryption status
   105  * integration with HSM (hardware security module) or TPM (Trusted Platform Module)
* FIPS-140-2 compliance

See [Possible future additions](#possible-future-additions) for more currently-out-of-scope features.
   108  
   109  The following are unrelated to encryption-at-rest as currently proposed:
   110  * encrypted backup (should be supported regardless of encryption-at-rest status)
   111  * fine-granularity encryption (that cannot use zone configs to select encrypted replicas)
   112  * restricting data processing on encrypted nodes (requires planning/gateway coordination)
   113  
   114  # Security analysis
   115  
   116  Caveat: this is not a thorough security analysis of the proposed solution, let alone its implementation.
   117  
   118  This section should be expanded and studied carefully before this RFC is approved.
   119  
   120  ## Attack profiles
   121  
   122  The goal of this feature is to block two attack vectors:
   123  
   124  ### Access to raw disk offline
   125  
   126  An attacker can gain access to the disk after it has been removed from the system (eg: node decommission).
   127  At-rest encryption should make all data on the disk useless if the following are true:
   128  * none of the store keys are available or previously compromised
   129  * none of the data went through a phase where either store or data encryption was `plaintext`
   130  
   131  ### Access to a running system by unprivileged user
   132  
   133  Unprivileged users (eg: non root) should not be able to extract cockroach data even if they have access to the
   134  raw rocksdb files.
   135  This will still not guard against:
   136  * privileged users (with access to store keys or memory)
   137  * data that was at some point stored as `plaintext`
   138  
   139  ## Assumptions
   140  
Some of the assumptions here can be verified by runtime checks, but others must be satisfied by the user (see
[Configuration recommendations](#configuration-recommendations)).
   143  
   144  ### No privileged access
   145  
   146  We assume attackers do not have privileged access on a running system. Specifically:
   147  * store keys cannot be read
   148  * cockroach memory cannot be directly accessed
   149  * command line flags cannot be modified
   150  
   151  ### No write access by attackers
   152  
   153  A big assumption in this document is that attackers do not have write access to the raw files while
   154  we are operating: we trust the integrity of the store and data key files as well as all data written on disk.
   155  
   156  This includes the case of an attacker removing a disk, modifying it, and re-inserting it into the cluster.
   157  
   158  A potential future improvement is to use authenticated encryption to verify the integrity of files on disk.
   159  This would add complexity and cost to filesystem-level operations in rocksdb as we would need to read entire
   160  files to compute authentication tags.
   161  
   162  However, integrity checking can be cheaply used on the data keys file.
   163  
   164  ## Considerations
   165  
   166  ### Random number generator
   167  
   168  We need to generate random values for a few things:
   169  * data keys
   170  * nonce/counter for each file
   171  
   172  Crypto++ provides [OS_GenerateRandomBlock](https://www.cryptopp.com/wiki/RandomNumberGenerator#OS_Entropy)
   173  which can operate in blocking (using `/dev/random`) or non-blocking (using `/dev/urandom`) mode.
   174  We would prefer to use better entropy for data keys, but `/dev/random` is notoriously slow especially
   175  when just starting rocksdb with very little disk/network utilization.
   176  
   177  Generating data keys (other than the first one, or when changing encryption ciphers) can be done
   178  in the background so we may be able to use the higher entropy `/dev/random`.
Nonces may be safe to keep generating using the lower-entropy `/dev/urandom`.
   180  
More research must be done into the use of `/dev/random` in multi-user environments. For example, is it possible
for an attacker to consume `/dev/random` for long enough that key generation is effectively disabled?
   183  
   184  ### IV makeup and reuse prevention
   185  
   186  An important consideration in AES-CTR is making sure we never reuse the same IV for a given key.
   187  
   188  The IV has a size of `AES::BlockSize`, or 128 bits. It is made of two parts:
   189  * nonce: 96 bits, randomly generated for each file
   190  * counter: 32 bits, incremented for each block in the file
   191  
   192  This imposes two limits:
   193  * maximum file size: `2^32 128-bit blocks == 64GiB`
   194  * probability of nonce re-use after `2^32` files is `2^-32`
   195  
   196  These limits should be sufficient for our needs.
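
As a rough illustration of the IV layout and the two limits above, here is a minimal Python sketch. It assumes the counter starts at 0 for each file and is appended to the nonce in big-endian order; neither detail is specified in this RFC, and the function names are hypothetical. The reuse estimate applies to files encrypted under the same key.

```python
import os
from fractions import Fraction

AES_BLOCK_SIZE = 16   # bytes, i.e. 128 bits
NONCE_SIZE = 12       # bytes, i.e. 96 bits, random per file
COUNTER_BITS = 32     # one counter value per AES block

# Largest file addressable with a 32-bit block counter.
MAX_FILE_SIZE = 2 ** COUNTER_BITS * AES_BLOCK_SIZE


def new_file_nonce() -> bytes:
    """Return a fresh random 96-bit nonce for a new file."""
    return os.urandom(NONCE_SIZE)


def iv_for_offset(nonce: bytes, offset: int) -> bytes:
    """Build the 128-bit IV for the AES block containing byte `offset`.

    Assumption: the counter starts at 0 and is appended big-endian.
    """
    if len(nonce) != NONCE_SIZE:
        raise ValueError("nonce must be 96 bits")
    if not 0 <= offset < MAX_FILE_SIZE:
        raise ValueError("offset outside the 64GiB per-file limit")
    counter = offset // AES_BLOCK_SIZE
    return nonce + counter.to_bytes(COUNTER_BITS // 8, "big")


def nonce_reuse_bound(files: int) -> Fraction:
    """Birthday approximation n(n-1)/2 / 2^96 for `files` random nonces."""
    return Fraction(files * (files - 1), 2 * 2 ** (NONCE_SIZE * 8))
```

Under the birthday approximation, `nonce_reuse_bound(2 ** 32)` comes out slightly below `2^-33`, which is within the `2^-32` figure quoted above.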
   197  
   198  ### Safety of symmetric key hashes
   199  
   200  Given a reasonably safe hashing algorithm, exposing the hash of the store keys should not be an issue.
   201  
Indeed, finding collisions in `sha256` is not currently easier than cracking `aes128`. Even if better collision
methods are found, a collision does not reveal the key itself.
   204  
   205  ### Memory safety
   206  
   207  We need to provide safety for the keys while held in memory.
   208  At the C++ level, we can control two aspects:
   209  * don't swap to disk: using `mlock` (`man mlock(2)`) on memory holding keys, preventing paging out to disk
   210  * don't core dump: using `madvise` with `MADV_DONTDUMP` (see `man madvise(2)` on Linux) to exclude pages from core dumps.
   211  
   212  There is no equivalent in Go so the current approach is to avoid loading keys in Go.
   213  This can become problematic if we want to reuse the keys to encrypt log files written in Go.
   214  No good answer presents itself.
   215  
   216  # Guide-level explanation
   217  
   218  ## Terminology
   219  
   220  Terminology used in this RFC:
* **data key**: a.k.a. Data-encryption-key. Used to encrypt the actual on-disk data. These are generated automatically.
   222  * **store key**: a.k.a. Key-encryption-key. Used to encrypt the set of data keys. Provided by the user.
   223  * **active key**: the key being used to encrypt new data.
   224  * **key rotation**: encrypting data with a new key. Rotation starts when the new key is provided and ends when no data encrypted with the old key remains.
   225  * **plaintext**: unencrypted data.
   226  * **Env**: rocksdb terminology for the layer between rocksdb and the filesystem.
   227  * **Switching Env**: our new Env that can switch between plaintext and encrypted envs.
   228  
   229  ## User-level explanation
   230  
   231  Encryption-at-rest is an optional feature that can be enabled on a per-store basis.
   232  
   233  In order to enable encryption on a given store, the user needs two things:
   234  * an enterprise license
   235  * one or more store key(s)
   236  
   237  Enabling encryption increases the store version, making downgrade to a binary before encryption impossible.
   238  
   239  ### Configuration recommendations
   240  
   241  We identify a few configuration requirements for users to safely use encryption at rest.
   242  
   243  **TODO**: this will need to be fleshed out when writing the docs.
   244  
   245  * restricted access to store keys (ideally, only the cockroach user, and read-only access)
   246  * store keys and cockroach data must not be on the same filesystem/disk (including temporary working directories)
   247  * restricted access to all cockroach data
   248  * disable swap
   249  * don't enable core dumps
   250  * reasonable key generation/rotation
   251  * monitoring
   252  * ideally, the store keys are not stored on the machine (use something like `keywhiz`)
   253  
   254  ### Store keys
   255  
   256  The store key is a symmetric key provided by the user. It has the following properties:
   257  * unique for each store
   258  * available only to the cockroach process on the node
   259  * not stored on the same disk as the cockroach data
   260  
   261  Store keys are stored in raw format in files (one file per key).
   262  eg: to generate a 128-bit key: `openssl rand 16 > store.key`
   263  
   264  Specifying store keys is done through the `--enterprise-encryption` flag. There are two key fields in this flag:
   265  * `key`: path to the active store key, or `plain` for plaintext (default).
   266  * `old_key`: path to the previous store key, or `plain` for plaintext (default).
   267  
   268  When a new `key` is specified, we must tell cockroach what the previous active key was through `old_key`.
   269  
   270  ### Data keys
   271  
   272  Data keys are automatically generated by cockroach. They are stored in the data directory and
   273  encrypted with the active store key. Data keys are used to encrypt the actual files inside the data
   274  directory.
   275  
   276  This two-level approach allows easy rotation of store keys and provides safer encryption of large amounts of
   277  data. To rotate the store key, all we need to do is re-encrypt the file containing the data keys, leaving
   278  the bulk of the data as is.
   279  
   280  Data keys are generated and rotated by cockroach.
   281  There are two parameters controlling how data keys behave:
   282  * encryption cipher: the cipher in use for data encryption. The cipher is currently `AES CTR` with the same key
   283  size as the store key.
   284  * rotation period: the time before a new key is generated and used. Default value: 1 week. This can be set through a flag.
   285  
   286  ### User control of encryption
   287  
   288  #### Recommended production configuration
   289  
   290  The need for encryption entails a few recommended changes in production configuration:
   291  * disable swap/core dumps: we want to avoid any data hitting disk unencrypted, this includes memory being swapped out.
   292  * run on architectures that support the [AES-NI instruction set](https://en.wikipedia.org/wiki/AES_instruction_set).
   293  * have a separate area (encrypted or in-memory partition, fuse-filesystem, etc...) to store the store-level keys.
   294  
   295  #### Flag changes for the cockroach binary
   296  
We add a new flag for CCL binaries. It must be specified for each store we wish to encrypt:
   298  ```
   299  --enterprise-encryption=path=<path to store>,key=<path to key file>,old_key=<path to old key>,rotation_period=<duration>
   300  ```
   301  
   302  The individual fields are:
   303  * `path`: the path to the data directory of the corresponding store. This must match the path specified in `--store`
   304  * `key`: the path to the current encryption key, or `plaintext` if we wish to use plaintext. default: `plaintext`
   305  * `old_key`: the path to the previous encryption key. Only needed if data was already encrypted.
   306  * `rotation_period`: how often data keys should be rotated. default: `1 week`
   307  
   308  The flag can be specified multiple times, once for each store.
   309  
   310  The encryption flags can specify different encryption states for different stores (eg: one encrypted one plain,
   311  different rotation periods).
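
To make the field semantics concrete, here is a minimal Python sketch of how one flag value could be split into its fields. It is an illustration only, not the cockroach parser: `EncryptionSpec` and `parse_encryption_flag` are hypothetical names, the `plaintext` default is taken from the field list above (other examples in this RFC spell it `plain`), and `rotation_period` is kept as an opaque string because the duration syntax is not specified here.

```python
from dataclasses import dataclass

PLAINTEXT = "plaintext"  # special value, as spelled in the field list above
KNOWN_FIELDS = ("path", "key", "old_key", "rotation_period")


@dataclass
class EncryptionSpec:
    path: str
    key: str = PLAINTEXT
    old_key: str = PLAINTEXT
    rotation_period: str = "1 week"  # unparsed; duration syntax is an open detail


def parse_encryption_flag(value: str) -> EncryptionSpec:
    """Parse one --enterprise-encryption value of the form name=value,name=value."""
    fields = {}
    for part in value.split(","):
        name, sep, field_value = part.partition("=")
        if not sep or not field_value:
            raise ValueError(f"malformed field: {part!r}")
        if name not in KNOWN_FIELDS:
            raise ValueError(f"unknown field: {name!r}")
        if name in fields:
            raise ValueError(f"duplicate field: {name!r}")
        fields[name] = field_value
    if "path" not in fields:
        raise ValueError("the path field is required")
    return EncryptionSpec(**fields)
```

Since the flag may be repeated, a caller would run this once per occurrence and match each result against the corresponding `--store` path. Note that this simple split does not support paths containing commas.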
   312  
   313  #### Enabling encryption on a store
   314  
   315  Turning on encryption for a new store or a store currently in plaintext involves the following:
   316  
   317  ```
   318  # Ensure your key file exists and has valid key data (correct size)
   319  # For example, to generate a key for AES-128:
   320  $ openssl rand 16 > /path/to/cockroach.key
   321  # Specify the enterprise-encryption:
   322  $ cockroach start <regular options> \
   323      --store=/mnt/data \
   324      --enterprise-encryption=path=/mnt/data,key=/path/to/cockroach.key
   325  ```
   326  
   327  The node will generate a 128 bit data key, encrypt the list of data keys with the store key, and use AES128
   328  encryption for all new files.
   329  
   330  Examine the logs or node debug pages to see that encryption is now enabled and see its progress.
   331  
   332  #### Rotating the store key
   333  
   334  Given the previous configuration, we can generate a new store key. We must pass the previous key.
   335  
   336  ```
   337  # Create a new 128 bit key.
   338  $ openssl rand 16 > /path/to/cockroach.new.key
   339  # Tell cockroach about the new key, and pass the old key (/path/to/cockroach.key)
   340  $ cockroach start <regular options> \
   341      --store=/mnt/data \
   342      --enterprise-encryption=path=/mnt/data,key=/path/to/cockroach.new.key,old_key=/path/to/cockroach.key
   343  ```
   344  
   345  Examine the logs or node debug pages to see that the new key is now in use.
   346  It is now safe to delete the old key file.
   347  
   348  #### Disabling encryption
   349  
We can switch an encrypted store back to plaintext. This is done by using the special value `plaintext` in the
`key` field of the encryption flag. We need to specify the previous encryption key.
   352  
```
# Instead of a key file, pass the special plaintext value as the key.
# Pass the old key to allow decrypting existing data.
$ cockroach start <regular options> \
    --store=/mnt/data \
    --enterprise-encryption=path=/mnt/data,key=plain,old_key=/path/to/cockroach.new.key
```
   360  
   361  Examine the logs or node debug pages to see that the store encryption status is now plaintext. It is now safe to delete the old key file.
   362  
Existing files stay encrypted until rocksdb rewrites them, so this may take some time. Examine logs and debug pages to follow the progress.
   364  
   365  ## Contributor impact
   366  
   367  The biggest impact of this change on contributors is the fact that all data on a given store must be encrypted.
   368  
   369  There are three main categories:
   370  * using the store rocksdb instance: encryption is done automatically
   371  * using a separate rocksdb instance: encryption settings **must** be given to the new instance. Care must be taken to ensure that users know not to place store keys on the same disks as the rocksdb directory
   372  * using anything other than rocksdb: logs (written at the Go level) are marked out of scope for this document. However, any raw data written to disk should use the same encryption settings as the store
   373  
   374  # Reference-level explanation
   375  
   376  ## Detailed design
   377  
   378  ### Store version
   379  
   380  We introduce a new [store version](https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/engine/version.go#L27) to mark switching to stores supporting encryption.
   381  
   382  Stores are currently using `versionBeta20160331`. If no encryption flags are specified, we remain at this
   383  version until a "reasonable" time (one or two minor stable releases) has passed.
   384  
Specifying the `--enterprise-encryption` flag increases the version to `versionSwitchingEnv`. Downgrades to
binaries that do not support this version are not possible.
   387  
   388  ### Switching Env
   389  
   390  Rocksdb performs filesystem-level operations through an [`Env`](https://github.com/facebook/rocksdb/blob/master/include/rocksdb/env.h).
   391  
   392  This layer can be used to provide different behavior for a number of reasons. For example:
   393  * posix support: the default `Env`
   394  * in-memory support: for testing or in-memory databases
   395  * hdfs: for HDFS-backed rocksdb instances
   396  * encryption: for file-level encryption with encryption settings stored in a 4KB data prefix
   397  * wrapper: can override specific methods, the rest are passed through to a `base env`
   398  
   399  We leverage the `Env` layer to implement the following behavior:
   400  * stores at `versionBeta20160331` continue to use the default `Env`
   401  * stores at `versionSwitchingEnv` use the switching env
   402  * plaintext files under version `versionSwitchingEnv` use a default `Env`
   403  * encrypted files under version `versionSwitchingEnv` use an `EncryptedEnv`
   404  
   405  ```
   406  versionBeta20160331:    DefaultEnv
   407  
   408  versionSwitchingEnv:    SwitchingEnv: Encrypted? no  -----> DefaultEnv
   409                                                   yes -----> EncryptedEnv
   410  ```
   411  
   412  The state of a file (plaintext or encrypted) is stored in a file registry. This records the list of all
   413  encrypted files by filename and is persisted to disk in a file named `COCKROACHDB_REGISTRY`.
   414  
For every file being operated on, the switching env must look up its existing encryption state in the registry or the
   416  desired encryption state for new files. If the file is plaintext, pass the operation down to the `DefaultEnv`.
   417  If the file is encrypted, pass the operation down to the `EncryptedEnv`. For a new file, we must successfully
   418  persist its state in the registry before proceeding with the operation.
   419  
   420  Most `SwitchingEnv` methods will perform something like the following:
   421  ```
   422  OpOnFile(filename)
   423    // Determine whether the file uses encryption (existing files) or encryption is desired (new files)
   424    if !registry.HasFile(filename)
   425      useEncryption = lookup desired encryption (from --enterprise-encryption flag)
   426      add filename to registry
   427      persist registry to disk. Error out on failure.
   428    else
   429      useEncryption = get file encryption state from registry
   430  
   431    // Perform the operation through the appropriate Env.
   432    if useEncryption
   433      EncryptedEnv->OpOnFile(filename)
   434    else
   435      DefaultEnv->OpOnFile(filename)
   436  ```
   437  
   438  The registry may accumulate non-existent entries if writes fail after addition or removal fails after deletes.
   439  It will also gather entries that are never deleted by rocksdb (eg: archives). We can clean these up
   440  by adding a periodic [garbage collection](#garbage-collection-of-registry-entries).
   441  
   442  ### COCKROACHDB_REGISTRY
   443  
   444  The registry is a new file containing encryption status information for files written through rocksdb.
   445  This is similar to rocksdb's `MANIFEST`. We intentionally do not call it manifest to avoid confusion.
   446  
   447  It is stored in the base rocksdb directory for the store and written using a `write/close/rename` method.
   448  It is always operated on through the `DefaultEnv`.
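
The `write/close/rename` method can be sketched in Python as follows. This is an illustration under assumptions, not the rocksdb-level implementation: the helper name is hypothetical, and the `fsync` before the rename is an extra durability step that this RFC does not mention.

```python
import os
import tempfile


def write_file_atomically(path: str, contents: bytes) -> None:
    """Replace `path` with `contents` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=os.path.basename(path) + ".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: flush to disk before renaming
        # The file is closed here; the rename atomically swaps it into place.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A failure at any step leaves the previous registry file untouched, which is the property the switching env relies on when it errors out on a failed persist.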
   449  
   450  Encrypted files are always present in the registry. Plaintext files are not registered as we cannot guarantee
   451  their presence when operating on an existing store.
   452  
   453  `Env` operations on files will use the registry in different ways:
* existing file: look up its encryption state in the registry, assume plaintext if missing
* existing file if it exists, otherwise new file: look up its encryption state in the registry. If missing, stat the file through the `DefaultEnv`. If it does not exist, see "create a new file"
* create a new file: look up the desired encryption state. If encrypted, persist it in the registry
   457  
   458  The registry is a serialized protocol buffer:
```
enum EncryptionRegistryVersion {
  // The only version so far.
  Base = 0;
}

message EncryptionRegistry {
  // version is currently always Base.
  optional EncryptionRegistryVersion version = 1;
  repeated EncryptedFile files = 2;
}

enum EncryptionType {
  // No encryption applied, not used for the registry.
  Plaintext = 0;
  // AES in counter mode.
  AES_CTR = 1;
}

message EncryptedFile {
  optional string filename = 1;
  // The type of encryption applied.
  optional EncryptionType type = 2;

  // Encryption fields. This may move to a separate AES-CTR message.
  // ID (hash) of the key in use, if any.
  optional bytes key_id = 3;
  // Nonce part of the initialization vector, of size 96 bits (12 bytes) for AES.
  optional bytes nonce = 4;
  // Counter, allowing 2^32 blocks per file, so 64GiB.
  optional uint32 counter = 5;
}
```
   492  
   493  The registry contains all information needed to find the encryption key used for a given file and encrypt/decrypt it.
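
The three ways `Env` operations consult the registry, listed earlier in this section, can be sketched in Python as below. This is a simplified in-memory model under assumptions: the class and method names are hypothetical, entries hold only the encryption type rather than a full `EncryptedFile`, and `persist` stands in for writing `COCKROACHDB_REGISTRY`.

```python
import os
from typing import Callable, Dict

PLAINTEXT = "Plaintext"
AES_CTR = "AES_CTR"


class FileRegistry:
    """In-memory stand-in for the registry: only encrypted files are listed."""

    def __init__(self, persist: Callable[[Dict[str, str]], None]):
        self._entries: Dict[str, str] = {}
        self._persist = persist  # raises on failure

    def existing_file(self, filename: str) -> str:
        # Missing entries are assumed to be plaintext.
        return self._entries.get(filename, PLAINTEXT)

    def new_file(self, filename: str, desired: str) -> str:
        updated = dict(self._entries)
        if desired == PLAINTEXT:
            updated.pop(filename, None)  # drop a stale entry, if any
        else:
            updated[filename] = desired
        if updated != self._entries:
            self._persist(updated)       # must succeed before the file is used
            self._entries = updated
        return desired

    def existing_or_new_file(self, filename: str, desired: str,
                             exists: Callable[[str], bool] = os.path.exists) -> str:
        if filename in self._entries:
            return self._entries[filename]
        if exists(filename):             # unregistered but present: plaintext
            return PLAINTEXT
        return self.new_file(filename, desired)
```

The in-memory state is only updated after `persist` returns, so a failed write leaves the registry as it was.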
   494  
   495  ### Encrypted Env
   496  
   497  Rocksdb has an `EncryptedEnv` introduced in [PR 2424](https://github.com/facebook/rocksdb/pull/2424).
It adds a 4KiB data block at the beginning of each file with a nonce and possibly encrypted extra information.
   499  
   500  We opt to use a slightly modified (mostly simplified) version of this encrypted env because:
   501  * `EncryptedEnv` does not support multiple keys
   502  * the data prefix is not needed, all encryption fields can be stored in the registry
   503  
   504  We will use a modified version of the existing `EncryptedEnv` without data prefix.
   505  
   506  The encrypted env uses a `CipherStream` for each file, with the cipher stream containing the necessary
   507  information to perform encryption and decryption (cipher algorithm, key, nonce, and counter).
   508  
   509  It also holds a reference to a key manager which can provide the active key and any older keys held.
   510  
   511  Two instances of the encrypted env are in use:
   512  * store encryption env: uses store keys, used to manipulate the data keys file
   513  * data encryption env: uses data keys, used to manipulate all other files
   514  
   515  ### Key levels
   516  
   517  We introduce two levels of encryption with their corresponding keys:
   518  * data keys:
   519  	* used to encrypt the data itself
   520  	* automatically generated and rotated
   521  	* stored in the `COCKROACHDB_DATA_KEYS` file
   522  	* encrypted using the store keys, or plaintext when encryption is disabled
   523  * store keys:
   524  	* used to encrypt the list of data keys
   525  	* provided by the user
   526  	* should be stored on a separate disk
   527  	* should only be accessible to the cockroach process
   528  
   529  ### Key status
   530  
We have three distinct statuses for keys:
   532  * active: key is being used for all new data
   533  * in-use: key is still needed to read some data but is not being used for new data
   534  * inactive: there is no remaining data encrypted with this key
   535  
   536  ### Store keys files
   537  
   538  Store keys consist of exactly two keys: the active key, and the previous key.
   539  
   540  They are stored in separate files containing the raw key data (no encoding).
   541  
   542  Specifying the keys in use is done through the encryption flag fields:
   543  * `key`: path to the active key, or `plaintext` for plaintext. If not specified, `plaintext` is the default.
   544  * `old_key`: path to the previous key, or `plaintext` for plaintext. If not specified, `plaintext` is the default.
   545  
   546  The size of the raw key in the file dictates the cipher variant to use. Keys can be 16, 24, or 32 bytes long
   547  corresponding to AES-128, AES-192, AES-256 respectively.
   548  
   549  Key files are opened in read-only mode by cockroach.
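
As an illustration of how the raw key size selects the cipher variant, here is a minimal Python sketch. The function name is hypothetical, and the real loading happens at the C++ level so that keys never enter Go memory.

```python
# Raw key length in bytes -> AES variant, as described above.
AES_VARIANTS = {16: "AES-128", 24: "AES-192", 32: "AES-256"}


def load_store_key(path: str):
    """Read a raw store key file (read-only) and return (key bytes, cipher variant)."""
    with open(path, "rb") as key_file:
        raw = key_file.read()
    variant = AES_VARIANTS.get(len(raw))
    if variant is None:
        raise ValueError(f"store key must be 16, 24, or 32 bytes, got {len(raw)}")
    return raw, variant
```

A key generated with `openssl rand 16` would therefore select AES-128, and a file with any other length (for example, one with a trailing newline) would be rejected.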
   550  
   551  ### Key Manager
   552  
   553  The key manager is responsible for holding all keys used in encryption.
   554  It is used by the encrypted env and provides the following interfaces:
   555  * `GetActiveKey`: returns the currently active key
   556  * `GetKey(key hash)`: returns the key matching the key hash, if any
   557  
   558  We identify two types of key managers:
   559  
   560  #### Store Key Manager
   561  
   562  The store key manager holds the current and previous store keys as specified through the `--enterprise-encryption`
   563  flag.
   564  
   565  Since the keys are externally provided, there is no concept of key rotation.
   566  
   567  #### Data Key Manager
   568  
   569  The data key manager holds the dynamically-generated data keys.
   570  
   571  Keys are persisted to the `COCKROACHDB_DATA_KEYS` file using the `write/close/rename` method and encrypted
   572  through an encrypted env using the store key manager.
   573  
   574  The manager periodically generates a new data key (see [Rotating data keys](#rotating-data-keys)), keeps
   575  the previously-active key in the list of existing keys, and marks the new key as active.
   576  
   577  Keys must be successfully persisted to the `COCKROACHDB_DATA_KEYS` file before use.
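
The data key manager behavior described above (periodic generation, keeping older keys, persisting before use) can be sketched in Python as follows. This is a simplified model, not the proposed implementation: all names are hypothetical, `persist` stands in for writing the encrypted `COCKROACHDB_DATA_KEYS` file, and using the SHA-256 of the raw key as the data key ID is an assumption.

```python
import hashlib
import os
import time
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DataKey:
    key_id: bytes        # assumption: SHA-256 of the raw key
    key: bytes
    creation_time: int   # seconds since epoch
    active: bool


class DataKeyManager:
    def __init__(self, persist: Callable[[List[DataKey]], None],
                 rotation_period_s: int, key_size: int = 16,
                 now: Callable[[], float] = time.time):
        self._persist = persist  # raises on failure
        self._period = rotation_period_s
        self._key_size = key_size
        self._now = now
        self._keys: Dict[bytes, DataKey] = {}

    def get_active_key(self) -> Optional[DataKey]:
        return next((k for k in self._keys.values() if k.active), None)

    def get_key(self, key_id: bytes) -> Optional[DataKey]:
        return self._keys.get(key_id)

    def maybe_rotate(self) -> bool:
        """Generate and activate a new key if none exists or the active one is too old."""
        now = int(self._now())
        active = self.get_active_key()
        if active is not None and now - active.creation_time < self._period:
            return False
        raw = os.urandom(self._key_size)
        new_key = DataKey(hashlib.sha256(raw).digest(), raw, now, True)
        # Older keys are kept (marked inactive) so existing files stay readable.
        candidate = [replace(k, active=False) for k in self._keys.values()]
        candidate.append(new_key)
        self._persist(candidate)  # must succeed before the new key is used
        self._keys = {k.key_id: k for k in candidate}
        return True
```

Because the in-memory key set is swapped only after `persist` returns, a failed write never leaves a file encrypted with a key that is missing from disk.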
   578  
   579  ### Rotating store keys
   580  
   581  Rotating the store keys consists of specifying:
   582  * `key` points to a new key file, or `plaintext` to switch to plaintext.
   583  * `old_key` points to the key file previously used.
   584  
   585  Upon starting (or other signal), cockroach decrypts the data keys file and re-encrypts it with the new key.
   586  If rotation is done through a flag (as opposed to other signal), this is done before starting rocksdb.
   587  
   588  An ID is computed for each key by taking the hash (`sha-256`) of the raw key. This key ID is stored in plaintext
   589  to indicate which store key is used to decode the data keys file.
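
A minimal Python sketch of the key ID computation and of picking the matching store key follows. The helper names are hypothetical; the constant-time comparison via `hmac.compare_digest` is a defensive choice, not something this RFC requires.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def key_id(raw_key: bytes) -> bytes:
    """ID of a key: the SHA-256 digest of its raw bytes."""
    return hashlib.sha256(raw_key).digest()


def select_store_key(recorded_id: bytes, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the candidate key (e.g. `key` or `old_key`) whose ID matches, if any."""
    for raw in candidates:
        if hmac.compare_digest(key_id(raw), recorded_id):
            return raw
    return None
```

On startup, the plaintext key ID stored with the data keys file would be matched against the keys given in `key` and `old_key`; if neither matches, the file cannot be decoded.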
   590  
Any change in the active store key (actual key, key size) triggers a data key rotation.
   592  
   593  ### Data keys file format
   594  
   595  The data keys file is an encoded protocol buffer:
   596  
```
message DataKeysRegistry {
  // Ordering does not matter.
  repeated DataKey data_keys = 1;
  repeated StoreKey store_keys = 2;
}

// EncryptionType is shared with the registry EncryptionType.
enum EncryptionType {
  // No encryption applied.
  Plaintext = 0;
  // AES in counter mode.
  AES_CTR = 1;
}

// Information about the store key, but not the key itself.
message StoreKey {
  // The ID (hash) of this key.
  optional bytes key_id = 1;
  // Whether this is the active (latest) key.
  optional bool active = 2;
  // First time this key was seen (in seconds since epoch).
  optional int32 creation_time = 3;
}

// Actual data keys and related information.
message DataKey {
  // The ID (hash) of this key.
  optional bytes key_id = 1;
  // Whether this is the active (latest) key.
  optional bool active = 2;
  // EncryptionType is the type of encryption (aka: cipher) used with this key.
  optional EncryptionType encryption_type = 3;
  // Creation time is the time at which the key was created (in seconds since epoch).
  optional int32 creation_time = 4;
  // Key is the raw key.
  optional bytes key = 5;
  // Was exposed is true if we ever wrote the data keys file in plaintext.
  optional bool was_exposed = 6;
  // ID of the active store key at creation time.
  optional bytes creator_store_key_id = 7;
}
```
   640  
The `store_keys` field is needed to keep track of store key ages and statuses. We only need to keep the
active key but may keep previous keys for history. It does **not** store the actual key, only the key hash.
   643  
   644  The `data_keys` field contains all in-use (data encrypted with those keys is still live) keys and all information
   645  needed to determine ciphers, ages, related store keys, etc...
   646  
`was_exposed` indicates whether the key was ever written to disk as plaintext (encryption was disabled at the
store level). This will be surfaced in encryption status reports. From a security standpoint, data encrypted
by an exposed key is as bad as `plaintext`.
   650  
`creator_store_key_id` is the ID of the active store key when this key was created. This enables two things:
* checking the active data key's `creator_store_key_id` against the active store key; a mismatch triggers rotation
* forcing re-encryption of all files encrypted up to some store key
   654  
   655  ### Generating data keys
   656  
   657  To generate a new data key, we look up the following:
   658  * current active key
   659  * current timestamp
   660  * desired cipher (eg: `AES128`)
   661  * current store key ID
   662  
If the cipher is other than `plaintext`, we generate a key of the desired length using the pseudorandom `CryptoPP::OS_GenerateRandomBlock(blocking=false)` (see [Random number generator](#random-number-generator) for alternatives).
   664  
   665  We then generate the following new key entry:
   666  * **key_id**: the hash (`sha256`) of the raw key
   667  * **creation_time**: current time
   668  * **encryption_type**: as specified
   669  * **key**: raw key data
* **creator_store_key_id**: the ID of the active store key
   671  * **was_exposed**: true if the current store encryption type is `plaintext`
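
The key entry described above can be sketched in Python as follows. This is a hedged illustration only: the actual implementation is C++ using Crypto++, and the `DataKey` class, the `KEY_SIZES` table, the cipher names, and the handling of `plaintext` (empty key material) are assumptions made for the example.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass

# Key sizes in bytes per cipher name; the names are illustrative.
KEY_SIZES = {"AES128": 16, "AES192": 24, "AES256": 32}


@dataclass
class DataKey:
    key_id: bytes
    creation_time: int
    encryption_type: str
    key: bytes
    creator_store_key_id: bytes
    was_exposed: bool


def generate_data_key(cipher: str, store_key_id: bytes, store_is_plaintext: bool) -> DataKey:
    """Build a new data key entry for the requested cipher."""
    # Assumption: plaintext entries carry no key material; otherwise the
    # key is drawn from the operating system's random number generator.
    raw = b"" if cipher == "plaintext" else secrets.token_bytes(KEY_SIZES[cipher])
    return DataKey(
        key_id=hashlib.sha256(raw).digest(),
        creation_time=int(time.time()),
        encryption_type=cipher,
        key=raw,
        creator_store_key_id=store_key_id,
        # The key is exposed as soon as the data keys file is stored in plaintext.
        was_exposed=store_is_plaintext,
    )
```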
   672  
   673  ### Rotating data keys
   674  
   675  Rotation is the act of using a new key as the active encryption key. This can be due to:
   676  * a new cipher is desired (including turning encryption on and off)
   677  * a different key size is desired
   678  * the store key was rotated
   679  * rotation is needed (time based, amount of data/number of files using the current key)
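
The rotation triggers listed above could be checked along these lines. This Python sketch is illustrative: the function name, the returned strings, and the `max_age_seconds` parameter are hypothetical, and only the time-based variant of the last trigger is shown.

```python
from typing import Optional


def rotation_reason(
    active_cipher: str,
    desired_cipher: str,
    creator_store_key_id: bytes,
    active_store_key_id: bytes,
    key_age_seconds: int,
    max_age_seconds: int,
) -> Optional[str]:
    """Return why the active data key should be rotated, or None."""
    # Cipher names such as "AES128" encode the key size, so one comparison
    # covers both a cipher change and a key size change.
    if active_cipher != desired_cipher:
        return "cipher or key size changed"
    # The data key records which store key was active when it was created.
    if creator_store_key_id != active_store_key_id:
        return "store key rotated"
    if key_age_seconds >= max_age_seconds:
        return "key too old"
    return None
```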
   680  
   681  When a new key has been generated (see above), we build a temporary list of data keys (using the existing
   682  data keys and the new key).
   683  If the current store key encryption type is `plaintext`, set `was_exposed = true` for all data keys.
   684  
   685  We write the file with encryption to `COCKROACHDB_DATA_KEYS`. Upon successful write, we trigger a data key file reload.
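
Building the temporary key list might look like the Python sketch below. The simplified `DataKey` class and the function name are assumptions; the sketch only covers marking the new key active and propagating `was_exposed`, not the encrypted write itself.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class DataKey:
    key_id: str
    active: bool
    was_exposed: bool


def build_rotated_keys(
    existing: List[DataKey], new_key: DataKey, store_is_plaintext: bool
) -> List[DataKey]:
    """Return the key list to persist: old keys kept, new key marked active."""
    keys = [replace(k, active=False) for k in existing]
    keys.append(replace(new_key, active=True))
    if store_is_plaintext:
        # The file is about to be written unencrypted, so every key in it is exposed.
        keys = [replace(k, was_exposed=True) for k in keys]
    return keys
```

The input list is left untouched, so the in-memory keys only change once the new file has been written and reloaded.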
   686  
   687  We use a `write/close/rename` method to ensure correct file contents.
   688  
   689  Key generation is done inline at startup (we may as well wait for the new key before proceeding), but in the
   690  background for automated changes while the system is already running.
   691  
   692  ### Reporting encryption status
   693  
   694  We need to report basic information about the current status of encryption.
   695  
   696  At the very least, we should have:
   697  * log entries
   698  * debug page entries per store
   699  
   700  With the following information:
   701  * user-requested encryption settings
   702  * active store key ID and cipher
   703  * active data key ID and cipher
   704  * fraction of live data per key ID and cipher
   705  
   706  We can report the following encryption status:
   707  * `plaintext`: plaintext data
   708  * `AES-<size>`: encrypted with AES (one entry for each key size)
   709  * `AES-<size> EXPOSED`: encrypted, but data key was exposed at some point
   710  
   711  Active key IDs and ciphers are known at all times. We need to log them when they change
   712  (indicating successful key rotation) and propagate the information to the Go layer.
   713  
   714  Fraction of data encoded is a bit trickier. We need to:
   715  1. find all files in use
1. look up their encryption status in the registry (key ID and cipher)
   717  1. determine file sizes
   718  1. log a summary
1. report back to the Go layer
   720  
   721  We can find the list of all in-use files the same way rocksdb's backup does, by calling:
   722  * `rocksdb::GetLiveFiles`: retrieve the list of all files in the database
   723  * `rocksdb::GetSortedWalFiles`: retrieve the sorted list of all wal files
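
Once the live files and their sizes are known, the per-status summary is a simple aggregation. The Python sketch below is illustrative only: the function name and the dictionary shapes are hypothetical, and it assumes that a file absent from the registry is plaintext.

```python
from collections import defaultdict
from typing import Dict


def live_data_fractions(
    file_sizes: Dict[str, int], registry: Dict[str, str]
) -> Dict[str, float]:
    """Map each encryption status to its fraction of live bytes.

    file_sizes maps live file names to sizes in bytes; registry maps file
    names to a status label such as "AES-128" or "AES-128 EXPOSED".
    """
    totals: Dict[str, int] = defaultdict(int)
    for path, size in file_sizes.items():
        # Assumption: files without a registry entry are plaintext.
        totals[registry.get(path, "plaintext")] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {status: size / grand_total for status, size in totals.items()}
```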
   724  
   725  ### Other uses of local disk
   726  
   727  **Note: logs encryption is currently [Out of scope](#out-of-scope)**
   728  
   729  All existing uses of local disk to process data must apply the desired encryption status.
   730  
   731  Data tied to a specific store should use the store's rocksdb instance for encryption.
   732  Data not necessarily tied to a store should be encrypted if any of the stores on the node is encrypted.
   733  
   734  We identify some existing uses of local disk:
   735  TODO(mberhault, mjibson, dan): make sure we don't miss anything.
   736  
   737  1. temporary work space for dist SQL: written through a temporary instance of rocksdb. This data does not need
   738  to be used by another rocksdb instance and does not survive node restart. We propose to use dynamically-generated
   739  keys to encrypt the temporary rocksdb instance.
   740  1. sideloading for restore. Local SSTables are generated using an in-memory rocksdb instance then written in go
   741  to local disk. We must change this to either be written directly by rocksdb, or move encryption to Go. The former
   742  is probably preferable.
   743  
   744  In addition to making sure we cover all existing use cases, we should:
   745  1. document that any other directories must **NOT** reside on the same disk as any keys used
   746  1. reduce the number of entry points into rocksdb to make it harder to miss encryption setup
   747  
   748  ### Enterprise enforcement
   749  
   750  Gating at-rest-encryption on the presence of a valid enterprise license is problematic due to the fact that
   751  we have no contact with the cluster when deciding to use encryption.
   752  
   753  For now, we propose a reactive approach to license enforcement. When any node in the cluster uses encryption
   754  (determined through node metrics) but we do not have a valid license:
   755  * display a large warning on the admin UI
   756  * log large messages on each encrypted node (perhaps periodically)
   757  * look into "advise" or "motd" type functionality in SQL. This is rumored to be unreliable.
   758  
   759  The overall idea is that the cluster is not negatively impacted by the lack of an enterprise license.
   760  See [Enterprise feature gating](#enterprise-feature-gating) for possible alternatives.
   761  
   762  Actual code for changes proposed here will be broken into CCL and non-CCL code:
   763  * non-CCL: switching env, modified encrypted env
   764  * CCL: key manager(s), ciphers
   765  
   766  ## Drawbacks
   767  
   768  Implementing encryption-at-rest as proposed has a few drawbacks (in no particular order):
   769  
   770  ### Directs us towards rocksdb-level encryption
   771  
   772  While rocksdb-level encryption does not force us to keep encryption-at-rest at this level,
   773  it strongly discourages us from implementing it elsewhere.
   774  
   775  This means that more fine-grained encryption (eg: per column) will need to fit within this
   776  model or will require encryption in a completely different part of the system.
   777  
   778  ### Lack of correctness testing of rocksdb encryption layer
   779  
   780  The rocksdb `env_encryption` functionality is barely tested and has no known open-source uses.
   781  This raises serious concerns about the correctness of the proposed approach.
   782  
   783  We can improve testing of this functionality at the rocksdb level as well as within cockroach.
A testing plan must be developed and implemented to provide some assurances of correctness.
   785  
   786  ### Complexity of configuration and monitoring
   787  
   788  Proper use of encryption-at-rest requires a reasonable amount of user education, including
   789  * proper configuration of the system (see [Configuration recommendations](#configuration-recommendations))
   790  * proper monitoring of encryption status
   791  
   792  A lot of this falls onto proper documentation and admin UI components, but some are choices made here
   793  (flag specification, logged information, surfaced encryption status).
   794  
   795  ### No strong license enforcement
   796  
   797  The current proposal takes a reactive approach to license enforcement: we show warnings in multiple places
   798  if encryption was enabled without an enterprise license.
   799  
   800  This is unlike our other enterprise features which simply cannot be used without a license.
   801  
   802  There is some discussion of possible ways to solve this in
   803  [Enterprise feature gating](#enterprise-feature-gating), but this is left as future improvements.
   804  
   805  ### Non-live rocksdb files will rot
   806  
   807  Any files not included in rocksdb's "Live files" will still be encrypted. However, due to not being rewritten,
   808  they will become inaccessible as soon as the key is rotated out and GCed.
   809  
   810  While we do not currently make use of backups, we have in the past and may again.
   811  
   812  ### CCL code location
   813  
   814  The enterprise-related functionality should live in CCL directories as much as possible (`pkg/ccl` for go code,
   815  `c-deps/libroach/ccl` for C++ code).
   816  
   817  However, a lot of integration is needed. Some (but far from all) examples include:
   818  * new flag on the `start` command
   819  * additional fields on the `StoreSpec`
   820  * changes to store version logic
   821  * different objects (`Env`) for `DBImpl` construction
   822  * encryption status reporting in node debug pages
   823  
   824  This makes hook-based integration of CCL functionality tricky.
   825  
   826  Making less code CCL would simplify this. But enterprise enforcement must be taken into account.
   827  
   828  ## Rationale and Alternatives
   829  
   830  There are a few alternatives available in the major aspects of this design as well as in
   831  specific areas. We address them all here (in no particular order):
   832  
   833  ### Filesystem encryption
   834  
   835  This is [Out of scope](#out-of-scope)
   836  
   837  Filesystem encryption can be used without requiring coordination with cockroach or rocksdb.
   838  While this may be an option in some environments, DBAs do not always have sufficient
   839  privileges to use this or may not be willing to.
   840  
   841  Filesystem encryption can still be used with cockroach independently of at-rest-encryption.
   842  This can be a reasonable solution for non-enterprise users.
   843  
   844  Should we choose this alternative, this entire RFC can be ignored.
   845  
   846  ### Fine-grained encryption
   847  
   848  This is [Out of scope](#out-of-scope)
   849  
   850  The solution proposed here allows encryption to be enabled or not for individual rocksdb instances.
   851  This may not be sufficient for fine-grained encryption.
   852  
   853  Database and table-level encryption can be accomplished by integrating store encryption status with
   854  zone configs, allowing the placement of certain databases/tables on encrypted disks. This approach is
   855  rather heavy-handed and may not be suitable for all cases of database/table-level encryption.
   856  
   857  However, this may not be sufficient for more fine-grained encryption (eg: per column).
   858  It's not clear how encryption for individual keys/values would work.
   859  
   860  ### Single level of keys
   861  
   862  **We have settled on a two-level key structure**
   863  
   864  The current choice of two key levels (store keys vs data keys) is debatable:
   865  
   866  Advantages:
* rotating store keys is cheap: re-encrypt the list of data keys. Users can deprecate old keys quickly.
   868  * a third-party system could provide us with other types of keys and not impact data encryption
   869  
   870  Negated advantage:
   871  * if the store key is compromised, we still need to re-encrypt all data quickly, this does not help
   872  
   873  Cons:
   874  * more complicated logic (we have two sets of keys to worry about)
   875  * encryption status is harder to understand for users
   876  
   877  We could instead use a single level of keys where the user-provided keys are directly used to encode the data.
   878  This would simplify the logic and reporting (and user understanding). This would however make rotation slower
   879  and potentially make integration with third-party services more difficult. User-provided keys would have to be
   880  available until no data uses them.
   881  
   882  ### Relationship between store and data keys
   883  
   884  **We have settled on tied cipher/key-size specification. This can be changed easily.**
   885  
   886  The current proposal uses the same cipher and key size for store and data keys.
   887  
   888  Pros:
   889  * more user friendly: only have to specify one cipher
   890  * less chance of mistake when switching encryption on/off
   891  
   892  Cons:
   893  * it's not possible to specify a different cipher for store keys
   894  
   895  ### Directly using the data prefix format
   896  
   897  The previous version of this RFC proposed using the `rocksdb::EncryptedEnv` for all files, with encryption state
   898  (plaintext or encrypted) and encryption fields stored in the 4KiB data prefix.
   899  
   900  The main issues of that solution are:
   901  * cannot switch existing stores to the data prefix format, requiring new stores for encryption support
   902  * overhead of the encrypted env for plaintext files
* lack of support for multiple keys in the existing data prefix format, requiring heavy modification
   904  
   905  ## Future improvements
   906  
   907  We break down future improvements in multiple categories:
* v1.0: may not be done as part of the initial implementation, but must be done for the first
stable release.
   910  * future: possible additions to come after first stable release.
   911  
   912  The features are listed in no particular order.
   913  
   914  ### v1.0: a.k.a. MVP
   915  
   916  #### Instruction set support
   917  
   918  Crypto++ can determine support for SSE2 and AES-NI at runtime and fall back to software implementation when
   919  not supported.
   920  
   921  There are a few things we can do:
* ensure our builds properly enable instruction-set detection
   923  * surface a warning when running in software mode
   924  * properly document instruction set requirements for optimal performance
   925  
   926  #### Forcing re-encryption
   927  
   928  We need to find a way to force re-encryption when we want to remove an old key.
   929  While rocksdb regularly creates new files, we may need to force rewrite for less-frequently
   930  updated files. Other files (such as `MANIFEST`, `OPTIONS`, `CURRENT`, `IDENTITY`, etc...) may need
   931  a different method to rewrite.
   932  
   933  Compaction (of the entire key space, or specific ranges determined through live file metadata) may provide
   934  the bulk of the needed functionality.
   935  However, some files (especially with no updates) will not be rewritten.
   936  
   937  Some possible solutions to investigate:
   938  * there is rumor of being able to mark sstables as "dirty"
   939  * patches to rocksdb to force rotation even if nothing has changed (may be the safest)
   940  * "poking" at the files to add changes (may be impossible to do properly)
   941  * level of indirection in the encryption layer while a file is being rewritten
   942  
   943  Part of forcing re-encryption includes:
   944  * when to do it automatically (eg: age-based. maybe after half the active key lifetime)
   945  * how to do it manually (user requests quick re-encryption)
   946  * specifying what to re-encrypt (eg: all data keys up to ID 5)
   947  
   948  #### Garbage collection of old data keys
   949  
   950  We would prefer not to keep old data keys forever, but we need to be certain that a key is no longer in use
   951  before deleting it. How feasible this is depends on the accuracy of our encryption status reporting.
   952  
   953  If we choose to ignore non-live files, garbage collection should be reasonably safe.
   954  
   955  #### Garbage collection of registry entries
   956  
   957  All encrypted files are stored in the registry. Live rocksdb files will automatically be removed as they are
   958  deleted, but any other files will remain forever if not deleted through rocksdb.
   959  
We may want to periodically stat all files in our registry and delete the entries for nonexistent files.
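
A periodic cleanup pass could be as small as the Python sketch below. The function name and the registry shape (file path to entry) are hypothetical.

```python
import os
from typing import Dict


def prune_registry(registry: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the registry without entries for missing files."""
    return {path: entry for path, entry in registry.items() if os.path.exists(path)}
```

Returning a new mapping instead of mutating in place lets the caller persist the pruned registry before swapping it in.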
   961  
   962  #### Performance impact
   963  
   964  The performance impact needs to be measured for a variety of workloads and for all supported ciphers.
   965  This is needed to provide some guidance to users.
   966  
   967  Guidance on key rotation period would also be helpful. This is dependent on the rocksdb churn, so will depend
   968  on the specific workload. We may want to add metrics about data churn to our encryption status reporting.
   969  
   970  #### Propagating encrypted status
   971  
   972  We may want to automatically mark a store as "encrypted" and make this status available to zone configuration,
   973  allowing database/table placement to specify encryption status.
   974  
   975  When to mark a store as "encrypted" is not clear. For example: can we mark it as encrypted just because encryption
   976  is enabled, or should we wait until encryption usage is at 100%?
   977  
   978  If we use the existing store attributes for this marker, we may need to add the concept of "reserved" attributes.
   979  
   980  #### Encryption-related metrics
   981  
   982  We can export high-level metrics about at-rest-encryption through prometheus.
   983  This can include:
   984  * encryption status (enabled/disabled/not-possible-on-this-store)
   985  * amount of encrypted data per key ID
   986  * amount of data per cipher (or plaintext)
   987  * age of in-use keys
   988  
   989  #### Reloading store keys
   990  
   991  The current proposal only reloads store keys at node start time.
   992  We can avoid restarts by triggering a refresh of the store key file when receiving a signal (eg: `SIGHUP`) or other
   993  conditions (periodic refresh, admin UI endpoint, filesystem polling, etc...)
   994  
   995  #### Tooling
   996  
   997  At the very least, we want `cockroach debug` tools to continue working correctly with encrypted files.
   998  
   999  We should examine which rocksdb-provided tools may need modification as well, possibly involving patches
  1000  to rocksdb.
  1001  
  1002  ### Possible future additions
  1003  
  1004  #### Safe file deletion
  1005  
  1006  We may want to delete old files in a less recoverable way (some filesystems allow un-delete).
  1007  On SSDs, a single overwrite pass may be sufficient. We do not propose to handle safe deletion
  1008  on hard drives.
  1009  
  1010  #### Support for additional block ciphers
  1011  
  1012  Crypto++ supports multiple block ciphers. It should be reasonably easy to add support for
  1013  other ciphers.
  1014  
  1015  #### Authenticated encryption for data integrity
  1016  
  1017  We can switch to authenticated encryption (eg: Galois Counter Mode, or others) to allow integrity
  1018  verification of files on disk.
  1019  
  1020  Implementing authenticated encryption would require additional changes to the raw storage format
  1021  to store the final authentication tag.
  1022  
  1023  #### Add sanity checks
  1024  
  1025  We could perform a few checks to ensure data security, such as:
  1026  * detect if keys are on the same disk as the store
  1027  * detect if keys have loose permissions
  1028  * detect if swap is enabled
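
The first two checks can be approximated with `stat` on POSIX systems, as in the Python sketch below. The function names are hypothetical, and comparing device IDs is only a proxy for "same disk": two partitions of one physical disk report different IDs.

```python
import os
import stat


def has_loose_permissions(key_path: str) -> bool:
    """True if group or other users have any access to the key file."""
    mode = os.stat(key_path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def on_same_device(key_path: str, store_dir: str) -> bool:
    """True if the key file and the store directory share a device ID."""
    return os.stat(key_path).st_dev == os.stat(store_dir).st_dev
```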
  1029  
  1030  #### Enterprise feature gating
  1031  
  1032  The current proposal does not gate encryption on a valid license due to the fact that we cannot check the license
  1033  when initialising the node.
  1034  
  1035  A possible solution to explore is detection when the node joins a cluster. eg:
  1036  * always allow store encryption
  1037  * when a node joins, communicate its encryption status and refuse the join if no enterprise license exists
  1038  * on bootstrap, an encrypted store will only allow SQL operations on the system tables (to set the license)
  1039  * the license can be passed through `init`
  1040  
  1041  This would still cause issues when removing the license (or errors loading/validating the license).
  1042  
  1043  Less drastic actions may be possible.