---
title: Storage Memory Usage Benchmark
---

<!---todo: link storage to storage design doc-->
Two components of etcd storage consume physical memory. The etcd process allocates an *in-memory index* to speed key lookup. The process's *page cache*, managed by the operating system, stores recently-accessed data from disk for quick re-use.

The in-memory index holds all the keys in a [B-tree][btree] data structure, along with pointers to the on-disk data (the values). Each key in the B-tree may hold multiple pointers, one for each stored version of its value. The theoretical memory consumption of the in-memory index can hence be approximated with the formula:

`N * (c1 + avg_key_size) + N * (avg_versions_of_key) * (c2 + size_of_pointer)`

where `N` is the number of keys, `c1` is the key metadata overhead, and `c2` is the version metadata overhead.

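As a rough illustration, the formula can be evaluated directly. The sketch below is not part of etcd: the type and field names are hypothetical, the pointer size of 8 bytes is an assumption, and the `c1` and `c2` values are the averages reported later in this document. With those inputs, 100K keys of 64 bytes with one version each come out at about 22.2 MB.

```go
package main

import "fmt"

// IndexParams holds the inputs of the approximation formula.
// The names are illustrative and do not come from the etcd code base.
type IndexParams struct {
	Keys            int64   // N: number of keys
	AvgKeySize      float64 // avg_key_size, in bytes
	AvgVersions     float64 // avg_versions_of_key
	KeyOverhead     float64 // c1, in bytes
	VersionOverhead float64 // c2, in bytes
	PointerSize     float64 // size_of_pointer, in bytes (assumed value)
}

// EstimateIndexBytes evaluates
// N*(c1+avg_key_size) + N*avg_versions_of_key*(c2+size_of_pointer).
func EstimateIndexBytes(p IndexParams) float64 {
	n := float64(p.Keys)
	perKey := p.KeyOverhead + p.AvgKeySize
	perVersion := p.VersionOverhead + p.PointerSize
	return n*perKey + n*p.AvgVersions*perVersion
}

func main() {
	p := IndexParams{
		Keys:            100000,
		AvgKeySize:      64,
		AvgVersions:     1,
		KeyOverhead:     120, // c1 as reported in this document
		VersionOverhead: 30,  // c2 as reported in this document
		PointerSize:     8,   // assumption: one 64-bit pointer
	}
	fmt.Printf("estimated index size: %.1f MB\n", EstimateIndexBytes(p)/1e6)
}
```
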
The diagram below shows the detailed structure of the in-memory index B-tree.

```
                                In mem index

                               +------------+
                               | key || ... |
  +--------------+             |     ||     |
  |              |             +------------+
  |              |             | v1  || ... |
  |   disk    <----------------|     ||     | Tree Node
  |              |             +------------+
  |              |             | v2  || ... |
  |           <----------------+     ||     |
  |              |             +------------+
  +--------------+       +-----+    |   |   |
                         |     |    |   |   |
                         |     +------------+
                         |
                         |
                         ^
                      +-----+
                      | ... |
                      |     |
                      +-----+
                      | ... | Tree Node
                      |     |
                      +-----+
                      | ... |
                      |     |
                      +-----+
```

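The shape of one index entry, a key plus one pointer per stored version, can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `diskPointer`, `keyEntry`, and `index` are hypothetical names, a sorted slice stands in for the B-tree (only the key ordering matters here), and none of it mirrors etcd's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// diskPointer is a hypothetical stand-in for a reference to an on-disk value.
type diskPointer struct {
	offset int64
}

// keyEntry pairs one key with a pointer for each stored version of its value.
type keyEntry struct {
	key      string
	versions []diskPointer // oldest first
}

// index keeps entries sorted by key; a sorted slice stands in for the B-tree.
type index struct {
	entries []keyEntry
}

// find returns the position where key is, or would be inserted.
func (ix *index) find(key string) int {
	return sort.Search(len(ix.entries), func(i int) bool {
		return ix.entries[i].key >= key
	})
}

// put records a new version for key, inserting the key if it is absent.
func (ix *index) put(key string, p diskPointer) {
	i := ix.find(key)
	if i < len(ix.entries) && ix.entries[i].key == key {
		ix.entries[i].versions = append(ix.entries[i].versions, p)
		return
	}
	// Grow by one, shift the tail right, and place the new entry at i.
	ix.entries = append(ix.entries, keyEntry{})
	copy(ix.entries[i+1:], ix.entries[i:])
	ix.entries[i] = keyEntry{key: key, versions: []diskPointer{p}}
}

// latest returns the pointer to the newest version of key, if any.
func (ix *index) latest(key string) (diskPointer, bool) {
	i := ix.find(key)
	if i < len(ix.entries) && ix.entries[i].key == key {
		v := ix.entries[i].versions
		return v[len(v)-1], true
	}
	return diskPointer{}, false
}

func main() {
	ix := &index{}
	ix.put("b", diskPointer{offset: 0})
	ix.put("a", diskPointer{offset: 10})
	ix.put("b", diskPointer{offset: 20}) // second version of "b"
	p, ok := ix.latest("b")
	fmt.Println(p.offset, ok, len(ix.entries))
}
```
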
[Page cache memory][pagecache] is managed by the operating system and is not covered in detail in this document.

## Testing Environment

etcd version

- git head https://github.com/coreos/etcd/commit/776e9fb7be7eee5e6b58ab977c8887b4fe4d48db

GCE n1-standard-2 machine type

- 7.5 GB memory
- 2 CPUs

## In-memory index memory usage

In this test, we benchmark only the memory usage of the in-memory index. The goal is to find the `c1` and `c2` mentioned above and to understand the hard limit of the storage's memory consumption.

We measure memory usage with Go's `runtime.ReadMemStats`, taking the difference in total allocated bytes before and after creating the index. This does not perfectly reflect the memory usage of the in-memory index itself, but it shows the rough consumption pattern.

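The measurement approach can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: `allocatedDuring` and `sink` are hypothetical names, and it assumes that "total allocated bytes" refers to the `TotalAlloc` field of `runtime.MemStats`, which is cumulative and is not reduced by garbage collection.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the built data reachable so the allocations happen on the heap.
var sink [][]byte

// allocatedDuring reports how many heap bytes were allocated while build ran,
// as the difference of the cumulative TotalAlloc counter.
func allocatedDuring(build func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	build()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	const n, keySize = 100000, 64
	used := allocatedDuring(func() {
		keys := make([][]byte, 0, n)
		for i := 0; i < n; i++ {
			keys = append(keys, make([]byte, keySize))
		}
		sink = keys
	})
	fmt.Printf("allocated roughly %.1f MB for %d keys\n", float64(used)/1e6, n)
}
```
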
| N (keys) | versions per key | key size  | memory usage |
|----------|------------------|-----------|--------------|
| 100K     | 1                | 64 bytes  | 22 MB        |
| 100K     | 5                | 64 bytes  | 39 MB        |
| 1M       | 1                | 64 bytes  | 218 MB       |
| 1M       | 5                | 64 bytes  | 432 MB       |
| 100K     | 1                | 256 bytes | 41 MB        |
| 100K     | 5                | 256 bytes | 65 MB        |
| 1M       | 1                | 256 bytes | 409 MB       |
| 1M       | 5                | 256 bytes | 506 MB       |

Based on the results, we calculate `c1` = 120 bytes and `c2` = 30 bytes. Two rows of data are enough to solve for `c1` and `c2`, since they are the only two unknowns in the formula. The reported figures are the averages of the four `(c1, c2)` pairs we calculated. The key metadata overhead is still relatively nontrivial (50%) for small key-value pairs. However, this is a significant improvement over the old store, which had at least 1000% overhead.

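To make the two-row calculation concrete, the sketch below solves the formula for `c1` and `c2` from two rows that share `N` and key size but differ in version count. The function and type names are hypothetical, and two assumptions are baked in: 1 MB is taken as 10^6 bytes and `size_of_pointer` as 8 bytes, neither of which the benchmark states. Under those assumptions, the 100K-key, 64-byte rows give `c1` = 113.5 and `c2` = 34.5, close to but not identical with the averaged figures above.

```go
package main

import "fmt"

// sample is one row of the benchmark table.
type sample struct {
	keys     float64 // N
	versions float64 // versions per key
	keySize  float64 // bytes
	memBytes float64 // measured memory usage in bytes
}

// solveOverheads derives c1 and c2 from two samples that share N and key size
// but have different version counts. pointerSize is an assumed input.
func solveOverheads(a, b sample, pointerSize float64) (c1, c2 float64) {
	perKeyA := a.memBytes / a.keys
	perKeyB := b.memBytes / b.keys
	// Subtracting the two per-key equations cancels c1 + key size,
	// leaving (versions difference) * (c2 + pointerSize).
	perVersion := (perKeyB - perKeyA) / (b.versions - a.versions)
	c2 = perVersion - pointerSize
	c1 = perKeyA - a.keySize - a.versions*perVersion
	return c1, c2
}

func main() {
	// Rows for 100K keys of 64 bytes, with 1 MB taken as 10^6 bytes.
	a := sample{keys: 100e3, versions: 1, keySize: 64, memBytes: 22e6}
	b := sample{keys: 100e3, versions: 5, keySize: 64, memBytes: 39e6}
	c1, c2 := solveOverheads(a, b, 8)
	fmt.Printf("c1 = %.1f bytes, c2 = %.1f bytes\n", c1, c2)
}
```

Repeating this for the other three row pairs and averaging the results follows the same procedure described above; the exact averages depend on the unit and pointer-size assumptions.
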
## Overall memory usage

The overall memory usage captures how much resident memory (RSS) etcd consumes with the storage. The value size should have very little impact on the overall memory usage of etcd, since values are kept on disk and only hot values are retained in memory, managed by the OS page cache.

| N (keys) | versions per key | key size | value size | memory usage |
|----------|------------------|----------|------------|--------------|
| 100K     | 1                | 64 bytes | 256 bytes  | 40 MB        |
| 100K     | 5                | 64 bytes | 256 bytes  | 89 MB        |
| 1M       | 1                | 64 bytes | 256 bytes  | 470 MB       |
| 1M       | 5                | 64 bytes | 256 bytes  | 880 MB       |
| 100K     | 1                | 64 bytes | 1 KB       | 102 MB       |
| 100K     | 5                | 64 bytes | 1 KB       | 164 MB       |
| 1M       | 1                | 64 bytes | 1 KB       | 587 MB       |
| 1M       | 5                | 64 bytes | 1 KB       | 836 MB       |

Based on the results, the value size does not significantly impact memory consumption: quadrupling the value size from 256 bytes to 1 KB changes the memory usage by far less than 4x in every row. The increase that does appear is due to more data being held in the OS page cache.

[btree]: https://en.wikipedia.org/wiki/B-tree
[pagecache]: https://en.wikipedia.org/wiki/Page_cache