![image](https://user-images.githubusercontent.com/23323466/110414341-8ad0c000-8044-11eb-9628-7b24e50295b2.png)

# O(log n) makes continuous profiling possible

#### _Read this in other languages._
<kbd>[<img title="中文 (Simplified)" alt="中文 (Simplified)" src="https://cdn.staticaly.com/gh/hjnilsson/country-flags/master/svg/cn.svg" width="22">](storage-design-ch.md)</kbd>

Pyroscope is software that lets you **continuously** profile your code to debug performance issues down to a single line of code. With just a few lines of code it will do the following:

### Pyroscope Agent
- Samples the stack trace every 0.01 seconds to see which functions are consuming resources
- Batches that data into 10-second blocks and sends it to the Pyroscope server

### Pyroscope Server
- Receives data from the Pyroscope agent and processes it so that it can be stored efficiently
- Pre-aggregates profiling data for fast querying when data needs to be retrieved

## Storage Efficiency

The challenge with continuous profiling is that if you simply take frequent chunks of profiling data, compress them, and store them somewhere, you end up with:
1. Too much data to store efficiently
2. Too much data to query quickly

We solve these problems by:
1. Using a combination of trees and tries to compress the data efficiently
2. Using segment trees to answer queries for any timespan in O(log n) instead of O(n) time

## Step 1: Turning the profiling data into a tree

The simplest way to represent profiling data is as a list of strings, each one representing a stack trace and the number of times that particular stack trace was seen during a profiling session:

```
server.py;fast_function;work 2
server.py;slow_function;work 8
```

The first obvious step is to turn this data into a tree. Conveniently, this representation also makes it easy to generate flamegraphs later.

![raw_vs_flame_graph](https://user-images.githubusercontent.com/23323466/110378930-0f065180-800b-11eb-9357-71724bc7258c.gif)

Compressing the stack traces into a tree saves space on repeated elements. Instead of storing a common path like `net/http.request` in the database multiple times, we store it once and keep a reference to its location. This is fairly standard among profiling libraries, since it's the lowest-hanging fruit when it comes to optimizing storage of profiling data.

![fast-compress-stack-traces](https://user-images.githubusercontent.com/23323466/110227218-e109fb80-7eaa-11eb-81a8-cdf2b3944f1c.gif)
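The collapsing step above can be sketched in a few lines of Python. The node layout and the parsing of the raw format are illustrative assumptions, not Pyroscope's actual internal structures:

```python
# Collapse raw "frame;frame;frame count" lines into a tree so that
# shared prefixes like "server.py" are stored only once.

class Node:
    def __init__(self, name):
        self.name = name
        self.value = 0       # samples attributed to this exact stack
        self.children = {}   # frame name -> child Node

    def insert(self, frames, samples):
        node = self
        for frame in frames:
            node = node.children.setdefault(frame, Node(frame))
        node.value += samples

raw = [
    "server.py;fast_function;work 2",
    "server.py;slow_function;work 8",
]

root = Node("root")
for line in raw:
    stack, samples = line.rsplit(" ", 1)
    root.insert(stack.split(";"), int(samples))

# "server.py" now appears once, shared by both stacks:
assert list(root.children) == ["server.py"]
```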

## Step 2: Adding tries to store individual symbols more efficiently

Now that we've compressed the raw profiling data by converting it into a tree, many of the nodes in this compressed tree contain symbols that share repeated elements with other nodes. For example:

```
net/http.request;net/io.read 100 samples
net/http.request;net/io.write 200 samples
```

While the `net/http.request`, `net/io.read`, and `net/io.write` functions differ, they share the common ancestor `net/`.

Each of these symbols can be serialized using a prefix tree (trie). Instead of storing the same prefixes multiple times, we store each prefix once in the trie and reference it with a pointer to its position in memory:

![storage-design-0](https://user-images.githubusercontent.com/23323466/110520399-446e7600-80c3-11eb-84e9-ecac7c0dbf23.gif)

In this basic example we save roughly 80% of the space, going from 39 bytes to 8 bytes. Typically, symbol names are much longer, and as the number of symbols grows, storage requirements grow logarithmically rather than linearly.
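To see why a trie helps, here is a toy character-level trie over the three symbol names above. It is a sketch of the prefix-sharing idea only, not the serialization format Pyroscope actually uses:

```python
# Store symbol names in a character-level trie: each character of a
# shared prefix ("net/", "net/io.") is stored only once.

class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.terminal = False

def insert(root, symbol):
    node = root
    for ch in symbol:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def count_chars(node):
    # One stored character per trie edge.
    return sum(1 + count_chars(child) for child in node.children.values())

symbols = ["net/http.request", "net/io.read", "net/io.write"]
root = TrieNode()
for s in symbols:
    insert(root, s)

flat = sum(len(s) for s in symbols)  # 39 bytes if each name is stored whole
shared = count_chars(root)           # fewer: "net/" and "net/io." stored once
assert shared < flat
```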

## Step 1 + 2: Combining the trees with the tries

In the end, by using a tree to compress the raw profiling data and then using tries to compress the symbols, we get the following storage amounts for our simple example:

| data type   | bytes |
|-------------|-------|
| raw data    | 93    |
| tree        | 58    |
| tree + trie | 10    |

As you can see, this is a 9x improvement for a fairly trivial case. In real-world scenarios the compression factor gets much larger.

![combine-segment-and-prefix_1](https://user-images.githubusercontent.com/23323466/110262208-ca75aa00-7f67-11eb-8f16-0572a4641ee1.gif)
## Step 3: Optimizing for fast reads using segment trees

Now that we have a way of storing the data efficiently, the next problem is how to query it efficiently. We solve this by pre-aggregating the profiling data and storing it in a special segment tree.

Every 10 seconds, the Pyroscope agent sends a chunk of profiling data to the server, which writes the data into the database with the corresponding timestamp. Each write happens once, but is replicated into multiple layers.

**Each layer represents a time block of larger units: for every two 10s time blocks, one 20s time block is created, and so on up the tree. This makes reading the data more efficient (more on that in a second).**

![segment_tree_animation_1](https://user-images.githubusercontent.com/23323466/110259555-196a1200-7f5d-11eb-9223-218bb4b34c6b.gif)
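The replication of one write into multiple layers can be sketched as follows. Block sizes doubling per layer match the animation above; the function name and layer count are illustrative assumptions:

```python
# For one profile written at `timestamp`, compute which pre-aggregated
# block it lands in at each layer. Block sizes double per layer:
# 10s, 20s, 40s, 80s, ...

BLOCK = 10  # seconds; the smallest resolution

def affected_blocks(timestamp, levels=4):
    """Return (block_size, block_start) pairs the write is replicated into."""
    blocks = []
    size = BLOCK
    for _ in range(levels):
        blocks.append((size, timestamp - timestamp % size))
        size *= 2
    return blocks

# A write at t=30s lands in the 10s block starting at 30, the 20s block
# starting at 20, and the 40s and 80s blocks starting at 0:
assert affected_blocks(30) == [(10, 30), (20, 20), (40, 0), (80, 0)]
```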

## Turn reads from O(n) to O(log n)

If you don't use segment trees and just write data in 10-second chunks, the time complexity of a read becomes a function of how many 10s units the query spans. A query for one year of data would have to merge roughly 3.15 million trees of profiling data. By using segment trees, you effectively decrease the number of merge operations from O(n) to O(log n).

![segment_tree_reads](https://user-images.githubusercontent.com/23323466/110277713-b98a6000-7f8a-11eb-942f-3a924a6e0b09.gif)
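The read path can be sketched as a greedy cover: to serve a range, pick the largest pre-aggregated block that is aligned and still fits, then repeat. This illustrates the idea under assumed 10s blocks doubling per layer; it is not the server's actual query code:

```python
BLOCK = 10  # seconds; the smallest block size

def blocks_to_merge(start, end):
    """Cover [start, end) with the largest aligned pre-aggregated blocks."""
    picked = []
    t = start
    while t < end:
        size = BLOCK
        # Grow the block while the doubled block is still aligned at t
        # and still fits inside the requested range.
        while t % (size * 2) == 0 and size * 2 <= end - t:
            size *= 2
        picked.append((t, size))
        t += size
    return picked

year = 365 * 24 * 3600  # one year of data, in seconds

# Merging raw 10s blocks would touch ~3.15 million trees...
assert year // BLOCK == 3_153_600
# ...but with pre-aggregated layers only a handful of blocks are needed:
assert len(blocks_to_merge(0, year)) == 8
```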


## Help us add more profilers

We spent a lot of time solving this storage/querying problem because we wanted to make software that can do truly continuous profiling in production without causing too much overhead.

While Pyroscope currently supports several languages, we would love to add more.

Any sampling profiler that can export data in the "raw" format shown above can become a profiling agent for Pyroscope. We'd love your help building out profilers for other languages!
- [x] [Go](https://pyroscope.io/docs/golang)
- [x] [Python](https://pyroscope.io/docs/python)
- [x] [eBPF](https://pyroscope.io/docs/ebpf)
- [x] [Ruby](https://pyroscope.io/docs/ruby)
- [x] [PHP](https://pyroscope.io/docs/php)
- [x] [Java](https://pyroscope.io/docs/java)
- [x] [.NET](https://pyroscope.io/docs/dotnet)
- [ ] [Rust](https://github.com/pyroscope-io/pyroscope/issues/83#issuecomment-784947654)
- [ ] [Node](https://github.com/pyroscope-io/pyroscope/issues/8)

If you want to help contribute or need help setting up Pyroscope, here's how you can reach us:
- Join our [Slack](https://pyroscope.io/slack)
- Set up a time to meet with us [here](https://pyroscope.io/setup-call)
- Write an [issue](https://github.com/pyroscope-io/pyroscope/issues)
- Follow us on [Twitter](https://twitter.com/PyroscopeIO)