---
title: 'LFX Mentorship: Thanos Query Observability'
date: "2023-06-06"
author: Pradyumna Krishna
---

Hello everyone, my name is Pradyumna Krishna, and I contributed to Thanos as part of the LFX Mentorship programme. I worked on adding query execution observability to the Thanos PromQL engine and to the Thanos project itself. In this post I share my experience and provide some insight into the project.

## The Project

### What's PromQL?

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time.

### What's the Thanos PromQL Engine?

The Thanos PromQL engine is an alternative query engine that is currently under development and aims to be fully compatible with the original Prometheus PromQL engine. It is a multi-threaded implementation of a PromQL query engine based on the [Volcano/Iterator model](https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf). The Thanos PromQL engine focuses on better performance and on features, such as query observability, that are missing from the original Prometheus engine.

PromQL supports many binary and aggregation [operators](https://prometheus.io/docs/prometheus/latest/querying/operators/). These include basic logical and arithmetic operators as well as built-in aggregation operators, which aggregate the elements of a single instant vector into a new vector with fewer elements and aggregated values. The result of a query built from these operators can be visualized as a graph or viewed as tabular data.
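
To make the Volcano/iterator idea concrete, here is a minimal, single-threaded Go sketch in which each operator pulls batches of samples from its child, and an aggregation operator collapses a vector into fewer elements. Every name in it (`Operator`, `sliceSelector`, `sumBy`) is hypothetical and chosen for illustration; it is not the promql-engine API, and it leaves out the concurrency that the real engine uses.

```go
package main

import "fmt"

// Sample is one value of one series at a single evaluation step.
// All names in this sketch are hypothetical, not the promql-engine API.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// Operator is a Volcano-style iterator: each call to Next pulls one batch
// (here, the samples of one step). ok is false once the input is exhausted.
type Operator interface {
	Next() (batch []Sample, ok bool)
}

// sliceSelector is a leaf operator that replays pre-loaded steps.
type sliceSelector struct {
	steps [][]Sample
	pos   int
}

func (s *sliceSelector) Next() ([]Sample, bool) {
	if s.pos >= len(s.steps) {
		return nil, false
	}
	batch := s.steps[s.pos]
	s.pos++
	return batch, true
}

// sumBy pulls a step from its child and sums the values that share the same
// value of one label, similar in spirit to `sum by (label) (...)`.
type sumBy struct {
	label string
	next  Operator
}

func (a *sumBy) Next() ([]Sample, bool) {
	in, ok := a.next.Next()
	if !ok {
		return nil, false
	}
	sums := map[string]float64{}
	order := []string{} // first-seen order keeps the output deterministic
	for _, s := range in {
		key := s.Labels[a.label]
		if _, seen := sums[key]; !seen {
			order = append(order, key)
		}
		sums[key] += s.Value
	}
	out := make([]Sample, 0, len(order))
	for _, key := range order {
		out = append(out, Sample{Labels: map[string]string{a.label: key}, Value: sums[key]})
	}
	return out, true
}

func main() {
	leaf := &sliceSelector{steps: [][]Sample{{
		{Labels: map[string]string{"job": "api", "instance": "a"}, Value: 1},
		{Labels: map[string]string{"job": "api", "instance": "b"}, Value: 2},
		{Labels: map[string]string{"job": "db", "instance": "c"}, Value: 5},
	}}}
	var root Operator = &sumBy{label: "job", next: leaf}
	// The caller only talks to the root; data flows up one batch at a time.
	for batch, ok := root.Next(); ok; batch, ok = root.Next() {
		for _, s := range batch {
			fmt.Println(s.Labels["job"], s.Value)
		}
	}
}
```

In this sketch the three input samples collapse into two output samples, one per `job` value. Because operators only depend on the `Operator` interface, they can be stacked freely, which is also what makes it natural to attach per-operator metadata later on.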

### Query Observability

Query observability makes it possible to see internal metadata such as the duration, the query plan, and detailed stats about each query (and its individual parts) in the Thanos PromQL engine. By exploring query stats and plans, users can look for ways to optimize the queries the engine executes.

## Contributions

Over this period I worked on two major features: switching engines in Thanos via the UI, which exposes the new engine to more users and facilitates exploration, and adding query explanation functionality to the UI.

Giedrius always came up with sketches showing me how each feature should work and look.

### Engine Switch

Thanos needs to know which engine to use to execute PromQL queries, and a query can only be executed by one engine. Originally, the engine could only be chosen through a command-line flag. My first feature was to make the engine switchable from the UI, so that a user can switch from the Prometheus engine to the Thanos engine without any interruption. The engine given on the command line now acts as the default, used when no engine parameter is set through the API.
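
The fallback behaviour described above could be sketched roughly as follows. The parameter name `engine`, the values `prometheus` and `thanos`, and the helper `selectEngine` are assumptions made for this example and may not match the actual Thanos code.

```go
package main

import (
	"fmt"
	"net/url"
)

// EngineType names a PromQL engine. The concrete values are assumptions
// made for this sketch.
type EngineType string

const (
	EnginePrometheus EngineType = "prometheus"
	EngineThanos     EngineType = "thanos"
)

// selectEngine returns the engine requested for a single query. An empty
// request falls back to def, the default configured on the command line.
func selectEngine(requested string, def EngineType) (EngineType, error) {
	if requested == "" {
		return def, nil
	}
	switch e := EngineType(requested); e {
	case EnginePrometheus, EngineThanos:
		return e, nil
	default:
		return "", fmt.Errorf("unknown engine %q", requested)
	}
}

func main() {
	def := EnginePrometheus // stands in for the command-line default
	for _, rawQuery := range []string{"query=up", "query=up&engine=thanos"} {
		params, err := url.ParseQuery(rawQuery)
		if err != nil {
			panic(err)
		}
		// "engine" is an assumed name for the per-request API parameter.
		engine, err := selectEngine(params.Get("engine"), def)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", rawQuery, engine)
	}
}
```

Resolving the engine per request, rather than once at startup, is what allows each query to choose its own engine.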

Here is the sketch of how the engine switch should look (made by Giedrius):

![Giedrius's Sketch](img/giedrius-sketch.png)

I looked into Thanos and made changes across its components, from the API to the UI, to implement this engine switch. I tried to keep the UI as close as possible to the sketch.

![Engine Switch 1](img/engine-1.png)

After the initial review, I moved the switch into each panel, as this makes it possible to run the same query on different engines and compare the results.

![Engine Switch 2](img/engine-2.png)

**Pull Requests**
- Add engine param to Thanos engine constructor ([thanos-io/promql-engine#228](https://github.com/thanos-io/promql-engine/pull/228))
- Query: Switch Multiple Engines ([thanos-io/thanos#6234](https://github.com/thanos-io/thanos/pull/6234))

### Query Explanation

Query explanation is a dry run of query planning that returns the query execution plan. It can be enabled through a flag on the Thanos PromQL engine, and the resulting plan is then visualized in the Thanos query UI. It is one feature of query observability. For the user's convenience, the Explain checkbox is locked unless the Thanos PromQL engine is selected.

I started by generating query explanations in the Thanos promql-engine and then worked on the Thanos UI to present the explanation as a tree-like structure.
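
As a rough illustration of the tree-like structure, the following Go sketch models an explanation as nested nodes and renders them with indentation. The `ExplainNode` type, the `renderTree` helper, and the operator names in the sample plan are all hypothetical; the output of the real engine and the layout in the UI may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// ExplainNode is a hypothetical plan node: an operator name plus the
// operators it reads from.
type ExplainNode struct {
	Name     string
	Children []ExplainNode
}

// render writes n and its children depth-first, indenting by two spaces
// per level.
func render(n ExplainNode, depth int, sb *strings.Builder) {
	sb.WriteString(strings.Repeat("  ", depth))
	sb.WriteString(n.Name)
	sb.WriteString("\n")
	for _, child := range n.Children {
		render(child, depth+1, sb)
	}
}

// renderTree returns the whole plan as an indented, one-operator-per-line string.
func renderTree(root ExplainNode) string {
	var sb strings.Builder
	render(root, 0, &sb)
	return sb.String()
}

func main() {
	// Illustrative plan for a query shaped like
	// sum by (job) (rate(http_requests_total[5m])).
	plan := ExplainNode{
		Name: "aggregate: sum by (job)",
		Children: []ExplainNode{{
			Name: "function: rate",
			Children: []ExplainNode{
				{Name: "selector: http_requests_total[5m]"},
			},
		}},
	}
	fmt.Print(renderTree(plan))
}
```

Building the tree requires only the plan, not the data, which fits the idea of an explanation as a dry run.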

The Thanos UI was updated over time, and the explanation tree evolved with it, as the screenshots below show.

![Tree 1](img/tree-1.png)

![Tree 2](img/tree-2.png)

![Tree 3](img/tree-3.png)

**Pull Requests**
- Add method to explain query vector ([thanos-io/promql-engine#252](https://github.com/thanos-io/promql-engine/pull/252))
- Query Explanation ([thanos-io/thanos#6346](https://github.com/thanos-io/thanos/pull/6346))

### What's Next

We have laid the foundation of query observability by implementing the query explanation feature. Possible next steps are:
* Showing memory allocations and CPU usage
* Showing the "hottest" operators
* Showing metadata about leaf nodes (how many postings/series were fetched/touched, etc.)

## Overall Experience

I love the LFX Mentorship programme. I learned a lot while contributing and gained new skills in Go and TypeScript. My mentors, Giedrius and Saswata, guided me throughout the programme. They never let me get stuck and always provided me with technical as well as career guidance.