github.com/treeverse/lakefs@v1.24.1-0.20240520134607-95648127bfb0/docs/quickstart/query.md

---
title: 2️⃣ Query the data
description: lakeFS quickstart / Query the pre-populated data using DuckDB browser that's built into lakeFS
parent: ⭐ Quickstart
nav_order: 10
next: ["Create a branch of the data", "./branch.html"]
previous: ["Launch the quickstart environment", "./launch.html"]
---

# Let's Query Something

The lakeFS server has been loaded with a sample Parquet data file. Fittingly enough for a piece of software that helps users of data lakes, the `lakes.parquet` file holds data about lakes around the world.

You'll notice that the branch is set to `main`. This is conceptually the same as the main branch in Git against which you develop software code.

<img src="{{ site.baseurl }}/assets/img/quickstart/repo-contents.png" alt="The lakeFS objects list with a highlight to indicate that the branch is set to main." class="quickstart"/>

Let's have a look at the data before making some changes to it on a branch in the following steps.

Click on `lakes.parquet` and notice that the built-in DuckDB runs a query to show a preview of the file's contents.

<img src="{{ site.baseurl }}/assets/img/quickstart/duckdb-main-01.png" alt="The lakeFS object viewer with embedded DuckDB to query parquet files. A query has run automagically to preview the contents of the selected parquet file." class="quickstart"/>

Now we'll run our own query on it to look at the top five countries represented in the data.

Copy and paste the following SQL statement into the DuckDB query panel and click on Execute.

```sql
SELECT   country, COUNT(*)
FROM     READ_PARQUET('lakefs://quickstart/main/lakes.parquet')
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT    5;
```

<img src="{{ site.baseurl }}/assets/img/quickstart/duckdb-main-02.png" alt="An embedded DuckDB query showing a count of rows per country in the dataset." class="quickstart"/>

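If you'd like to experiment a little further, here is a small variation on the same query. It is only a sketch: the alias `row_count` and the limit of 10 are illustrative choices, not part of the quickstart. The `lakefs://` path is unchanged and reads as repository (`quickstart`), then branch (`main`), then object path (`lakes.parquet`).

```sql
-- Same aggregation as before, with a named count column.
-- `row_count` is a made-up alias; pick any name you like.
SELECT   country, COUNT(*) AS row_count
FROM     READ_PARQUET('lakefs://quickstart/main/lakes.parquet')
GROUP BY country
ORDER BY row_count DESC   -- order by the alias instead of repeating COUNT(*)
LIMIT    10;              -- show the top ten instead of the top five
```

Naming the count column makes the result header easier to read, and the branch segment of the path is the part you would change to query the same file on a different branch.
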
Next we're going to make some changes to the data, but on a development branch so that the data in the main branch remains untouched.