---
title: 2️⃣ Query the data
description: lakeFS quickstart / Query the pre-populated data using the DuckDB browser that's built into lakeFS
parent: ⭐ Quickstart
nav_order: 10
next: ["Create a branch of the data", "./branch.html"]
previous: ["Launch the quickstart environment", "./launch.html"]
---

# Let's Query Something

The lakeFS server has been loaded with a sample Parquet data file. Fittingly enough for a piece of software that helps users of data lakes, the `lakes.parquet` file holds data about lakes around the world.

You'll notice that the branch is set to `main`. This is conceptually the same as the main branch in a Git repository, against which you develop software code.

<img src="{{ site.baseurl }}/assets/img/quickstart/repo-contents.png" alt="The lakeFS objects list with a highlight to indicate that the branch is set to main." class="quickstart"/>

Let's have a look at the data before making some changes to it on a branch in the following steps.

Click on `lakes.parquet` and notice that the built-in DuckDB runs a query to show a preview of the file's contents.

<img src="{{ site.baseurl }}/assets/img/quickstart/duckdb-main-01.png" alt="The lakeFS object viewer with embedded DuckDB to query parquet files. A query has run automagically to preview the contents of the selected parquet file." class="quickstart"/>

Now we'll run our own query to find the top five countries represented in the data.

Copy and paste the following SQL statement into the DuckDB query panel and click Execute.

```sql
SELECT country, COUNT(*)
FROM READ_PARQUET('lakefs://quickstart/main/lakes.parquet')
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 5;
```

<img src="{{ site.baseurl }}/assets/img/quickstart/duckdb-main-02.png" alt="An embedded DuckDB query showing a count of rows per country in the dataset." class="quickstart"/>

Next we're going to make some changes to the data, but on a development branch so that the data in the main branch remains untouched.