---
title: Presto / Trino
description: This section explains how you can start using lakeFS with Presto and Trino, open-source distributed SQL query engines.
parent: Integrations
redirect_from:
  - /integrations/presto.html
  - /using/presto.html
---

# Using lakeFS with Presto/Trino

[Presto](https://prestodb.io){:target="_blank"} and [Trino](https://trino.io){:target="_blank"} are distributed SQL query engines designed to query large data sets distributed over one or more heterogeneous data sources.

Querying data in lakeFS from Presto/Trino is similar to querying data in S3 from Presto/Trino. It is done using the [Presto Hive connector](https://prestodb.io/docs/current/connector/hive.html){:target="_blank"} or [Trino Hive connector](https://trino.io/docs/current/connector/hive.html){:target="_blank"}.

{% include toc_2-3.html %}

{: .note-title}
> Credentials
>
> In the following examples, we set AWS credentials at runtime for clarity. In production, these properties should be set using one of Hadoop's standard ways of [Authenticating with S3](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3){:target="_blank"}.

## Configuration

### Configure the Hive connector

Create `/etc/catalog/hive.properties` with the following contents to mount the `hive-hadoop2` connector as the Hive catalog, replacing `example.net:9083` with the correct host and port for your Hive Metastore Thrift service:

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example.net:9083
```

Add the lakeFS configuration to `/etc/catalog/hive.properties` using the corresponding S3 configuration properties:

```properties
hive.s3.aws-access-key=AKIAIOSFODNN7EXAMPLE
hive.s3.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
hive.s3.endpoint=https://lakefs.example.com
hive.s3.path-style-access=true
```

### Configure Hive

Presto/Trino uses the Hive Metastore Service (HMS), or a compatible implementation such as the AWS Glue Data Catalog, to write data to S3.
If you are using the Hive Metastore, you will need to configure Hive as well.

Add the following to your `hive-site.xml` configuration:

```xml
<configuration>
    ...
    <property>
        <name>fs.s3a.access.key</name>
        <value>AKIAIOSFODNN7EXAMPLE</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>https://lakefs.example.com</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
```
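
After reloading Presto/Trino with the new catalog, a quick sanity check is to list what the lakeFS-backed catalog exposes. This is a minimal sketch, assuming the properties file above is named `hive.properties` (so the catalog is called `hive`) and referencing the `main` schema created in the examples that follow:

```sql
-- List the schemas visible through the lakeFS-backed catalog
SHOW SCHEMAS FROM hive;

-- List the tables registered under the main schema (created in the examples below)
SHOW TABLES FROM hive.main;
```
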
## Examples

Here are some examples, adapted from the [Presto Hive connector examples](https://prestodb.io/docs/current/connector/hive.html#examples){:target="_blank"} and the [Trino Hive connector examples](https://trino.io/docs/current/connector/hive.html#examples){:target="_blank"}.

### Example with schema

Create a new schema named `main` that will store tables under the `main` branch of a lakeFS repository named `example`:

```sql
CREATE SCHEMA main
WITH (location = 's3a://example/main')
```

Create a new Hive table named `page_views` in the `main` schema stored using the ORC file format,
partitioned by date and country, and bucketed by user into `50` buckets (note that Hive requires the partition columns to be the last columns in the table):

```sql
CREATE TABLE main.page_views (
  view_time timestamp,
  user_id bigint,
  page_url varchar,
  ds date,
  country varchar
)
WITH (
  format = 'ORC',
  partitioned_by = ARRAY['ds', 'country'],
  bucketed_by = ARRAY['user_id'],
  bucket_count = 50
)
```

### Example with external table

Create an external Hive table named `request_logs` that points at existing data in lakeFS:

```sql
CREATE TABLE main.request_logs (
  request_time timestamp,
  url varchar,
  ip varchar,
  user_agent varchar
)
WITH (
  format = 'TEXTFILE',
  external_location = 's3a://example/main/data/logs/'
)
```

### Example of copying a table with [metastore tools](glue_hive_metastore.md)

{: .warning }
**Deprecated Feature:** Having heard the feedback from the community, we are planning to replace the manual steps below with an automated process.
You can read more about it [here](https://github.com/treeverse/lakeFS/issues/6461).

Copy the table `page_views` from schema `main` to schema `example_branch` with location `s3a://example/example_branch/page_views/`:

```shell
lakectl metastore copy --from-schema main --from-table page_views --to-branch example_branch
```
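
After the copy, the table exists in both schemas, which makes it easy to compare data across branches. A brief sketch of such a query, assuming the copy above succeeded, using the partition filter style from the Hive connector examples referenced earlier:

```sql
-- Query the copied table on the example_branch schema,
-- filtering on the partition columns defined above
SELECT *
FROM example_branch.page_views
WHERE ds = DATE '2016-08-09'
  AND country = 'US'
LIMIT 100;
```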