---
title: Apache Hive
description: This section covers how you can start using lakeFS with Apache Hive, a distributed data warehouse system that enables analytics at a massive scale.
parent: Integrations
redirect_from: /using/hive.html
---

# Using lakeFS with Apache Hive

The [Apache Hive ™](https://hive.apache.org/) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

{% include toc.html %}

## Configuration

To configure Hive to work with lakeFS, set the lakeFS endpoint and credentials in the corresponding S3A configuration properties:

- lakeFS endpoint: `fs.s3a.endpoint`
- lakeFS access key: `fs.s3a.access.key`
- lakeFS secret key: `fs.s3a.secret.key`

**Note**
In the following examples, we set the lakeFS credentials at runtime for clarity. In production, these properties should be set using one of Hadoop's standard ways of [Authenticating with S3](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3){:target="_blank"}.
{: .note}

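For instance, you can set the properties for the current session from Beeline or the Hive CLI. This is only a sketch: the endpoint and keys below are placeholders for your own lakeFS installation, and your deployment may restrict which properties can be changed at runtime.

```hql
-- Point the S3A filesystem at lakeFS for this session only
SET fs.s3a.endpoint=https://lakefs.example.com;
SET fs.s3a.access.key=AKIAIOSFODNN7EXAMPLE;
SET fs.s3a.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY;
SET fs.s3a.path.style.access=true;
```
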
Alternatively, you can add the configuration to the `hdfs-site.xml` file:
```xml
<configuration>
    ...
    <property>
        <name>fs.s3a.secret.key</name>
        <value>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>AKIAIOSFODNN7EXAMPLE</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>https://lakefs.example.com</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
```

**Note**
In this example, we set `fs.s3a.path.style.access` to `true` to remove the need for additional DNS records for [virtual hosting](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html). The `fs.s3a.path.style.access` property was introduced in Hadoop 2.8.0.
{: .note}

## Examples

### Example with schema

```hql
CREATE SCHEMA example LOCATION 's3a://example/main/';
CREATE TABLE example.request_logs (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
);
```
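
Once the schema and table exist, you can work with them like any other Hive table, and the data is stored under the schema's lakeFS location. A small illustrative sketch (the query assumes some rows have already been loaded):

```hql
-- The "Location" field in the output should point under s3a://example/main/
DESCRIBE FORMATTED example.request_logs;

-- Query the table as usual
SELECT url, count(*) AS hits
FROM example.request_logs
GROUP BY url;
```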

### Example with an external table

```hql
CREATE EXTERNAL TABLE request_logs (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
) LOCATION 's3a://example/main/request_logs';
```
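
Because every lakeFS branch is exposed under its own `s3a://<repository>/<branch>/...` path, you can point a second external table at the same path on another branch and compare the two. This is only a sketch and assumes a branch named `my-branch` exists in the `example` repository:

```hql
-- Same table layout, different lakeFS branch
CREATE EXTERNAL TABLE request_logs_dev (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
) LOCATION 's3a://example/my-branch/request_logs';

-- Compare row counts between the two branches
SELECT count(*) FROM request_logs;
SELECT count(*) FROM request_logs_dev;
```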