
---
title: Unity Delta Sharing
parent: lakeFS Cloud
description: The lakeFS Delta Sharing service lets you export Delta Lake and HMS-style tables stored on lakeFS over the Delta Sharing protocol. This is particularly useful with Databricks Unity.
redirect_from:
  - /cloud/unity-delta-sharing.html
  - /cloud/unity-delta-sharing-m0-users
---

# Unity Delta Sharing
{: .d-inline-block }
lakeFS Cloud
{: .label .label-green }


{: .warning }
> Please note: as of June 15th, 2024, the Unity Delta Sharing feature will be removed. To integrate lakeFS with Unity Catalog, refer to the [Unity integration](../integrations/unity-catalog.md) docs.

## Introduction

lakeFS Unity Delta Sharing provides lakeFS customers read-only access to their lakeFS data from Unity Catalog.  Currently, this is available as a private
preview.  It provides _full read-only functionality_ for Unity Catalog.  It does _not_ provide a "self-service" experience for setting up the service.

## Setup

This guide explains how to set up and use lakeFS Delta Sharing.  Currently, you will have to configure lakeFS Delta Sharing in collaboration with
Treeverse Customer Success.  Once setup is complete, you will of course be able to use lakeFS Delta Sharing on existing and new tables without
further assistance.

### 1. Collect data and initial setup

* Select a `Delta Sharing configuration URL`.  This is a single location on lakeFS that holds the top-level configuration of lakeFS Delta Sharing across all
  repositories of your organization.  Typically, it will have the form `lakefs://REPO/main/lakefs_delta_sharing.yaml` for one of your
  repositories.  A longer path may be supplied; however, we do recommend keeping it on the `main` branch, as this object represents state
  for the entire installation.

* Create a user `lakefs-delta-sharing-service` for lakeFS Delta Sharing, and an access key for that user.  It should have at least read permissions for
  the configuration URL and for all repositories and all data accessed by Unity.  lakeFS Delta Sharing will use these credentials to communicate with lakeFS.

**Communicate these items to Customer Success**:

* Configuration URL
* Access key ID and secret access key for user `lakefs-delta-sharing-service`.

Note: All YAML files used in this guide must have the `yaml` extension. Do not use a `yml` extension instead.
{: .note }

### 2. Initial configuration

Select a secret authorization token to share with Unity Catalog.  Unity Catalog will use it to authenticate to the lakeFS Delta Sharing server.  
You might use this command on Linux:

```sh
head -c 50 /dev/random | base64
```

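If you prefer, an equivalent token can be generated with Python's standard library `secrets` module (a sketch; the exact token format does not matter, it only needs to be long and unguessable):

```python
import secrets

# Generate a URL-safe random token with 50 bytes of entropy,
# comparable to the `head -c 50 /dev/random | base64` command above.
token = secrets.token_urlsafe(50)
print(token)
```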
Create a file `lakefs_delta_sharing.yaml` and place it at the config URL selected above.  It should look like this:

```yaml
authorization_token: "GENERATED TOKEN"
# Map lakeFS repositories to Unity shares
repositories:
    - id: sample-repo
      share_name: undev
      # List branches and prefixes to export.  Each of these branches (and only
      # these branches) will be available as a schema on Unity.
      branches:
          - main
          - staging
          - dev_*
    - id: repo2
      share_name: share_two
      branches:
      - "*"
```

Note that a plain "*" line must be quoted in YAML.

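The branch entries act as glob-style patterns. As an illustration of the intended matching (this sketch uses Python's `fnmatch`; the server's exact pattern semantics are an assumption here):

```python
from fnmatch import fnmatch

def exported_branches(branches, patterns):
    """Return the branches that would appear as schemas on Unity."""
    return [b for b in branches if any(fnmatch(b, p) for p in patterns)]

# Branch patterns from the sample-repo entry above.
patterns = ["main", "staging", "dev_*"]
branches = ["main", "staging", "dev_experiment1", "feature-x"]
print(exported_branches(branches, patterns))
# → ['main', 'staging', 'dev_experiment1']
```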
Upload it to your config URL.  For instance, if the config URL is `lakefs://repo/main/lakefs_delta_sharing.yaml`, you might use:

```sh
lakectl fs upload -s ./lakefs_delta_sharing.yaml lakefs://repo/main/lakefs_delta_sharing.yaml
```

### 3. Connect Unity to lakeFS Delta Sharing!

You now need to configure Unity to use the lakeFS Delta Sharing server. Create a share provider file `config.share.json`; see the [Delta Sharing manual][databricks-delta-sharing]:

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1",
  "bearerToken": "GENERATED TOKEN",
  "expirationTime": "2030-01-01T00:00:00.0Z"
}
```

"GENERATED TOKEN" is the secret authorization token you generated above.

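If you generate the provider file programmatically, a minimal sketch (here `ORG_ID`, `REGION`, and the token are placeholders you must replace with your own values):

```python
import json

# Placeholders: substitute your lakeFS Cloud organization ID, region,
# and the authorization token generated in step 2.
org_id = "ORG_ID"
region = "REGION"
token = "GENERATED TOKEN"

profile = {
    "shareCredentialsVersion": 1,
    "endpoint": f"https://{org_id}.{region}.lakefscloud.io/service/delta-sharing/v1",
    "bearerToken": token,
    "expirationTime": "2030-01-01T00:00:00.0Z",
}

with open("config.share.json", "w") as f:
    json.dump(profile, f, indent=2)
```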
Install the [Databricks CLI][databricks-cli].  We will use it to create the Delta Share on Unity.  Follow the
[instructions](https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication-using-a-databricks-personal-access-token) to configure it.

Run the provider creation command:

```shell
databricks unity-catalog providers create \
    --name lakefs-cloud \
    --recipient-profile-json-file config.share.json
```

Go to "Data >> Delta Sharing" in the Databricks environment.  Once Treeverse has configured lakeFS Delta Sharing on your account with your config URL,
the "lakefs-cloud" provider should appear under "Shared with me".

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-provider.png" alt="lakeFS-Cloud provider appearing on Databricks Delta Sharing / Shared with me" class="quickstart"/>

Click the provider to see its shares.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-shares.png" alt="lakeFS-Cloud provider, showing share and create catalog" class="quickstart"/>

You can now create a catalog from these shares.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-create-catalog.png" alt="lakeFS-Cloud provider, create catalog from share" class="quickstart"/>

And you can see schemas for each of the branches that you configured in the share.  Here the branch name `dev_experiment1` matches the pattern `dev_*` that
we defined in the configuration object `lakefs_delta_sharing.yaml`, so it appears as a schema.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-schema-per-branch.png" alt="lakeFS-Cloud provider, every configured branch is a schema" class="quickstart"/>

At this point you have configured Delta Sharing on lakeFS, and Databricks to communicate with lakeFS Delta Sharing.  No further Treeverse involvement is required.  
Updates to `lakefs_delta_sharing.yaml` take effect within a minute of uploading a new version.

### 4. Configure tables

Everything is ready: lakeFS repositories are configured as shares, and branches are configured as schemas.  Now you can define tables!  Once a
repository is shared, each of its tables is configured by a table descriptor object on the repository under the path `_lakefs_tables/TABLE.yaml`.

#### Delta Lake tables

The Delta Lake format includes full metadata, so you only need to configure the prefix:

```yaml
name: users
type: delta
path: path/to/users/
```

Note: The filename of the `yaml` file containing the table definition must match the `name` of the table itself. In the example above, `_lakefs_tables/users.yaml`.
{: .note }

When placed inside `_lakefs_tables/users.yaml`, this defines a table `users` on the prefix `path/to/users/` (so `path/to/users/` contains the `_delta_log` directory).

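The naming rule from the note above can be checked before uploading a descriptor. A sketch (the naive `name:` line parser here is for illustration only, not a full YAML parser):

```python
from pathlib import Path

def descriptor_path_matches(path, descriptor_text):
    """Check that a table descriptor's filename stem matches its `name` field.

    Illustrative only: scans for the first `name:` line instead of
    using a real YAML parser.
    """
    name = None
    for line in descriptor_text.splitlines():
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
            break
    return name is not None and Path(path).stem == name

descriptor = """\
name: users
type: delta
path: path/to/users/
"""
print(descriptor_path_matches("_lakefs_tables/users.yaml", descriptor))   # → True
print(descriptor_path_matches("_lakefs_tables/people.yaml", descriptor))  # → False
```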
#### Hive tables

Hive metastore tables are essentially just a set of objects that share a prefix, with no table metadata stored on the object store.  You need to configure the prefix, partitions, and schema.

```yaml
name: clicks
type: hive
path: path/to/clicks/
partition_columns: ['year']
schema:
  type: struct
  fields:
    - name: year
      type: integer
      nullable: false
      metadata: {}
    - name: page
      type: string
      nullable: false
      metadata: {}
    - name: site
      type: string
      nullable: true
      metadata:
        comment: a comment about this column
```

Useful types recognized by Databricks Photon include `integer`, `long`, `short`, `string`, `double`, `float`, `date`, and `timestamp`.  
For exact type mappings, and whether to specify a field as `nullable: false`, refer to the [Databricks Photon documentation][databricks-photon-types].
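As a sanity check before uploading a table descriptor, you could verify that every field uses one of the types listed above. A sketch (the allowed set below is just this section's list, not the complete set Photon supports):

```python
# Types listed in this section; illustrative, not exhaustive.
ALLOWED_TYPES = {"integer", "long", "short", "string",
                 "double", "float", "date", "timestamp"}

def check_schema(schema):
    """Return the names of fields whose type is not in ALLOWED_TYPES."""
    return [f["name"] for f in schema["fields"]
            if f["type"] not in ALLOWED_TYPES]

# The `clicks` schema above, as a parsed structure.
schema = {
    "type": "struct",
    "fields": [
        {"name": "year", "type": "integer", "nullable": False},
        {"name": "page", "type": "string", "nullable": False},
        {"name": "site", "type": "string", "nullable": True},
    ],
}
print(check_schema(schema))  # → []
```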


[databricks-delta-sharing]:  https://docs.databricks.com/data-sharing/manage-provider.html#instructions
[databricks-cli]:  https://docs.databricks.com/dev-tools/cli/index.html
[databricks-photon-types]:  https://docs.databricks.com/runtime/photon.html#photon-coverage