---
title: Unity Delta Sharing
parent: lakeFS Cloud
description: The lakeFS Delta Sharing service lets you export Delta Lake and HMS-style tables stored on lakeFS over the Delta Sharing protocol. This is particularly useful with Databricks Unity.
redirect_from:
  - /cloud/unity-delta-sharing.html
  - /cloud/unity-delta-sharing-m0-users
---

# Unity Delta Sharing
{: .d-inline-block }
lakeFS Cloud
{: .label .label-green }


{: .warning }
> Please note: as of June 15th, 2024, the Unity Delta Sharing feature will be removed. To integrate lakeFS with Unity Catalog, refer to the [Unity integration](../integrations/unity-catalog.md) docs.

## Introduction

lakeFS Unity Delta Sharing provides a read-only experience from Unity Catalog for lakeFS customers. Currently, this is available as a private
preview. It provides _full read-only functionality_ for Unity Catalog. It does _not_ provide a "self-service" experience to set up the service.

## Setup

This guide explains how to set up and use lakeFS Delta Sharing. Currently, you will have to configure lakeFS Delta Sharing in collaboration with
Treeverse Customer Success. Once setup is complete, you will of course be able to use lakeFS Delta Sharing on existing and on new tables without
further assistance.

### 1. Collect data and initial setup

* Select a `Delta Sharing configuration URL`. This is a single location on lakeFS that holds the top-level configuration of lakeFS Delta Sharing across all
  repositories of your organization. Typically, it will have the form `lakefs://REPO/main/lakefs_delta_sharing.yaml` for one of your
  repositories. A longer path may be supplied; however, we do recommend keeping it on the `main` branch, as this object represents state
  for the entire installation.

* Create a user `lakefs-delta-sharing-service` for lakeFS Delta Sharing, and an access key for that user. It should have at least read permissions for
  the configuration URL, and for all repositories and all data accessed by Unity. lakeFS Delta Sharing will use these credentials to communicate with lakeFS.

**Communicate these items to Customer Success**:

* Configuration URL
* Access key ID and secret access key for user `lakefs-delta-sharing-service`

All YAML file extensions used in this guide must be `yaml`. Do not use a `yml` extension instead.
{: .note }

### 2. Initial configuration

Generate a secret authorization token to share with Unity Catalog. Unity Catalog will use it to authenticate to the lakeFS Delta Sharing server.
You might use this command on Linux:

```sh
head -c 50 /dev/random | base64
```

Create a file `lakefs_delta_sharing.yaml` and place it at the config URL selected above. It should look like this:

```yaml
authorization_token: "GENERATED TOKEN"
# Map lakeFS repositories to Unity shares
repositories:
  - id: sample-repo
    share_name: undev
    # List branches and prefixes to export. Each of these branches (and only
    # these branches) will be available as a schema on Unity.
    branches:
      - main
      - staging
      - dev_*
  - id: repo2
    share_name: share_two
    branches:
      - "*"
```

Note that a plain "*" line must be quoted in YAML.

Upload it to your config URL. For instance, if the config URL is `lakefs://repo/main/lakefs_delta_sharing.yaml`, you might use:

```sh
lakectl fs upload -s ./lakefs_delta_sharing.yaml lakefs://repo/main/lakefs_delta_sharing.yaml
```
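Once Customer Success has enabled the service for your account, you can sanity-check the configuration by querying the server directly. The sketch below assumes the standard Delta Sharing REST protocol's list-shares endpoint; `ORG_ID.REGION` is a placeholder for your lakeFS Cloud domain (the same endpoint used in the next step), and `GENERATED_TOKEN` is the token generated above:

```sh
# List the shares exposed by the lakeFS Delta Sharing server.
# With the sample configuration above, the response should include
# the shares "undev" and "share_two".
curl -s \
  -H "Authorization: Bearer GENERATED_TOKEN" \
  "https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1/shares"
```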
### 3. Connect Unity to lakeFS Delta Sharing!

You now need to configure Unity to use the lakeFS Delta Sharing server. Create a share provider file `config.share.json`; see the [Delta Sharing manual][databricks-delta-sharing]:

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1",
  "bearerToken": "GENERATED TOKEN",
  "expirationTime": "2030-01-01T00:00:00.0Z"
}
```

"GENERATED TOKEN" is the secret authorization token generated above.

Install the [Databricks CLI][databricks-cli]. We will use it to create the Delta Share provider on Unity. Follow the
[instructions](https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication-using-a-databricks-personal-access-token) to configure it.

Run the provider creation command:

```shell
databricks unity-catalog providers create \
    --name lakefs-cloud \
    --recipient-profile-json-file config.share.json
```

Go to "Data >> Delta Sharing" in the Databricks environment. Once Treeverse has configured lakeFS Delta Sharing on your account with your config URL,
the "lakefs-cloud" provider should appear under "Shared with me".

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-provider.png" alt="lakeFS-Cloud provider appearing on Databricks Delta Sharing / Shared with me" class="quickstart"/>

Click the provider to see its shares.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-shares.png" alt="lakeFS-Cloud provider, showing share and create catalog" class="quickstart"/>

You can now create a catalog from these shares.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-create-catalog.png" alt="lakeFS-Cloud provider, create catalog from share" class="quickstart"/>

You can then see schemas for each of the branches that you configured in the share. Here the branch name `dev_experiment1` matches the pattern `dev_*` that
we defined in the configuration object `lakefs_delta_sharing.yaml`, so it appears as a schema.

<img src="{{ site.baseurl }}/assets/img/unity-delta-sharing-schema-per-branch.png" alt="lakeFS-Cloud provider, every configured branch is a schema" class="quickstart"/>

At this point you have configured Delta Sharing on lakeFS, and Databricks to communicate with lakeFS Delta Sharing. No further Treeverse involvement is required.
Updates to `lakefs_delta_sharing.yaml` take effect within a minute of uploading a new version.

### 4. Configure tables

Everything is ready: lakeFS repositories are configured as shares, and branches are configured as schemas. Now you can define tables! Once a
repository is shared, each of its tables is configured by a table descriptor object on the repository under the path `_lakefs_tables/TABLE.yaml`.

#### Delta Lake tables

The Delta Lake format includes full metadata, so you only need to configure the prefix:

```yaml
name: users
type: delta
path: path/to/users/
```

The filename of the YAML file containing the table definition must match the `name` of the table itself: in the example above, `_lakefs_tables/users.yaml`.
{: .note }

When placed inside `_lakefs_tables/users.yaml` this defines a table `users` on the prefix `path/to/users/` (so `path/to/users/` contains the `_delta_log` prefix).
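Like the top-level configuration, a table descriptor is just an object you upload to the repository. For example, with the `sample-repo` repository from the configuration above, you might upload the `users` descriptor like this (the local filename here is illustrative):

```sh
# Upload the table descriptor; it must live under _lakefs_tables/ on a
# shared branch, and its filename must match the table name ("users").
lakectl fs upload -s ./users.yaml lakefs://sample-repo/main/_lakefs_tables/users.yaml
```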
#### Hive tables

Hive metastore tables are essentially just a set of objects that share a prefix, with no table metadata stored on the object store. You need to configure the prefix, partitions, and schema.

```yaml
name: clicks
type: hive
path: path/to/clicks/
partition_columns: ['year']
schema:
  type: struct
  fields:
    - name: year
      type: integer
      nullable: false
      metadata: {}
    - name: page
      type: string
      nullable: false
      metadata: {}
    - name: site
      type: string
      nullable: true
      metadata:
        comment: a comment about this column
```

Useful types recognized by Databricks Photon include `integer`, `long`, `short`, `string`, `double`, `float`, `date`, and `timestamp`.
For exact type mappings, and whether to specify a field as `nullable: false`, refer to the [Databricks Photon documentation][databricks-photon-types].


[databricks-delta-sharing]: https://docs.databricks.com/data-sharing/manage-provider.html#instructions
[databricks-cli]: https://docs.databricks.com/dev-tools/cli/index.html
[databricks-photon-types]: https://docs.databricks.com/runtime/photon.html#photon-coverage
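To verify that a table descriptor has been picked up, you can list the tables of a schema directly over the Delta Sharing protocol. This is a sketch assuming the standard list-tables endpoint, using the `undev` share and `main` branch (schema) from the sample configuration; substitute your own endpoint, token, share, and schema:

```sh
# List the tables visible in schema "main" of share "undev".
# A configured descriptor such as _lakefs_tables/clicks.yaml should
# appear here as the table "clicks".
curl -s \
  -H "Authorization: Bearer GENERATED_TOKEN" \
  "https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1/shares/undev/schemas/main/tables"
```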