     1  ---
     2  title: Lua Hooks
     3  parent: Actions and Hooks
     4  grand_parent: How-To
     5  description: Lua Hooks reference 
     6  redirect_from:
     7     - /hooks/lua.html
     8  ---
     9  
    10  
    11  # Lua Hooks
    12  
lakeFS supports running hooks without relying on external components, using an [embedded Lua VM](https://github.com/Shopify/go-lua).
    14  
    15  Using Lua hooks, it is possible to pass a Lua script to be executed directly by the lakeFS server when an action occurs.
    16  
    17  The Lua runtime embedded in lakeFS is limited for security reasons. It provides a narrow set of APIs and functions that by default do not allow:
    18  
    19  1. Accessing any of the running lakeFS server's environment
2. Accessing the local filesystem available to the lakeFS process
    21  
    22  {% include toc.html %}
    23  
    24  ## Action File Lua Hook Properties
    25  
    26  _See the [Action configuration](./index.md#action-file) for overall configuration schema and details._
    27  
    28  | Property      | Description                               | Data Type  | Required                                       | Default Value |
    29  |---------------|-------------------------------------------|------------|------------------------------------------------|---------------|
    30  | `args`        | One or more arguments to pass to the hook | Dictionary | false                                          |               |
    31  | `script`      | An inline Lua script                      | String     | either this or `script_path` must be specified |               |
    32  | `script_path` | The path in lakeFS to a Lua script        | String     | either this or `script` must be specified      |               |
    33  
    34  
    35  ## Example Lua Hooks
    36  
    37  For more examples and configuration samples, check out the [examples/hooks/](https://github.com/treeverse/lakeFS/tree/master/examples/hooks) directory in the lakeFS repository. You'll also find step-by-step examples of hooks in action in the [lakeFS samples repository](https://github.com/treeverse/lakeFS-samples/).
    38  
    39  ### Display information about an event
    40  
    41  This example will print out a JSON representation of the event that occurred:
    42  
    43  ```yaml
    44  name: dump_all
    45  on:
    46    post-commit:
    47    post-merge:
    48    post-create-tag:
    49    post-create-branch:
    50  hooks:
    51    - id: dump_event
    52      type: lua
    53      properties:
    54        script: |
    55          json = require("encoding/json")
    56          print(json.marshal(action))
    57  ```
    58  
    59  ### Ensure that a commit includes a mandatory metadata field
    60  
    61  A more useful example: ensure every commit contains a required metadata field:
    62  
```yaml
name: pre commit metadata field check
on:
  pre-commit:
    branches:
      - main
      - dev
hooks:
  - id: ensure_commit_metadata
    type: lua
    properties:
      args:
        notebook_url: {"pattern": "my-jupyter.example.com/.*"}
        spark_version: {}
      script_path: lua_hooks/ensure_metadata_field.lua
```
    79  
    80  Lua code at `lakefs://repo/main/lua_hooks/ensure_metadata_field.lua`:
    81  
    82  ```lua
    83  regexp = require("regexp")
    84  for k, props in pairs(args) do
    85    current_value = action.commit.metadata[k]
    86    if current_value == nil then
    87      error("missing mandatory metadata field: " .. k)
    88    end
    89    if props.pattern and not regexp.match(props.pattern, current_value) then
    90      error("current value for commit metadata field " .. k .. " does not match pattern: " .. props.pattern .. " - got: " .. current_value)
    91    end
    92  end
    93  ```
    94  
    96  
    97  ## Lua Library reference
    98  
    99  The Lua runtime embedded in lakeFS is limited for security reasons. The provided APIs are shown below.
   100  
   101  ### `array(table)`
   102  
Helper function to mark a table object as an array for the runtime by setting the `_is_array: true` metatable field.
   104  
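For example, a minimal sketch (the field names here are illustrative) that marks a nested list so it marshals as a JSON array:

```lua
local json = require("encoding/json")

-- array() marks the inner table so the runtime serializes it as a JSON array
local payload = {
  name = "my-table",
  partition_columns = array({"year", "month"})
}
print(json.marshal(payload))
```
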
   105  ### `aws`
   106  
   107  ### `aws/s3_client`
   108  S3 client library.
   109  
   110  ```lua
   111  local aws = require("aws")
   112  -- pass valid AWS credentials
   113  local client = aws.s3_client("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "REGION")
   114  ```
   115  
   116  ### `aws/s3_client.get_object(bucket, key)`
   117  
   118  Returns the body (as a Lua string) of the requested object and a boolean value that is true if the requested object exists
   119  
   120  ### `aws/s3_client.put_object(bucket, key, value)`
   121  
   122  Sets the object at the given bucket and key to the value of the supplied value string
   123  
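A minimal sketch combining `get_object` and `put_object`; the credentials, bucket, and keys below are placeholders:

```lua
local aws = require("aws")

local client = aws.s3_client("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "REGION")

-- copy an object to another key, but only if it exists
local body, exists = client.get_object("my-bucket", "path/to/source")
if exists then
  client.put_object("my-bucket", "path/to/copy", body)
end
```
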
### `aws/s3_client.delete_object(bucket, key)`
   125  
   126  Deletes the object at the given key
   127  
   128  ### `aws/s3_client.list_objects(bucket [, prefix, continuation_token, delimiter])`
   129  
   130  Returns a table of results containing the following structure:
   131  
   132  * `is_truncated`: (boolean) whether there are more results to paginate through using the continuation token
   133  * `next_continuation_token`: (string) to pass in the next request to get the next page of results
   134  * `results` (table of tables) information about the objects (and prefixes if a delimiter is used)
   135  
Each result takes one of the following structures:
   137  
   138  ```lua
   139  {
   140     ["key"] = "a/common/prefix/",
   141     ["type"] = "prefix"
   142  }
   143  ```
   144  
   145  or:
   146  
   147  ```lua
   148  {
   149     ["key"] = "path/to/object",
   150     ["type"] = "object",
   151     ["etag"] = "etagString",
   152     ["size"] = 1024,
   153     ["last_modified"] = "2023-12-31T23:10:00Z"
   154  }
   155  ```
   156  
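A sketch of paginating through a listing with the continuation token; the bucket and prefix are placeholders, and passing `nil` as the initial continuation token is an assumption:

```lua
local aws = require("aws")
local client = aws.s3_client("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "REGION")

local token = nil
repeat
  local resp = client.list_objects("my-bucket", "some/prefix/", token, "/")
  for _, entry in ipairs(resp.results) do
    print(entry.type .. ": " .. entry.key)
  end
  token = resp.next_continuation_token
until not resp.is_truncated
```
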
   157  ### `aws/s3_client.delete_recursive(bucket, prefix)`
   158  
   159  Deletes all objects under the given prefix
   160  
   161  ### `aws/glue`
   162  
   163  Glue client library.
   164  
   165  ```lua
   166  local aws = require("aws")
   167  -- pass valid AWS credentials
   168  local glue = aws.glue_client("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "REGION")
   169  ```
   170  
### `aws/glue.get_table(database, table [, catalog_id])`
   172  
   173  Describe a table from the Glue catalog.
   174  
   175  Example:
   176  
```lua
local json = require("encoding/json")

local table, exists = glue.get_table(db, table_name)
if exists then
  print(json.marshal(table))
end
```
   182  
### `aws/glue.create_table(database, table_input [, catalog_id])`

Create a new table in Glue Catalog.
The `table_input` argument is a JSON document that is passed "as is" to AWS and mirrors the AWS SDK [TableInput](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html#API_CreateTable_RequestSyntax).
   187  
   188  Example: 
   189  
```lua
local json = require("encoding/json")
local input = {
    Name = "my-table",
    PartitionKeys = array(partitions),
    -- etc...
}
local json_input = json.marshal(input)
glue.create_table("my-db", json_input)
```
   200  
### `aws/glue.update_table(database, table_input [, catalog_id, version_id, skip_archive])`

Update an existing Table in Glue Catalog.
The `table_input` argument is the same as in the `glue.create_table` function.
   205  
### `aws/glue.delete_table(database, table_input [, catalog_id])`
   207  
   208  Delete an existing Table in Glue Catalog.
   209  
   210  ### `azure`
   211  
   212  ### `azure/blob_client`
   213  Azure blob client library.
   214  
   215  ```lua
   216  local azure = require("azure")
   217  -- pass valid Azure credentials
   218  local client = azure.blob_client("AZURE_STORAGE_ACCOUNT", "AZURE_ACCESS_KEY")
   219  ```
   220  
   221  ### `azure/blob_client.get_object(path_uri)`
   222  
   223  Returns the body (as a Lua string) of the requested object and a boolean value that is true if the requested object exists  
   224  `path_uri` - A valid Azure blob storage uri in the form of `https://myaccount.blob.core.windows.net/mycontainer/myblob`
   225  
   226  ### `azure/blob_client.put_object(path_uri, value)`
   227  
Writes the supplied value string to the object at the given path  
   229  `path_uri` - A valid Azure blob storage uri in the form of `https://myaccount.blob.core.windows.net/mycontainer/myblob`
   230  
   231  ### `azure/blob_client.delete_object(path_uri)`
   232  
Deletes the object at the given path  
   234  `path_uri` - A valid Azure blob storage uri in the form of `https://myaccount.blob.core.windows.net/mycontainer/myblob`
   235  
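A minimal sketch using the blob client; the storage account, container, and blob names are placeholders:

```lua
local azure = require("azure")
local client = azure.blob_client("AZURE_STORAGE_ACCOUNT", "AZURE_ACCESS_KEY")

local base = "https://myaccount.blob.core.windows.net/mycontainer"
local body, exists = client.get_object(base .. "/source-blob")
if exists then
  -- copy the blob, then remove the original
  client.put_object(base .. "/copy-of-blob", body)
  client.delete_object(base .. "/source-blob")
end
```
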
   236  ### `azure/abfss_transform_path(path)`
   237  
Transforms an HTTPS Azure URL to the ABFSS scheme. Used by the `delta_exporter` function to support Azure Unity Catalog use cases  
   239  `path` - A valid Azure blob storage URL in the form of `https://myaccount.blob.core.windows.net/mycontainer/myblob`
   240  
   241  ### `crypto`
   242  
   243  ### `crypto/aes/encryptCBC(key, plaintext)`
   244  
Returns the ciphertext of the given plaintext, encrypted with AES in CBC mode using the supplied key
   246  
   247  ### `crypto/aes/decryptCBC(key, ciphertext)`
   248  
   249  Returns the decrypted (plaintext) string for the encrypted ciphertext
   250  
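A minimal sketch of an encrypt/decrypt round trip, assuming the functions are loaded with `require("crypto/aes")` (as the path-style name suggests) and that the key is a valid AES key length; both values here are placeholders:

```lua
local aes = require("crypto/aes")

local key = "0123456789abcdef0123456789abcdef" -- 32-byte placeholder key
local ciphertext = aes.encryptCBC(key, "some secret value")
print(aes.decryptCBC(key, ciphertext)) -- "some secret value"
```
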
   251  ### `crypto/hmac/sign_sha256(message, key)`
   252  
Returns an HMAC-SHA256 signature for the given message, computed with the supplied key
   254  
   255  ### `crypto/hmac/sign_sha1(message, key)`
   256  
Returns an HMAC-SHA1 signature for the given message, computed with the supplied key
   258  
   259  ### `crypto/md5/digest(data)`
   260  
   261  Returns the MD5 digest (string) of the given data
   262  
   263  ### `crypto/sha256/digest(data)`
   264  
   265  Returns the SHA256 digest (string) of the given data
   266  
   267  ### `databricks/client(databricks_host, databricks_service_principal_token)`
   268  
   269  Returns a table representing a Databricks client with the `register_external_table` and `create_or_get_schema` methods.
   270  
   271  ### `databricks/client.create_schema(schema_name, catalog_name, get_if_exists)`
   272  
Creates a schema, or retrieves it if it exists, in the configured Databricks host's Unity Catalog.
   274  If a schema doesn't exist, a new schema with the given `schema_name` will be created under the given `catalog_name`.
   275  Returns the created/fetched schema name.
   276  
   277  Parameters:
   278  
   279  - `schema_name(string)`: The required schema name
   280  - `catalog_name(string)`: The catalog name under which the schema will be created (or from which it will be fetched)
   281  - `get_if_exists(boolean)`: In case of failure due to an existing schema with the given `schema_name` in the given
   282  `catalog_name`, return the schema.
   283  
   284  Example:
   285  
   286  ```lua
   287  local databricks = require("databricks")
   288  local client = databricks.client("https://my-host.cloud.databricks.com", "my-service-principal-token")
   289  local schema_name = client.create_schema("main", "mycatalog", true)
   290  ```
   291  
   292  ### `databricks/client.register_external_table(table_name, physical_path, warehouse_id, catalog_name, schema_name, metadata)`
   293  
   294  Registers an external table under the provided warehouse ID, catalog name, and schema name.
   295  In order for this method call to succeed, an external location should be configured in the catalog, with the 
   296  `physical_path`'s root storage URI (for example: `s3://mybucket`).
   297  Returns the table's creation status.
   298  
   299  Parameters:
   300  
   301  - `table_name(string)`: Table name.
   302  - `physical_path(string)`: A location to which the external table will refer, e.g. `s3://mybucket/the/path/to/mytable`.
   303  - `warehouse_id(string)`: The SQL warehouse ID used in Databricks to run the `CREATE TABLE` query (fetched from the SQL warehouse
   304  `Connection Details`, or by running `databricks warehouses get`, choosing your SQL warehouse and fetching its ID).
   305  - `catalog_name(string)`: The name of the catalog under which a schema will be created (or fetched from).
   306  - `schema_name(string)`: The name of the schema under which the table will be created.
   307  - `metadata(table)`: A table of metadata to be added to the table's registration. The metadata table should be of the form:
   308    `{key1 = "value1", key2 = "value2", ...}`.
   309  
   310  Example:
   311  
```lua
local databricks = require("databricks")
local client = databricks.client("https://my-host.cloud.databricks.com", "my-service-principal-token")
local status = client.register_external_table("mytable", "s3://mybucket/the/path/to/mytable", "my-warehouse-id", "my-catalog-name", "myschema")
```
   317  
   318  - For the Databricks permissions needed to run this method, check out the [Unity Catalog Exporter]({% link integrations/unity-catalog.md %}) docs.
   319  
   320  ### `encoding/base64/encode(data)`
   321  
   322  Encodes the given data to a base64 string
   323  
   324  ### `encoding/base64/decode(data)`
   325  
Decodes the given base64-encoded data and returns it as a string
   327  
   328  ### `encoding/base64/url_encode(data)`
   329  
Encodes the given data using the unpadded, URL-safe base64 encoding defined in RFC 4648.
   331  
   332  ### `encoding/base64/url_decode(data)`
   333  
Decodes data in the unpadded, URL-safe base64 encoding defined in RFC 4648 and returns it as a string
   335  
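A small round-trip sketch, assuming the functions are loaded with `require("encoding/base64")`:

```lua
local base64 = require("encoding/base64")

local encoded = base64.encode("hello world")
print(encoded)                -- "aGVsbG8gd29ybGQ="
print(base64.decode(encoded)) -- "hello world"
```
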
   336  ### `encoding/hex/encode(value)`
   337  
   338  Encode the given value string to hexadecimal values (string)
   339  
   340  ### `encoding/hex/decode(value)`
   341  
   342  Decode the given hexadecimal string back to the string it represents (UTF-8)
   343  
   344  ### `encoding/json/marshal(table)`
   345  
   346  Encodes the given table into a JSON string
   347  
   348  ### `encoding/json/unmarshal(string)`
   349  
   350  Decodes the given string into the equivalent Lua structure
   351  
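A small round-trip sketch; the JSON document is illustrative:

```lua
local json = require("encoding/json")

local doc = json.unmarshal('{"owner": "data-team", "tags": ["pii", "daily"]}')
print(doc.owner)         -- "data-team"
print(json.marshal(doc)) -- re-encodes the Lua table back into a JSON string
```
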
   352  ### `encoding/yaml/marshal(table)`
   353  
   354  Encodes the given table into a YAML string
   355  
   356  ### `encoding/yaml/unmarshal(string)`
   357  
   358  Decodes the given YAML encoded string into the equivalent Lua structure
   359  
   360  ### `encoding/parquet/get_schema(payload)`
   361  
   362  Read the payload (string) as the contents of a Parquet file and return its schema in the following table structure:
   363  
   364  ```lua
   365  {
   366    { ["name"] = "column_a", ["type"] = "INT32" },
   367    { ["name"] = "column_b", ["type"] = "BYTE_ARRAY" }
   368  }
   369  ```
   370  
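A sketch that reads a Parquet object through the `lakefs` library (documented below) and prints its schema; the object path is a placeholder, and the `action.repository_id`/`action.source_ref` fields are assumptions about the event payload:

```lua
local lakefs = require("lakefs")
local parquet = require("encoding/parquet")
local json = require("encoding/json")

-- fetch the object from the reference that triggered the action
local code, body = lakefs.get_object(action.repository_id, action.source_ref, "tables/animals/part-0.parquet")
if code == 200 then
  print(json.marshal(parquet.get_schema(body)))
end
```
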
   371  ### `formats`
   372  
   373  ### `formats/delta_client(key, secret, region)`
   374  
   375  Creates a new Delta Lake client used to interact with the lakeFS server.
   376  - `key`: lakeFS access key id
   377  - `secret`: lakeFS secret access key
- `region`: The region in which your lakeFS server is configured.
   379  
   380  ### `formats/delta_client.get_table(repository_id, reference_id, prefix)`
   381  
   382  Returns a representation of a Delta Lake table under the given repository, reference, and prefix.
The response consists of two tables:
1. The first maps each Delta Log version (`number`) to an array of strings (`{string}`), where each string is the JSON representation of a Delta Lake log operation recorded in that version. For example:
```lua
{
  [0] = {
    "{\"commitInfo\":...}",
    "{\"add\": ...}",
    "{\"remove\": ...}"
  },
  [1] = {
    "{\"commitInfo\":...}",
    "{\"add\": ...}",
    "{\"remove\": ...}"
  }
}
```
2. The second is a table holding the metadata of the current table snapshot. The metadata table can be used to initialize the Delta Lake table in an external catalog.  
   401  It consists of the following fields:
   402      - `id`: The table's ID
   403      - `name`: The table's name
   404      - `description`: The table's description
   405      - `schema_string`: The table's schema string
   406      - `partition_columns`: The table's partition columns
   407      - `configuration`: The table's configuration
   408      - `created_time`: The table's creation time
   409  
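A minimal sketch; the credentials, repository, reference, and prefix are placeholders:

```lua
local formats = require("formats")

local delta_client = formats.delta_client("LAKEFS_ACCESS_KEY_ID", "LAKEFS_SECRET_ACCESS_KEY", "us-east-1")
local delta_log, metadata = delta_client.get_table("my-repo", "main", "tables/my-delta-table")
print("Delta table schema: " .. metadata.schema_string)
```
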
   410  ### `gcloud`
   411  
   412  ### `gcloud/gs_client(gcs_credentials_json_string)`
   413  
   414  Create a new Google Cloud Storage client using a string that contains a valid [`credentials.json`](https://developers.google.com/workspace/guides/create-credentials) file content.
   415  
   416  ### `gcloud/gs.write_fuse_symlink(source, destination, mount_info)`
   417  
   418  Will create a [gcsfuse symlink](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md#symlink-inodes)
   419  from the source (typically a lakeFS physical address for an object) to a given destination.
   420  
   421  `mount_info` is a Lua table with `"from"` and `"to"` keys - since symlinks don't work for `gs://...` URIs, they need to point
   422  to the mounted location instead. `from` will be removed from the beginning of `source`, and `destination` will be added instead.
   423  
   424  Example:
   425  
```lua
local gcloud = require("gcloud")
-- the credentials JSON string is a placeholder
local gs = gcloud.gs_client("<CREDENTIALS_JSON_STRING>")

source = "gs://bucket/lakefs/data/abc/def"
destination = "gs://bucket/exported/path/to/object"
mount_info = {
    ["from"] = "gs://bucket",
    ["to"] = "/home/user/gcs-mount"
}
gs.write_fuse_symlink(source, destination, mount_info)
-- Symlink: "/home/user/gcs-mount/exported/path/to/object" -> "/home/user/gcs-mount/lakefs/data/abc/def"
```
   436  
   437  ### `hook`
   438  
A set of utilities to aid in writing user-friendly hooks.
   440  
   441  ### `hook/fail(message)`
   442  
Will abort the current hook's execution with the given message. This is similar to using `error()`, but is typically used to distinguish
generic runtime errors (e.g. an API call that returned an unexpected response) from explicit failures of the calling hook.
   445  
When called, the error will appear without a stack trace, and the error message will be exactly the one given as `message`.
   447  
   448  ```lua
   449  > hook = require("hook")
   450  > hook.fail("this hook shall not pass because of: " .. reason)
   451  ```
   452  
   453  ### `lakefs`
   454  
   455  The Lua Hook library allows calling back to the lakeFS API using the identity of the user that triggered the action.
For example, if user A tries to commit and triggers a `pre-commit` hook, any call made inside that hook to the lakeFS
API will automatically use user A's identity for authorization and auditing purposes.
   458  
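For example, a sketch of a hook script that reads a file from the reference being committed; the object path is a placeholder, and `action.repository_id`/`action.source_ref` are assumptions about the event payload:

```lua
local lakefs = require("lakefs")

-- calls are authorized as the user who triggered the action
local code, content = lakefs.get_object(action.repository_id, action.source_ref, "conf/quality_rules.yaml")
if code ~= 200 then
  error("could not read quality rules, status: " .. code)
end
```
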
   459  ### `lakefs/create_tag(repository_id, reference_id, tag_id)`
   460  
   461  Create a new tag for the given reference
   462  
### `lakefs/diff_refs(repository_id, left_reference_id, right_reference_id [, after, prefix, delimiter, amount])`
   464  
   465  Returns an object-wise diff between `left_reference_id` and `right_reference_id`.
   466  
   467  ### `lakefs/list_objects(repository_id, reference_id [, after, prefix, delimiter, amount])`
   468  
   469  List objects in the specified repository and reference (branch, tag, commit ID, etc.).
If the delimiter is empty, the listing is recursive. Otherwise, common prefixes up to `delimiter` will be shown as a single entry.
   471  
   472  ### `lakefs/get_object(repository_id, reference_id, path)`
   473  
   474  Returns 2 values:
   475  
   476  1. The HTTP status code returned by the lakeFS API
   477  1. The content of the specified object as a lua string
   478  
   479  ### `lakefs/diff_branch(repository_id, branch_id [, after, amount, prefix, delimiter])`
   480  
   481  Returns an object-wise diff of uncommitted changes on `branch_id`.
   482  
   483  ### `lakefs/stat_object(repository_id, ref_id, path)`
   484  
   485  Returns a stat object for the given path under the given reference and repository.
   486  
   496  ### `lakefs/catalogexport/delta_exporter`
   497  
A package used to export Delta Lake tables from lakeFS to external cloud storage.
   499  
   500  ### `lakefs/catalogexport/delta_exporter.export_delta_log(action, table_def_names, write_object, delta_client, table_descriptors_path, path_transformer)`
   501  
Exports the given Delta Lake tables.
The return value is a table mapping each table name to its external location (from which the data can be queried) and the latest Delta table version's metadata.  
   504  The response is of the form: 
   505  `{<table_name> = {path = "s3://mybucket/mypath/mytable", metadata = {id = "table_id", name = "table_name", ...}}}`.
   506  
   507  Parameters:
   508  
   509  - `action`: The global action object
   510  - `table_def_names`: Delta tables name list (e.g. `{"table1", "table2"}`)
   511  - `write_object`: A writer function with `function(bucket, key, data)` signature, used to write the exported Delta Log (e.g. `aws/s3_client.put_object` or `azure/blob_client.put_object`)
   512  - `delta_client`: A Delta Lake client that implements `get_table: function(repo, ref, prefix)`
   513  - `table_descriptors_path`: The path under which the table descriptors of the provided `table_def_names` reside
- `path_transformer`: (Optional) A `function(path)` used to transform the paths written in the exported Delta Log entries, as well as the exported table's physical path (used to support Azure Unity Catalog use cases)
   515  
   516  Delta export example for AWS S3:
   517  
```yaml
---
name: delta_exporter
on:
  post-commit: null
hooks:
  - id: delta_export
    type: lua
    properties:
      script: |
        local aws = require("aws")
        local formats = require("formats")
        local delta_exporter = require("lakefs/catalogexport/delta_exporter")
        local json = require("encoding/json")

        local table_descriptors_path = "_lakefs_tables"
        local sc = aws.s3_client(args.aws.access_key_id, args.aws.secret_access_key, args.aws.region)
        local delta_client = formats.delta_client(args.lakefs.access_key_id, args.lakefs.secret_access_key, args.aws.region)
        local delta_table_details = delta_exporter.export_delta_log(action, args.table_defs, sc.put_object, delta_client, table_descriptors_path)

        for t, details in pairs(delta_table_details) do
          print("Delta Lake exported table \"" .. t .. "\"'s location: " .. details["path"] .. "\n")
          print("Delta Lake exported table \"" .. t .. "\"'s metadata:\n")
          for k, v in pairs(details["metadata"]) do
            if type(v) == "table" then
              print("\t" .. k .. " = " .. json.marshal(v) .. "\n")
            else
              print("\t" .. k .. " = " .. v .. "\n")
            end
          end
        end
      args:
        aws:
          access_key_id: <AWS_ACCESS_KEY_ID>
          secret_access_key: <AWS_SECRET_ACCESS_KEY>
          region: us-east-1
        lakefs:
          access_key_id: <LAKEFS_ACCESS_KEY_ID>
          secret_access_key: <LAKEFS_SECRET_ACCESS_KEY>
        table_defs:
          - mytable
```
   560  
   561  For the table descriptor under the `_lakefs_tables/mytable.yaml`:
   562  ```yaml
   563  ---
   564  name: myTableActualName
   565  type: delta
   566  path: a/path/to/my/delta/table
   567  ```
   568  
   569  Delta export example for Azure Blob Storage:
   570  
```yaml
name: Delta Exporter
on:
  post-commit:
    branches: ["{{ .Branch }}*"]
hooks:
  - id: delta_exporter
    type: lua
    properties:
      script: |
        local azure = require("azure")
        local formats = require("formats")
        local delta_exporter = require("lakefs/catalogexport/delta_exporter")
        local json = require("encoding/json")

        local table_descriptors_path = "_lakefs_tables"
        local sc = azure.blob_client(args.azure.storage_account, args.azure.access_key)
        local function write_object(_, key, buf)
          return sc.put_object(key,buf)
        end
        local delta_client = formats.delta_client(args.lakefs.access_key_id, args.lakefs.secret_access_key)
        local delta_table_details = delta_exporter.export_delta_log(action, args.table_defs, write_object, delta_client, table_descriptors_path)

        for t, details in pairs(delta_table_details) do
          print("Delta Lake exported table \"" .. t .. "\"'s location: " .. details["path"] .. "\n")
          print("Delta Lake exported table \"" .. t .. "\"'s metadata:\n")
          for k, v in pairs(details["metadata"]) do
            if type(v) == "table" then
              print("\t" .. k .. " = " .. json.marshal(v) .. "\n")
            else
              print("\t" .. k .. " = " .. v .. "\n")
            end
          end
        end
      args:
        azure:
          storage_account: "{{ .AzureStorageAccount }}"
          access_key: "{{ .AzureAccessKey }}"
        lakefs: # provide credentials of a user that has access to the script and Delta Table
          access_key_id: "{{ .LakeFSAccessKeyID }}"
          secret_access_key: "{{ .LakeFSSecretAccessKey }}"
        table_defs:
          - mytable
```
   615  
   616  ### `lakefs/catalogexport/table_extractor`
   617  
   618  Utility package to parse `_lakefs_tables/` descriptors.
   619  
   620  ### `lakefs/catalogexport/table_extractor.list_table_descriptor_entries(client, repo_id, commit_id)`
   621  
List all YAML files under `_lakefs_tables/*` and return a list of type `[{physical_address, path}]`, ignoring hidden files.
The `client` is the `lakefs` client.
   624  
   625  ### `lakefs/catalogexport/table_extractor.get_table_descriptor(client, repo_id, commit_id, logical_path)`
   626  
Read a table descriptor and parse it as a YAML object. Sets `partition_columns` to `{}` if no partitions are defined.
The `client` is the `lakefs` client.
   629  
   630  ### `lakefs/catalogexport/hive.extract_partition_pager(client, repo_id, commit_id, base_path, partition_cols, page_size)`
   631  
Hive-format partition iterator; each result set is a collection of files under the same partition in lakeFS.
   633  
   634  Example: 
   635  
```lua
local lakefs = require("lakefs")
local hive = require("lakefs/catalogexport/hive")

local pager = hive.extract_partition_pager(lakefs, repo_id, commit_id, prefix, partitions, 10)
for part_key, entries in pager do
    print("partition: " .. part_key)
    for _, entry in ipairs(entries) do
        print("path: " .. entry.path .. " physical: " .. entry.physical_address)
    end
end
```
   646  
   647  ### `lakefs/catalogexport/symlink_exporter`
   648  
   649  Writes metadata for a table using Hive's [SymlinkTextInputFormat](https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.1.1/api/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.html).
   650  Currently only `S3` is supported.
   651  
   652  The default export paths per commit:
   653  
   654  ```
   655  ${storageNamespace}
   656  _lakefs/
   657      exported/
   658          ${ref}/
   659              ${commitId}/
   660                  ${tableName}/
   661                      p1=v1/symlink.txt
   662                      p1=v2/symlink.txt
   663                      p1=v3/symlink.txt
   664                      ...
   665  ```
   666  
   667  ### `lakefs/catalogexport/symlink_exporter.export_s3(s3_client, table_src_path, action_info [, options])`
   668  
Export symlink files that represent a table to an S3 location.
   670  
   671  Parameters:
   672  
   673  - `s3_client`: Configured client.
   674  - `table_src_path(string)`: Path to the table spec YAML file in `_lakefs_tables` (e.g. _lakefs_tables/my_table.yaml).
   675  - `action_info(table)`: The global action object.
   676  - `options(table)`:
   677    - `debug(boolean)`: Print extra info.
   678    - `export_base_uri(string)`: Override the prefix in S3 e.g. `s3://other-bucket/path/`.
  - `writer(function(bucket, key, data))`: If passed, the S3 client will not be used to write; helpful for debugging.
   680  
   681  Example:
   682  
   683  ```lua
   684  local exporter = require("lakefs/catalogexport/symlink_exporter")
   685  local aws = require("aws")
   686  -- args are user inputs from a lakeFS action.
   687  local s3 = aws.s3_client(args.aws.aws_access_key_id, args.aws.aws_secret_access_key, args.aws.aws_region)
   688  exporter.export_s3(s3, args.table_descriptor_path, action, {debug=true})
   689  ```
   690  
   691  ### `lakefs/catalogexport/glue_exporter`
   692  
A package for automating the export of lakeFS-stored tables into the Glue catalog.
   694  
   695  ### `lakefs/catalogexport/glue_exporter.export_glue(glue, db, table_src_path, create_table_input, action_info, options)`
   696  
Represents a lakeFS table in the Glue Catalog.
This function creates a table in Glue based on the given configuration.
It assumes that a symlink location has already been created, and by default configures the table for the same commit.
   700  
   701  Parameters:
   702  
   703  - `glue`: AWS glue client
   704  - `db(string)`: glue database name
   705  - `table_src_path(string)`: path to table spec (e.g. _lakefs_tables/my_table.yaml)
- `create_table_input(Table)`: Input matching the AWS [table_input](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html#API_CreateTable_RequestSyntax) structure, the same as used for `glue.create_table`.
It should contain the inputs describing the data format (e.g. InputFormat, OutputFormat, SerdeInfo), since the exporter is agnostic to these.
By default this function will configure the table location and schema.
   709  - `action_info(Table)`: the global action object.
   710  - `options(Table)`:
   711    - `table_name(string)`: Override default glue table name
  - `debug(boolean)`: Print extra info.
   713    - `export_base_uri(string)`: Override the default prefix in S3 for symlink location e.g. s3://other-bucket/path/
   714  
   715  When creating a glue table, the final table input will consist of the `create_table_input` input parameter and lakeFS computed defaults that will override it:
   716  
- `Name`: Glue table name, generated by `get_full_table_name(descriptor, action_info)`.
   718  - `PartitionKeys` Partition columns usually deduced from `_lakefs_tables/${table_src_path}`.
   719  - `TableType` = "EXTERNAL_TABLE"
   720  - `StorageDescriptor`: Columns usually deduced from `_lakefs_tables/${table_src_path}`.
   721  - `StorageDescriptor.Location` = symlink_location
   722  
   723  Example: 
   724  
```lua
local aws = require("aws")
local exporter = require("lakefs/catalogexport/glue_exporter")
local glue = aws.glue_client(args.aws_access_key_id, args.aws_secret_access_key, args.aws_region)
-- table_input can also be passed as a simple key-value object in YAML as an action argument; this is an inline example:
local table_input = {
  StorageDescriptor = {
    InputFormat = "org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat",
    OutputFormat = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat",
    SerdeInfo = {
      SerializationLibrary = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  },
  Parameters = {
    classification = "parquet",
    EXTERNAL = "TRUE",
    ["parquet.compression"] = "SNAPPY"
  }
}
exporter.export_glue(glue, "my-db", "_lakefs_tables/animals.yaml", table_input, action, {debug=true})
```
   743  
   744  ### `lakefs/catalogexport/glue_exporter.get_full_table_name(descriptor, action_info)`
   745  
   746  Generate glue table name.
   747  
   748  Parameters:
   749  
- `descriptor(Table)`: The parsed table descriptor object (e.g. from `_lakefs_tables/my_table.yaml`).
   751  - `action_info(Table)`: The global action object.
   752  
   753  ### `lakefs/catalogexport/unity_exporter`
   754  
   755  A package used to register exported Delta Lake tables to Databricks' Unity catalog.
   756  
   757  ### `lakefs/catalogexport/unity_exporter.register_tables(action, table_descriptors_path, delta_table_details, databricks_client, warehouse_id)`
   758  
   759  The function used to register exported Delta Lake tables in Databricks' Unity Catalog.
The table will be registered under `<catalog>.<branch name>.<table_name>`, where the branch name is used as the schema name.
The return value is a table mapping each table name to its registration request status.
   763  
**Note (Azure users):** Databricks catalog external locations are supported only for ADLS Gen2 storage accounts.  
When exporting Delta tables using the `lakefs/catalogexport/delta_exporter.export_delta_log` function, the `path_transformer` must be  
used to convert the path scheme to `abfss`. The built-in `azure` Lua library provides this functionality with `transformPathToAbfss`.
   767  
   768  Parameters:
   769  
   770  - `action(table)`: The global action table
- `table_descriptors_path(string)`: The path under which the table descriptors of the provided `delta_table_details` tables reside.
   772  - `delta_table_details(table)`: Table names to physical paths mapping and table metadata (e.g. `{table1 = {path = "s3://mybucket/mytable1", metadata = {id = "table_1_id", name = "table1", ...}}, table2 = {path = "s3://mybucket/mytable2", metadata = {id = "table_2_id", name = "table2", ...}}}`.)
   773  - `databricks_client(table)`: A Databricks client that implements `create_or_get_schema: function(id, catalog_name)` and `register_external_table: function(table_name, physical_path, warehouse_id, catalog_name, schema_name)`
   774  - `warehouse_id(string)`: Databricks warehouse ID.
   775  
   776  Example:
   777  The following registers an exported Delta Lake table to Unity Catalog.
   778  
```lua
local databricks = require("databricks")
local unity_export = require("lakefs/catalogexport/unity_exporter")

local delta_table_details = {
  ["table1"] = {
    path = "s3://mybucket/mytable1",
    metadata = { id = "table_1_id", name = "table1" },
  },
}
-- Register the exported table in Unity Catalog:
local action_details = {
  repository_id = "my-repo",
  commit_id = "commit_id",
  branch_id = "main",
}
local databricks_client = databricks.client("<DATABRICKS_HOST>", "<DATABRICKS_TOKEN>")
local registration_statuses = unity_export.register_tables(action_details, "_lakefs_tables", delta_table_details, databricks_client, "<WAREHOUSE_ID>")

for t, status in pairs(registration_statuses) do
  print("Unity catalog registration for table \"" .. t .. "\" completed with status: " .. status .. "\n")
end
```
   799  
   800  For the table descriptor under the `_lakefs_tables/delta-table-descriptor.yaml`:
   801  ```yaml
   802  ---
   803  name: my_table_name
   804  type: delta
   805  path: path/to/delta/table/data
   806  catalog: my-catalog
   807  ```
   808  
   809  For detailed step-by-step guide on how to use `unity_exporter.register_tables` as a part of a lakeFS action refer to
   810  the [Unity Catalog docs]({% link integrations/unity-catalog.md %}).
   811  
   812  ### `path/parse(path_string)`
   813  
   814  Returns a table for the given path string with the following structure:
   815  
   816  ```lua
   817  > require("path")
   818  > path.parse("a/b/c.csv")
   819  {
   820      ["parent"] = "a/b/"
   821      ["base_name"] = "c.csv"
   822  } 
   823  ```
   824  
   825  ### `path/join(*path_parts)`
   826  
   827  Receives a variable number of strings and returns a joined string that represents a path:
   828  
```lua
> path = require("path")
> path.join("/", "path/", "to", "a", "file.data")
path/to/a/file.data
```
   834  
### `path/is_hidden(path_string [, separator, prefix])`
   836  
Returns a boolean: `true` if the last element of the given path string is hidden (i.e. starts with `prefix`), or if any of its parent directories start with `prefix`.
   838  
   839  ```lua
   840  > require("path")
   841  > path.is_hidden("a/b/c") -- false
   842  > path.is_hidden("a/b/_c") -- true
   843  > path.is_hidden("a/_b/c") -- true
   844  > path.is_hidden("a/b/_c/") -- true
   845  ```
   846  ### `path/default_separator()`
   847  
   848  Returns a constant string (`/`)
   849  
   850  ### `regexp/match(pattern, s)`
   851  
   852  Returns true if the string `s` matches `pattern`.
   853  This is a thin wrapper over Go's [regexp.MatchString](https://pkg.go.dev/regexp#MatchString){: target="_blank" }.
   854  
   855  ### `regexp/quote_meta(s)`
   856  
   857  Escapes any meta-characters in string `s` and returns a new string
   858  
   859  ### `regexp/compile(pattern)`
   860  
   861  Returns a regexp match object for the given pattern
   862  
   863  ### `regexp/compiled_pattern.find_all(s, n)`
   864  
Returns a table list of all matches for the pattern (up to `n` matches, unless `n == -1`, in which case all possible matches are returned)
   866  
   867  ### `regexp/compiled_pattern.find_all_submatch(s, n)`
   868  
Returns a table list of all sub-matches for the pattern (up to `n` matches, unless `n == -1`, in which case all possible matches are returned).
   870  Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression,
   871  numbered from left to right in order of opening parenthesis.
   872  Submatch 0 is the match of the entire expression, submatch 1 is the match of the first parenthesized subexpression, and so on
   873  
   874  ### `regexp/compiled_pattern.find(s)`
   875  
   876  Returns a string representing the left-most match for the given pattern in string `s`
   877  
   878  ### `regexp/compiled_pattern.find_submatch(s)`
   879  
Returns a table of strings holding the text of the leftmost match of the regular expression in `s` and the matches, if any, of its submatches
   881  
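A sketch of compiling a pattern and using the resulting match object; the pattern and input are illustrative, and iterating the `find_all` result with `ipairs` is an assumption about the returned list:

```lua
local regexp = require("regexp")

local pattern = regexp.compile("(\\w+)=(\\w+)")
print(pattern.find("a=1 b=2"))                 -- "a=1"
for _, m in ipairs(pattern.find_all("a=1 b=2", -1)) do
  print(m)                                     -- "a=1", then "b=2"
end
```
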
   882  ### `strings/split(s, sep)`
   883  
Returns a table of strings, the result of splitting `s` on `sep`.
   885  
   886  ### `strings/trim(s)`
   887  
   888  Returns a string with all leading and trailing white space removed, as defined by Unicode
   889  
   890  ### `strings/replace(s, old, new, n)`
   891  
   892  Returns a copy of the string s with the first n non-overlapping instances of `old` replaced by `new`.
   893  If `old` is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string.
   894  
   895  If n < 0, there is no limit on the number of replacements
   896  
   897  ### `strings/has_prefix(s, prefix)`
   898  
   899  Returns `true` if `s` begins with `prefix`
   900  
   901  ### `strings/has_suffix(s, suffix)`
   902  
   903  Returns `true` if `s` ends with `suffix`
   904  
   905  ### `strings/contains(s, substr)`
   906  
   907  Returns `true` if `substr` is contained anywhere in `s`
   908  
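A small sketch combining the string helpers above; the input is illustrative, and iterating the `split` result with `ipairs` is an assumption about the returned list:

```lua
local strings = require("strings")

local header = strings.trim("  name,age,city  ")
for _, col in ipairs(strings.split(header, ",")) do
  print(col)
end
print(strings.has_prefix(header, "name")) -- true
```
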
   909  ### `time/now()`
   910  
Returns a `float64` representing the number of nanoseconds since the Unix epoch (01/01/1970 00:00:00).
   912  
   913  ### `time/format(epoch_nano, layout, zone)`
   914  
   915  Returns a string representation of the given epoch_nano timestamp for the given Timezone (e.g. `"UTC"`, `"America/Los_Angeles"`, ...)
   916  The `layout` parameter should follow [Go's time layout format](https://pkg.go.dev/time#pkg-constants){: target="_blank" }.
   917  
   918  ### `time/format_iso(epoch_nano, zone)`
   919  
   920  Returns a string representation of the given `epoch_nano` timestamp for the given Timezone (e.g. `"UTC"`, `"America/Los_Angeles"`, ...)
   921  The returned string will be in [ISO8601](https://en.wikipedia.org/wiki/ISO_8601){: target="_blank" } format.
   922  
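A small sketch of formatting the current time; the layout string follows Go's reference time:

```lua
local time = require("time")

local now = time.now()
print(time.format(now, "2006-01-02 15:04:05", "UTC"))
print(time.format_iso(now, "America/Los_Angeles"))
```
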
   923  ### `time/sleep(duration_ns)`
   924  
   925  Sleep for `duration_ns` nanoseconds
   926  
   927  ### `time/since(epoch_nano)`
   928  
Returns the number of nanoseconds elapsed since `epoch_nano`
   930  
   931  ### `time/add(epoch_time, duration_table)`
   932  
Returns a new timestamp (in nanoseconds since 01/01/1970 00:00:00), the result of adding the given `duration_table` to `epoch_time`.
The `duration_table` should have the following structure:
   935  
   936  ```lua
   937  > require("time")
   938  > time.add(time.now(), {
   939      ["hour"] = 1,
   940      ["minute"] = 20,
   941      ["second"] = 50
   942  })
   943  ```
   944  You may omit any of the fields from the table, resulting in a default value of `0` for omitted fields
   945  
   946  ### `time/parse(layout, value)`
   947  
Returns a `float64` representing the number of nanoseconds since the Unix epoch (01/01/1970 00:00:00).
   949  This timestamp will represent date `value` parsed using the `layout` format.
   950  
   951  The `layout` parameter should follow [Go's time layout format](https://pkg.go.dev/time#pkg-constants){: target="_blank" }
   952  
   953  ### `time/parse_iso(value)`
   954  
Returns a `float64` representing the number of nanoseconds since the Unix epoch (01/01/1970 00:00:00) for `value`.
   956  The `value` string should be in [ISO8601](https://en.wikipedia.org/wiki/ISO_8601){: target="_blank" } format
   957  
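A small sketch of parsing timestamps; the layout and values are illustrative:

```lua
local time = require("time")

local ts = time.parse("2006-01-02", "2024-05-20")
print(time.since(ts)) -- nanoseconds elapsed since that date
print(time.parse_iso("2024-05-20T13:45:00Z"))
```
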
   958  ### `uuid/new()`
   959  
   960  Returns a new 128-bit [RFC 4122 UUID](https://www.rfc-editor.org/rfc/rfc4122){: target="_blank" } in string representation.
   961  
   962  ### `net/url`
   963  
Provides a `parse` function that parses a URL string into its parts, returning a table with the URL's host, path, scheme, query and fragment.
   965  
   966  ```lua
   967  > local url = require("net/url")
   968  > url.parse("https://example.com/path?p1=a#section")
   969  {
   970      ["host"] = "example.com"
   971      ["path"] = "/path"
   972      ["scheme"] = "https"
   973      ["query"] = "p1=a"
   974      ["fragment"] = "section"
   975  }
   976  ```
   977  
   978  
   979  ### `net/http` (optional)
   980  
   981  Provides a `request` function that performs an HTTP request.
For security reasons, this package is not available by default, as it allows HTTP requests to be sent out from the lakeFS instance's network. The feature must be enabled under the `actions.lua.net_http_enabled` [configuration]({% link reference/configuration.md %}).
   983  Request will time out after 30 seconds.
   984  
   985  ```lua
   986  http.request(url [, body])
   987  http.request{
   988    url = string,
   989    [method = string,]
   990    [headers = header-table,]
   991    [body = string,]
   992  }
   993  ```
   994  
   995  Returns a code (number), body (string), headers (table) and status (string).
   996  
   997   - code - status code number
   998   - body - string with the response body
 - headers - table with the response headers (key/value or table of values)
  1000   - status - status code text
  1001  
The first form of the call performs a GET request, or a POST request if the `body` parameter is passed.
  1003  
  1004  The second form accepts a table and allows you to customize the request method and headers.
  1005  
  1006  
  1007  Example of a GET request
  1008  
  1009  ```lua
  1010  local http = require("net/http")
  1011  local code, body = http.request("https://example.com")
  1012  if code == 200 then
  1013      print(body)
  1014  else
  1015      print("Failed to get example.com - status code: " .. code)
  1016  end
  1017  
  1018  ```
  1019  
  1020  Example of a POST request
  1021  
  1022  ```lua
  1023  local http = require("net/http")
  1024  local code, body = http.request{
  1025      url="https://httpbin.org/post",
  1026      method="POST",
  1027      body="custname=tester",
  1028      headers={["Content-Type"]="application/x-www-form-urlencoded"},
  1029  }
  1030  if code == 200 then
  1031      print(body)
  1032  else
  1033      print("Failed to post data - status code: " .. code)
  1034  end
  1035  ```