github.com/treeverse/lakefs@v1.24.1-0.20240520134607-95648127bfb0/docs/quickstart/actions-and-hooks.md

---
title: 6️⃣ Using Actions and Hooks in lakeFS
description: lakeFS quickstart / Use Actions and Hooks to enforce conditions when committing and merging changes
parent: ⭐ Quickstart
nav_order: 30
next: ["Work with lakeFS data on your local environment", "./work-with-data-locally.html"]
previous: ["Rollback the changes", "./rollback.html"]
---

# Actions and Hooks in lakeFS

When working with lakeFS, it is often useful to run checks automatically at certain points, such as before a commit or a merge. Let's see how [actions in lakeFS]({% link howto/hooks/index.md %}) can help here.

We're going to enforce a rule that when a commit is made to any branch whose name begins with `etl`:

* the commit message must not be blank
* there must be `job_name` and `version` metadata
* the `version` must be numeric

To do this we'll create an _action_. In lakeFS, an action specifies one or more events that will trigger it, and references one or more _hooks_ to run when triggered. Actions are YAML files written to lakeFS under the `_lakefs_actions/` folder of the lakeFS repository.

A _hook_ can be a [Lua]({% link howto/hooks/lua.md %}) script that lakeFS executes itself, an external [webhook]({% link howto/hooks/webhooks.md %}), or an [Airflow DAG]({% link howto/hooks/airflow.md %}). In this example, we're using a Lua hook.

## Configuring the Action

1. In lakeFS, create a new branch called `add_action`. You can do this through the UI or with `lakectl`:

    ```bash
    docker exec lakefs \
        lakectl branch create \
            lakefs://quickstart/add_action \
            --source lakefs://quickstart/main
    ```

2. Open up your favorite text editor (or emacs), and paste the following YAML:

   ```yaml
   name: Check Commit Message and Metadata
   on:
     pre-commit:
       branches:
       - etl**
   hooks:
   - id: check_metadata
     type: lua
     properties:
       script: |
           -- Rule 1: the commit message must not be blank
           local commit_message = action.commit.message
           if commit_message and #commit_message > 0 then
               print("✅ The commit message exists and is not empty: " .. commit_message)
           else
               error("\n\n❌ A commit message must be provided")
           end

           -- Rule 2: job_name metadata must be present
           local job_name = action.commit.metadata["job_name"]
           if job_name == nil then
               error("\n❌ Commit metadata must include job_name")
           else
               print("✅ Commit metadata includes job_name: " .. job_name)
           end

           -- Rule 3: version metadata must be present and numeric
           local version = action.commit.metadata["version"]
           if version == nil then
               error("\n❌ Commit metadata must include version")
           else
               print("✅ Commit metadata includes version: " .. version)
               if tonumber(version) then
                   print("✅ Commit metadata version is numeric")
               else
                   error("\n❌ Version metadata must be numeric: " .. version)
               end
           end
   ```

3. Save this file as `/tmp/check_commit_metadata.yml`.

    * You can save it elsewhere, but make sure you change the path below when uploading.

4. Upload the `check_commit_metadata.yml` file to the `add_action` branch under `_lakefs_actions/`. As above, you can use the UI (make sure you select the correct branch when you do) or `lakectl`:

    ```bash
    docker exec lakefs \
        lakectl fs upload \
            lakefs://quickstart/add_action/_lakefs_actions/check_commit_metadata.yml \
            --source /tmp/check_commit_metadata.yml
    ```

5. Go to the **Uncommitted Changes** tab in the UI, and make sure that you see the new file in the path shown:

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-00.png" alt="lakeFS Uncommitted Changes view showing a file called `check_commit_metadata.yml` under the path `_lakefs_actions/`" class="quickstart"/>

    Click **Commit Changes** and enter a suitable message to commit this new file to the branch.

6. Now we'll merge this new branch into `main`. From the **Compare** tab in the UI, compare the `main` branch with `add_action` and click **Merge**.

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-01.png" alt="lakeFS Compare view showing the difference between `main` and `add_action` branches" class="quickstart"/>
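
Steps 5 and 6 can also be carried out with `lakectl` instead of the UI. The following is a minimal sketch, assuming the same `lakefs` container and `quickstart` repository as above; the function name `commit_and_merge_action` and the commit message are illustrative and not part of lakeFS.

```bash
# Hypothetical helper (not part of lakeFS): the CLI equivalent of steps 5 and 6.
commit_and_merge_action() {
    # Step 5: commit the uploaded action file on the add_action branch.
    # Stop here if the commit fails, so that nothing is merged.
    docker exec lakefs \
        lakectl commit lakefs://quickstart/add_action \
            -m "Add action to check commit message and metadata" || return 1

    # Step 6: merge add_action (source) into main (destination).
    docker exec lakefs \
        lakectl merge \
            lakefs://quickstart/add_action \
            lakefs://quickstart/main
}
```

Run `commit_and_merge_action` after the upload in step 4.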

## Testing the Action

Let's recap the rules that the action is going to enforce.

> When a commit is made to any branch that begins with `etl`:
>
> * the commit message must not be blank
> * there must be `job_name` and `version` metadata
> * the `version` must be numeric

We'll start by creating a branch that matches the `etl` pattern, then commit a change and see how the action works.

1. Create a new branch called `etl_20230504` (see the instructions above if necessary). Make sure you use `main` as the source branch.

    In your new branch you should see the action that you created and merged above:

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-02.png" alt="lakeFS branch etl_20230504 with object /_lakefs_actions/check_commit_metadata.yml" class="quickstart"/>

1. To simulate an ETL job we'll use the built-in DuckDB editor to run some SQL and write the result back to the lakeFS branch.

    Open the `lakes.parquet` file on the `etl_20230504` branch from the **Objects** tab. Replace the SQL statement with the following:

    ```sql
    COPY (
        WITH src AS (
            SELECT lake_name, country, depth_m,
                RANK() OVER ( ORDER BY depth_m DESC) AS lake_rank
            FROM READ_PARQUET('lakefs://quickstart/etl_20230504/lakes.parquet'))
        SELECT * FROM src WHERE lake_rank <= 10
    ) TO 'lakefs://quickstart/etl_20230504/top10_lakes.parquet'
    ```

1. Head to the **Uncommitted Changes** tab in the UI and notice that there is now a file called `top10_lakes.parquet` waiting to be committed.

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-03.png" alt="lakeFS branch etl_20230504 with uncommitted file top10_lakes.parquet" class="quickstart"/>

    Now we're ready to start trying out the commit rules, and seeing what happens if we violate them.

1. Click on **Commit Changes**, leave the _Commit message_ blank, and click **Commit Changes** to confirm.

    Note that the commit fails because the hook did not succeed:

    `pre-commit hook aborted`

    The output from the hook's code is displayed:

    `❌ A commit message must be provided`

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-04.png" alt="lakeFS blocking an attempt to commit with no commit message" class="quickstart"/>

1. Do the same as the previous step, but provide a message this time:

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-05.png" alt="A commit to lakeFS with commit message in place" class="quickstart"/>

    The commit still fails because we need to include metadata too, as the error tells us:

    `❌ Commit metadata must include job_name`

1. Open the **Commit Changes** dialog again and use **Add Metadata field** to add the required metadata:

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-06.png" alt="A commit to lakeFS with commit message and metadata in place" class="quickstart"/>

    We're almost there, but this still fails (as it should), since the version is not entirely numeric but includes `v` and `ß`:

    `❌ Version metadata must be numeric: v1.00ß`

    Repeat the commit attempt, specifying the version as `1.00` this time, and rejoice as the commit succeeds:

    <img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-07.png" alt="Commit history in lakeFS showing that the commit met the rules set by the action and completed successfully." class="quickstart"/>
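
As an alternative to the UI dialog, the commit attempts with metadata can be made from the command line. The sketch below assumes the `lakefs` container and `quickstart` repository used in this guide and relies on `lakectl commit` with its `-m` and `--meta key=value` flags; the helper name `try_commit` and the `job_name` value are made up for illustration.

```bash
# Hypothetical helper (not part of lakeFS): attempt a commit on the
# etl_20230504 branch. Usage: try_commit <message> [job_name] [version]
# Empty or missing job_name/version values are left out of the metadata.
try_commit() {
    message=$1; job_name=$2; version=$3
    # Rebuild the argument list as the lakectl command line.
    set -- commit lakefs://quickstart/etl_20230504 -m "$message"
    if [ -n "$job_name" ]; then set -- "$@" --meta "job_name=$job_name"; fi
    if [ -n "$version" ]; then set -- "$@" --meta "version=$version"; fi
    docker exec lakefs lakectl "$@"
}

# Example calls (commented out so that nothing runs on paste):
# try_commit "Add top 10 lakes"                        # should fail: no job_name
# try_commit "Add top 10 lakes" top10_lakes "v1.00ß"   # should fail: version not numeric
# try_commit "Add top 10 lakes" top10_lakes 1.00       # should pass the hook's checks
```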

---

You can view the history of all action runs from the **Actions** tab:

<img width="75%" src="{{ site.baseurl }}/assets/img/quickstart/hooks-08.png" alt="Action run history in lakeFS" class="quickstart"/>

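The run history is also available from the command line. This is a minimal sketch, assuming the `lakectl actions runs list` command and its `--branch` filter, plus the `lakefs` container and `quickstart` repository from this guide; the helper name `list_action_runs` is illustrative.

```bash
# Hypothetical helper (not part of lakeFS): list action runs for the
# quickstart repository, optionally filtered to a single branch.
list_action_runs() {
    if [ -n "$1" ]; then
        docker exec lakefs lakectl actions runs list lakefs://quickstart --branch "$1"
    else
        docker exec lakefs lakectl actions runs list lakefs://quickstart
    fi
}
```

For example, `list_action_runs etl_20230504` should list the runs recorded for that branch.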