github.com/Racer159/jackal@v0.32.7-0.20240401174413-0bd2339e4f2e/adr/0018-hooks.md

# 18. Jackal Hooks

Date: 2023-09-20

## Status

Accepted

## Context

The idea of `hooks` is to provide a way for cluster maintainers to register functionality that runs during the deployment lifecycle. Jackal packages already have the concept of `actions`, which can execute commands in the host machine's shell during certain package lifecycle events. As `actions` gain adoption, the team has noticed they are being used to add functionality to Jackal in unexpected ways. We want `actions` to be a tool that extends the functionality of Jackal and its packages, not a tool that works around missing or clunky functionality.

We want package creators to be able to create system-agnostic packages by leveraging core Jackal functionality. The following is one such scenario:

- _IF_ ECR is chosen as the external registry during `jackal init` / cluster creation, _THEN_ Jackal will seamlessly leverage ECR without requiring advanced user effort.

Using ECR as a remote registry creates two problems that Jackal will need to solve:

 1. ECR authentication tokens expire after 12 hours and need to be refreshed. This means the cluster will need to keep refreshing its tokens, and the user deploying packages will need to make sure they have a valid token.
 2. ECR image repositories do not support 'push-to-create'. This means we will need to explicitly create an image repository for every image that is pushed as part of a Jackal package.

Packages deployed onto a cluster initialized with ECR as its remote registry will need to solve both of these problems.

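To make the second problem concrete, the following is a minimal sketch of the "create every repository up front" step. It is not Jackal code: `RepoCreator`, `memRegistry`, `repoName`, and `ensureRepos` are hypothetical names, the image references are illustrative, the reference parsing is deliberately simplified, and a real implementation would call the registry's API instead of the in-memory fake used here.

```go
package main

import (
	"fmt"
	"strings"
)

// repoName derives the repository path from an image reference by dropping
// the digest, the tag, and the registry host. Simplified for illustration.
func repoName(image string) string {
	if i := strings.Index(image, "@"); i >= 0 {
		image = image[:i] // drop "@sha256:..." digest
	}
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		image = image[:i] // drop ":tag" (a colon before the last slash is a port)
	}
	parts := strings.SplitN(image, "/", 2)
	if len(parts) == 2 && (strings.ContainsAny(parts[0], ".:") || parts[0] == "localhost") {
		return parts[1] // first segment looks like a registry host
	}
	return image
}

// RepoCreator is a hypothetical stand-in for a registry API client.
type RepoCreator interface {
	Exists(name string) bool
	Create(name string) error
}

// ensureRepos creates one repository per distinct image, because the
// registry will not create repositories implicitly on push.
func ensureRepos(rc RepoCreator, prefix string, images []string) ([]string, error) {
	var created []string
	for _, img := range images {
		name := prefix + "/" + repoName(img)
		if rc.Exists(name) {
			continue
		}
		if err := rc.Create(name); err != nil {
			return created, fmt.Errorf("creating repository %q: %w", name, err)
		}
		created = append(created, name)
	}
	return created, nil
}

// memRegistry is an in-memory fake used only to run the sketch.
type memRegistry map[string]bool

func (m memRegistry) Exists(name string) bool  { return m[name] }
func (m memRegistry) Create(name string) error { m[name] = true; return nil }

func main() {
	created, err := ensureRepos(memRegistry{}, "ecr-jackal-registry", []string{
		"registry.example.com/team/app:1.0",
		"registry.example.com/team/app:1.1",
		"nginx:1.25",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(created)
}
```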
Currently there are two solutions:

1. The package deployer solves the problems before deployment (creating the needed repositories, secrets, etc.).
2. The package itself solves these problems with `actions` that are custom built for ECR clusters.

Neither of these solutions is ideal. We don't want to require overly complex external steps ahead of Jackal package deployments, and we don't want package creators to have to create and distribute packages that are specific to ECR.

Potential considerations:

### Internal Jackal Implementation

  Clusters that have hooks will have `jackal-hook-*` secret(s) in the `jackal` namespace. Each secret will contain the hook's configuration and any other required metadata. As part of the package deployment process, Jackal will check whether the cluster has any hooks and run them if they exist. Given the scenario above, there is no longer a need to create an ECR-specific Jackal package. An ECR hook would perform the proper configuration for any package deployed onto that cluster, requiring no extra manual intervention from the package deployer.

  Jackal HookConfig state information struct:

  ```go
  type HookConfig struct {
    HookName     string                 `json:"hookName" jsonschema:"description=Name of the hook"`
    Internal     bool                   `json:"internal" jsonschema:"description=Internal hooks are run by Jackal itself, not by a plugin"`
    Lifecycle    HookLifecycle          `json:"lifecycle" jsonschema:"description=Lifecycle of the hook"`
    HookData     map[string]interface{} `json:"hookData" jsonschema:"description=Generic data map used for the hook. The data is obtained from a secret in the Jackal namespace"`
    OCIReference string                 `json:"ociReference" jsonschema:"description=Optional OCI reference to the hook image to run"`
  }
  ```

  Example Secret Data:

  ```yaml
  hookName: ecr-repository
  internal: true
  lifecycle: before-component
  hookData:
    registryURL: public.ecr.aws/abcdefg/jackal-ecr-registry
    region: us-east-1
    repositoryPrefix: ecr-jackal-registry
  ```

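As a rough illustration of how secret data like the example above could be decoded into `HookConfig`, here is a minimal sketch using only the standard library. It makes several assumptions not stated in this ADR: `HookLifecycle` is modeled as a string type, the secret payload is shown as the JSON equivalent of the YAML example, and `parseHookConfig` with its validation rule is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// HookLifecycle is assumed to be a string type; the ADR does not show its definition.
type HookLifecycle string

// HookConfig mirrors the struct above, with the jsonschema tags omitted for brevity.
type HookConfig struct {
	HookName     string                 `json:"hookName"`
	Internal     bool                   `json:"internal"`
	Lifecycle    HookLifecycle          `json:"lifecycle"`
	HookData     map[string]interface{} `json:"hookData"`
	OCIReference string                 `json:"ociReference"`
}

// parseHookConfig (hypothetical) decodes JSON-encoded secret data and rejects
// entries that lack a name or a lifecycle.
func parseHookConfig(data []byte) (HookConfig, error) {
	var cfg HookConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return HookConfig{}, fmt.Errorf("decoding hook config: %w", err)
	}
	if cfg.HookName == "" || cfg.Lifecycle == "" {
		return HookConfig{}, errors.New("hook config requires hookName and lifecycle")
	}
	return cfg, nil
}

// exampleSecret is the JSON equivalent of the YAML example secret data.
const exampleSecret = `{
  "hookName": "ecr-repository",
  "internal": true,
  "lifecycle": "before-component",
  "hookData": {
    "registryURL": "public.ecr.aws/abcdefg/jackal-ecr-registry",
    "region": "us-east-1",
    "repositoryPrefix": "ecr-jackal-registry"
  }
}`

func main() {
	cfg, err := parseHookConfig([]byte(exampleSecret))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s runs at %s (internal=%t), region=%v\n",
		cfg.HookName, cfg.Lifecycle, cfg.Internal, cfg.HookData["region"])
}
```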
  For this solution, hooks have to be 'installed' onto a cluster before they are used. When Jackal is deploying a package onto a cluster, it will look for any secrets with the `jackal-hook` label in the `jackal` namespace. If hooks are found, Jackal will run any 'package' level hooks before deploying a component and run any 'component' level hooks for each component that is being deployed. The hook lifecycle options will be:

  1. Before a package deployment
  2. After a package deployment
  3. Before a component deployment
  4. After a component deployment

  NOTE: The order of hook execution is not guaranteed. If there are multiple hooks for a lifecycle, they may run in any order.

  NOTE: The `package` lifecycle might be changed to a `run-once` lifecycle. This would benefit packages that don't have kube context information when the deployment starts.

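The four lifecycle points can be sketched as a small dispatcher wrapped around component deployment. This is an illustrative sketch, not the Jackal implementation: only `before-component` appears in this ADR, so the other three lifecycle strings are assumed, and `Hook`, `runHooks`, and `deployPackage` are hypothetical names. Hooks sharing a lifecycle run in slice order here, and callers should not rely on that order.

```go
package main

import "fmt"

type HookLifecycle string

const (
	BeforePackage   HookLifecycle = "before-package"   // assumed name
	AfterPackage    HookLifecycle = "after-package"    // assumed name
	BeforeComponent HookLifecycle = "before-component" // appears in the example secret
	AfterComponent  HookLifecycle = "after-component"  // assumed name
)

// Hook pairs a lifecycle with the function to run at that point.
type Hook struct {
	Name      string
	Lifecycle HookLifecycle
	Run       func(target string) error
}

// runHooks runs every hook registered for the given lifecycle and stops at
// the first failure.
func runHooks(hooks []Hook, lc HookLifecycle, target string) error {
	for _, h := range hooks {
		if h.Lifecycle != lc {
			continue
		}
		if err := h.Run(target); err != nil {
			return fmt.Errorf("hook %q (%s) on %s: %w", h.Name, lc, target, err)
		}
	}
	return nil
}

// deployPackage wraps package and component deployment with the four hook points.
func deployPackage(pkg string, components []string, hooks []Hook, deploy func(component string) error) error {
	if err := runHooks(hooks, BeforePackage, pkg); err != nil {
		return err
	}
	for _, c := range components {
		if err := runHooks(hooks, BeforeComponent, c); err != nil {
			return err
		}
		if err := deploy(c); err != nil {
			return err
		}
		if err := runHooks(hooks, AfterComponent, c); err != nil {
			return err
		}
	}
	return runHooks(hooks, AfterPackage, pkg)
}

func main() {
	var trace []string
	record := func(label string) func(string) error {
		return func(target string) error {
			trace = append(trace, label+":"+target)
			return nil
		}
	}
	hooks := []Hook{
		{Name: "ecr-repository", Lifecycle: BeforeComponent, Run: record("before-component")},
		{Name: "announce", Lifecycle: AfterPackage, Run: record("after-package")},
	}
	if err := deployPackage("demo", []string{"a", "b"}, hooks, record("deploy")); err != nil {
		panic(err)
	}
	fmt.Println(trace)
}
```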
  Jackal hooks will have two forms of execution: `Internal` and `External` hooks.

  Internal Hooks:

  Internal hooks are built into the Jackal CLI and run internal code when executed. Their logic would ship with the Jackal CLI and be updated with new releases of the CLI.

  External Hooks:

  There are a few approaches for external hooks:

  1. Have the hook metadata reference an OCI image that is downloaded and run.

     - The hook metadata can reference the shasum of the image to ensure the image has not been tampered with.
     - We can pass metadata from the secret to the image.

  1. Have the hook metadata reference an image/endpoint that we call via gRPC.
     - This would require careful security consideration since we would be executing code from an external source.

  1. Have the hook metadata contain a script or list of shell commands to run.
     - This would be the simplest solution but would require the most work from the hook creator. It also has the most potential security issues.

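The shasum check mentioned for the first approach could look roughly like the sketch below. It assumes the hook's OCI reference is pinned in the common `name@sha256:<hex>` form and that the downloaded content is available as bytes. `pinnedDigest` and `verifyDigest` are hypothetical helpers, and the reference in `main` is made up for the demonstration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// pinnedDigest extracts the digest portion of a reference such as
// "registry.example.com/hooks/ecr@sha256:<hex>".
func pinnedDigest(ref string) (string, error) {
	i := strings.LastIndex(ref, "@")
	if i < 0 {
		return "", fmt.Errorf("reference %q is not pinned by digest", ref)
	}
	return ref[i+1:], nil
}

// verifyDigest recomputes the SHA-256 of content and compares it with the
// pinned "sha256:<hex>" digest.
func verifyDigest(content []byte, pinned string) error {
	const prefix = "sha256:"
	if !strings.HasPrefix(pinned, prefix) {
		return errors.New("unsupported digest algorithm")
	}
	want, err := hex.DecodeString(strings.TrimPrefix(pinned, prefix))
	if err != nil || len(want) != sha256.Size {
		return errors.New("malformed sha256 digest")
	}
	got := sha256.Sum256(content)
	if !bytes.Equal(got[:], want) {
		return errors.New("digest mismatch: content may have been tampered with")
	}
	return nil
}

func main() {
	payload := []byte("hook payload")
	sum := sha256.Sum256(payload)
	// Hypothetical reference, built so that the demo payload matches it.
	ref := "registry.example.com/hooks/ecr@sha256:" + hex.EncodeToString(sum[:])

	digest, err := pinnedDigest(ref)
	if err != nil {
		panic(err)
	}
	fmt.Println("original accepted:", verifyDigest(payload, digest) == nil)
	fmt.Println("modified accepted:", verifyDigest([]byte("modified"), digest) == nil)
}
```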
  Pros:

  - Implementing hooks internally means we don't have to deal with any bootstrapping issues.
  - Internally managed hooks can leverage Jackal internal code.

  Cons:

  - Since 'Internal' hooks are built into the CLI, the only way to get updates for a hook is to update the CLI.
  - External hooks will have a few security concerns that we will have to work through.
  - Implementing hooks internally adds more complexity to the Jackal CLI. This is especially true if we end up using WASM as the execution engine for hooks.

### Webhooks

  Webhooks, such as Pepr, can act as a K8s controller that enables Kubernetes mutations. We are (or will be) considering using Pepr to replace the `Jackal Agent`. Pepr is capable of accomplishing most of what Jackal wants to do with the concept of hooks. Jackal hook configuration could be saved as secrets that Jackal will be able to use. As Jackal is deploying packages onto a cluster, it can check for secrets that represent hooks (as it would if hook execution were handled internally, as described above) and get information on how to run the webhook from the secret. This would likely mean that the secret that describes the hook would have a `URL` instead of an `OCIReference`, as well as config information that it would pass through to the hook. With the webhook approach, lifecycle management is a lot more flexible, as the webhook can operate on native Kubernetes events such as a secret being created or updated.

  Pros:

  - Pepr as a solution would be more flexible than the internal Jackal implementation of hooks since the webhook could be anywhere.
  - Using Pepr would reduce the complexity of Jackal's codebase.
  - It will be easier to secure third-party hooks when Pepr is the one running them.
  - Lifecycle management would be a lot easier with a webhook solution like Pepr.

  Cons:

  - Pepr is a new project that hasn't been stress tested in production yet (but neither have hooks).
  - The Pepr image needs to be pushed to an image registry before it is deployed. This will require a new bootstrapping solution to solve the ECR problem we identified above.

## Decision

[Pepr](https://github.com/defenseunicorns/pepr) will be used to enable custom, or environment-specific, automation tasks to be integrated into the Jackal package deployment lifecycle. Pepr also allows the Jackal codebase to remain agnostic to any third-party APIs or dependencies that may be used.

A `--skip-webhooks` flag has been added to `jackal package deploy` to allow users to opt out of Jackal checking and waiting for any webhooks to complete during package deployments.

## Consequences

While hooks don't introduce raw schema changes to Jackal, they do add complexity: side effects happen during package deployments that might not be obvious to the package deployer. This is especially the case if the person who deployed the hooks is different from the person who is deploying the subsequent packages.