# Bacalhau Monitoring Canary
This canary continuously calls several Bacalhau APIs and raises an alarm whenever the correctness or availability of those APIs falls below a threshold.

The canary is serverless and runs on AWS Lambda. Its infrastructure is defined using AWS CDK and deployed automatically using AWS CodePipeline.

## Quick Links
- [Public Dashboard](https://cloudwatch.amazonaws.com/dashboard.html?dashboard=BacalhauCanaryProd&context=eyJSIjoidXMtZWFzdC0xIiwiRCI6ImN3LWRiLTI4NDMwNTcxNzgzNSIsIlUiOiJ1cy1lYXN0LTFfUTlPMEVrM3llIiwiQyI6IjExc3NlYW1tZmVmaGdtYTFzMDk1c29jaDltIiwiSSI6InVzLWVhc3QtMTpmNGE5MGFiMi0yZWYwLTRlYTEtOWZkNS1jMmQ3MDkxYTA5OTQiLCJNIjoiUHVibGljIn0=)
- [AWS Account Sign-in](https://284305717835.signin.aws.amazon.com/console/?region=eu-west-1)
- [Canary Prod Logs](https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logsV2:log-groups)
- [Canary Lambda Functions](https://eu-west-1.console.aws.amazon.com/lambda/home?region=eu-west-1#/functions?fo=and&o0=%3A&v0=BacalhauCanary)
- [Deployment Pipeline](https://console.aws.amazon.com/codesuite/codepipeline/pipelines/BacalhauCanaryPipeline-PipelineC660917D-I0DZJY6IFHTO/view?region=eu-west-1)

## Canary Scenarios
The canary is composed of several scenarios, each executed periodically on its own Lambda function. The scenarios are defined in the `lambda/pkg/scenarios` directory and include:
- `list`: Calls Bacalhau's list API and verifies the response.
- `submit`: Submits a job to Bacalhau and verifies that it completed successfully.
- `submitAndDescribe`: Submits a job to Bacalhau, waits for it to complete, and then calls the describe-related APIs.
- `submitAndGet`: Submits a job to Bacalhau, waits for it to complete, and then downloads the output and verifies its correctness.
- `submitDockerIPFSJobAndGet`: Submits a job to Bacalhau with an IPFS input, waits for it to complete, and then downloads the output and verifies its correctness.
- `submitWithConcurrency`: Submits a job to Bacalhau with a concurrency of 3 and waits for it to complete.
- `submitWithConcurrencyOwnedNodes`: Submits a job to Bacalhau-owned nodes with a concurrency of 3 and waits for it to complete.

### Local Testing
You can run the scenarios locally before deploying to Lambda with the following commands:
```bash
# Assuming you are in the ops/aws/canary directory
go run ./lambda/cmd/scenario_local_runner --action list # or any other scenario

# If you get a `no packages loaded from` error, cd into the ops/aws/canary/lambda/cmd/scenario_local_runner directory instead
go run . --action list
```
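
To exercise every scenario in one go, the command above can be wrapped in a loop. The following is a minimal sketch, not something shipped in the repository: `run_all_scenarios` and the `DRY_RUN` switch are hypothetical names, and it assumes that each scenario name listed under Canary Scenarios is accepted as an `--action` value (this README only shows `list` explicitly).

```bash
# Sketch: run every scenario from this README through the local runner.
# Assumes the current directory is ops/aws/canary and that each scenario
# name is a valid --action value (an assumption; only `list` is shown above).
SCENARIOS="list submit submitAndDescribe submitAndGet submitDockerIPFSJobAndGet submitWithConcurrency submitWithConcurrencyOwnedNodes"

# run_all_scenarios is a hypothetical helper, not part of the repository.
# With DRY_RUN=1 it only prints the commands instead of executing them.
run_all_scenarios() {
  failed=0
  for scenario in $SCENARIOS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "go run ./lambda/cmd/scenario_local_runner --action $scenario"
      continue
    fi
    echo "==> $scenario"
    # Keep going after a failure so every scenario gets a chance to run.
    go run ./lambda/cmd/scenario_local_runner --action "$scenario" || {
      echo "FAILED: $scenario" >&2
      failed=1
    }
  done
  return "$failed"
}
```

After sourcing the snippet, `DRY_RUN=1 run_all_scenarios` prints the commands without contacting Bacalhau, while `run_all_scenarios` runs them in order and returns non-zero if any scenario failed.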

## Releasing a New Version
Follow these steps when a new version of Bacalhau is released and deployed to prod, so that the canary client is also updated to a compatible version and deployed:
1. Update the `go.mod` in the [ops/aws/canary/lambda directory](ops/aws/canary/lambda/go.mod) to point to the new version of Bacalhau.
2. Update the `go.sum` file by running `(cd ops/aws/canary/lambda && go mod tidy)`.
3. Fix any code broken by changes in the Bacalhau client API.
4. Verify that the canary compiles locally by running `(cd ops/aws/canary/lambda && go build -o /dev/null ./cmd/scenario_lambda_runner)`.
5. Push the changes to main; the canary pipeline will automatically deploy the new version.

This is a [sample commit](https://github.com/filecoin-project/bacalhau/commit/958630dbe4ad9ba35b0715be2f82c66c60797ba4) updating the canary to Bacalhau v0.2.6.
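
The mechanical parts of the release steps (1, 2 and 4) can be chained so that a failure stops the sequence early. This is a sketch under stated assumptions rather than the project's release tooling: `update_canary_deps` and `DRY_RUN` are hypothetical names, and it assumes that the canary depends on the module path `github.com/filecoin-project/bacalhau` and that `go get <module>@<version>` is an acceptable way to edit `go.mod` (the steps above do not prescribe how to do that). Step 3 still needs a human.

```bash
# Sketch of release steps 1, 2 and 4, to be run from the repository root.
# update_canary_deps is a hypothetical helper, not part of the repository.
# Assumption: the dependency to bump is github.com/filecoin-project/bacalhau.
# With DRY_RUN=1 the commands are printed instead of executed.
update_canary_deps() {
  version="${1:-}"
  if [ -z "$version" ]; then
    echo "usage: update_canary_deps <bacalhau-version>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cd ops/aws/canary/lambda"
    echo "go get github.com/filecoin-project/bacalhau@$version"
    echo "go mod tidy"
    echo "go build -o /dev/null ./cmd/scenario_lambda_runner"
    return 0
  fi
  # The subshell keeps the caller's working directory unchanged;
  # && stops at the first failing command.
  (
    cd ops/aws/canary/lambda &&
      go get "github.com/filecoin-project/bacalhau@$version" &&
      go mod tidy &&
      go build -o /dev/null ./cmd/scenario_lambda_runner
  )
}
```

Running it with `DRY_RUN=1` and the target version tag as the only argument prints the commands for review. If the real build fails, that usually points at step 3: the client API changed and the scenarios need updating before the push.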

## Infrastructure Stacks
There are two types of stacks in this project:
- Canary stack(s): one stack per environment (e.g. prod, dev), containing the Lambda function and the CloudWatch alarm.
- Pipeline stack: contains the CodePipeline and CodeBuild resources.

### Deploying Canary Stack Changes
Changes to the canary stacks are deployed automatically as soon as a new commit is pushed to the main branch. You *should not* deploy these stacks manually.

**Note:** Currently only the prod stack is deployed.

### Deploying Pipeline Stack Changes
Changes to the pipeline, such as adding a new stage or modifying the build scripts, need to be deployed manually. To do so, run the following command:
```bash
# Assuming you have the AWS CLI installed and configured with a profile named "bacalhau"
# Assuming you are in the ops/aws/canary directory
cdk --profile bacalhau deploy BacalhauCanaryPipeline -c config=prod
```
Note that there is only a single pipeline stack, deployed using the prod environment configuration, but it deploys all canary stacks.
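
Because this deployment is manual, it can help to preview the change first with `cdk diff`, passing the same profile, stack name and context as the deploy command. The wrapper below is only a sketch: `preview_pipeline_changes` and `DRY_RUN` are hypothetical names, and the default profile `bacalhau` is taken from the comment in the command above.

```bash
# preview_pipeline_changes is a hypothetical helper, not part of the repository.
# It runs `cdk diff` with the same stack name and context as the deploy
# command in this README. The first argument overrides the AWS profile name.
# With DRY_RUN=1 the command is printed instead of executed.
preview_pipeline_changes() {
  profile="${1:-bacalhau}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cdk --profile $profile diff BacalhauCanaryPipeline -c config=prod"
    return 0
  fi
  cdk --profile "$profile" diff BacalhauCanaryPipeline -c config=prod
}
```

Run `preview_pipeline_changes` from the ops/aws/canary directory and review the reported differences before running the deploy command.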

### Manual Resources
These resources had to be created or updated manually, outside of CDK:
1. GitHub connection
2. CloudWatch public dashboard link
3. Secrets Manager secret containing the Slack webhook URL

## Useful CDK commands
Keep in mind that you might need to pass your AWS profile and the stack name to some of these commands:
* `npm run build`   compiles TypeScript to JavaScript
* `npm run postinstall` deletes the CDK Go templates, whose invalid file naming pattern can break `go` commands
* `npm run watch`   watches for changes and compiles
* `npm run test`    runs the Jest unit tests
* `cdk deploy`      deploys this stack to your default AWS account/region
* `cdk diff`        compares the deployed stack with the current state
* `cdk synth`       emits the synthesized CloudFormation template