# Technical Design

---

<!-- markdown-toc --bullets="-" -i SPECIFICATIONS.md -->

<!-- toc -->

- [Problem statement](#problem-statement)
- [Threat model](#threat-model)
- [Trusted builder and provenance generator](#trusted-builder-and-provenance-generator)
  - [Interference between jobs](#interference-between-jobs)
- [Workflow identity using OIDC and keyless signing](#workflow-identity-using-oidc-and-keyless-signing)
  - [Example workflow for Go](#example-workflow-for-go)
  - [Provenance Verification](#provenance-verification)
  - [Detailed Steps](#detailed-steps)
  - [Verification Latency](#verification-latency)
- [Threats covered](#threats-covered)
- [SLSA4 requirements](#slsa4-requirements)
  - [Build-level provenance](#build-level-provenance)
  - [Source-level provenance](#source-level-provenance)

<!-- tocstop -->

---

## Problem statement

A large number of projects are "GitHub native", in the sense that they are developed, reviewed, and released entirely on GitHub. Developers of such projects may release binaries or publish packages on registries (e.g., npm, PyPI). These developers do not want to pay for external cloud services to build or package their software, and want to keep using the tooling they are accustomed to on GitHub.

On GitHub, actions such as [go-releaser](https://github.com/goreleaser/goreleaser-action), [docker-releaser](http://github.com/docker/build-push-action), and [pypi-publish](http://github.com/pypa/gh-action-pypi-publish) are a standard way to release artifacts for Go, Docker, and Python packages, respectively (other package managers have similar actions).

We propose a flow to achieve non-forgeable (build and source) [provenance](https://slsa.dev/provenance/v0.2) using GitHub's standard workflows with a trusted action:

- The process is compatible with existing release processes, and we provide a layer to build provenance using actions. In the future, provenance generation can be incorporated into the standard actions that developers already use, or developers could switch to using our action instead.
- The code in this repository demonstrates that provenance generation is achievable using existing tooling provided by the open-source community, and provides a prototype implementation for the Go ecosystem that includes a guaranteed isolated and ephemeral build.

An additional constraint we impose is that users should not need to manage cryptographic keys, because keys are hard to discover, keep safe, revoke, and manage. In GitHub workflows, this is particularly important because [encrypted secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets) are accessible to all repo maintainers with push access, regardless of the branch protection settings.

The scope of the current design is limited to generating build provenance that satisfies SLSA 3 requirements. Extending to SLSA 4 would require a similar design to generate non-forgeable attestations on source repository settings (e.g., code review), hermeticity, and reproducibility. The latter two build requirements may be ecosystem dependent, but if the ecosystem tooling supports them, attestations on these can be generated with this design. Source-level settings could use Scorecards, Allstar, or other similar tools.

## Threat model

Non-forgeable provenance requires trust in the builder, the provenance generator (reusable workflow), the provenance verifier, and the platform they run on (GitHub): those are part of the trusted computing base (TCB).

| Component | Requires trust for |
| --- | --- |
| **GitHub** | - Executing expected code in workflows<br>- Integrity of data passed between job VMs<br>- Isolation of defaults, environment variables between caller workflows and reusable workflows<br>- Isolation between jobs of a workflow<br>- OIDC token issuance |
| **SigStore** | - Ephemerality of signing key<br>- Fulcio authentication for signing certificate |
| **Generator workflow/Verifiers**<br>(the trusted reusable workflow) | - Generating correct contents of the provenance<br>- Build process isolation<br>- Correct verification of the signatures and provenance |

We do not trust the users (project maintainers) of the builders. Even if they are malicious, they cannot tamper with the provenance. The content of the source code is out of scope: users may manipulate the repository’s code, including the environment variables declared in the build configuration files in the source, but they cannot produce incorrect provenance. The provenance will still be valid and non-forgeable; it also contains the source repository reference where that code is defined.

## Trusted builder and provenance generator

Non-forgeable provenance requires a trusted builder and a trusted provenance generator that are isolated from one another and from maintainers' interference. These are often referred to as "trusted builders" in the SLSA nomenclature. There is a direct mapping between the isolation we need and GitHub runners.

According to [GitHub's official documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners), "each job in a workflow executes in a fresh instance of the virtual machine." Data can be [passed](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts#passing-data-between-jobs-in-a-workflow) between jobs using [GitHub's artifact registry within the workflow](https://github.com/actions/upload-artifact). In other words, the workflow acts as the orchestrator that "glues" jobs together. We propose using a job as the isolation mechanism for our trusted builder and our trusted provenance generator.

A GitHub workflow may be run either on [GitHub-hosted runners](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners) or on [self-hosted runners](https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners). In the case of a self-hosted runner, a verifier has no guarantee that the code run is in fact the intended workflow unless they also trust the self-hosted runner. GitHub-hosted runners give us this guarantee, so long as we trust GitHub. In this document, we assume that we trust GitHub to run the exact code defined in the workflow.

Below is an example workflow depicting job definitions:

```yaml
name: A workflow
on:
  workflow_dispatch:

jobs:
  vm1: # Isolated job called "vm1"
    runs-on: ubuntu-latest
    steps:
      - run: "echo hello world"
      # ...
  vm2: # Isolated job called "vm2"
    runs-on: ubuntu-latest
    steps:
      - uses: some/action
  vm3: # Isolated job called "vm3"
    runs-on: ubuntu-latest
    steps:
      - uses: another/action
```

An example of the output provenance can be found in [README.md#example-provenance](README.md#example-provenance).

### Interference between jobs

Project maintainers are in charge of defining the workflows that release the build, so they could, in principle, try to define the workflow in a way that interferes with the builder. This would allow them to alter the provenance information. For example, [environment variables](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#env), [steps](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idsteps), [services](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idservices) and [defaults](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#defaults), to name a few, are propagated to jobs defined in the workflow.

To avoid this problem, we use a special type of GitHub "action" called a [reusable workflow](https://docs.github.com/en/actions/using-workflows/reusing-workflows). Reusable workflows avoid all the possible sources of interference listed above. The only way to interact with a reusable workflow is through the [input parameters](https://docs.github.com/en/actions/using-workflows/reusing-workflows#supported-keywords-for-jobs-that-call-a-reusable-workflow) it exposes to the calling workflow.

Below is an example of a reusable workflow called from an untrusted "caller workflow":

```yaml
name: caller workflow
on:
  workflow_dispatch:

env:
  SOME_VAR: var-value

jobs:
  vm1:
    # ...

  vm2: # Isolated job called "vm2" calling a trusted reusable workflow
    uses: some/repo/.github/workflows/trusted-builder-reusable-workflow.yml@v1
    # no other steps or actions can be used
    # no env variables can be declared
```

Below is an example of a reusable workflow definition:

```yaml
# github.com/some/repo/.github/workflows/re-usable-workflow.yml
name: reusable workflow (trusted builder)
on: workflow_call

jobs:
  build: # Isolated job building the project
    # ...
  provenance: # Isolated job building and signing the provenance
```

A reusable workflow can itself contain multiple jobs, so we can define a trusted builder that uses different VMs to 1) compile the project and 2) generate the SLSA provenance, both on (trusted) GitHub-hosted runners. We still need to pass data between jobs via [GitHub artifacts](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts). All jobs in a workflow can upload artifacts and could therefore tamper with those used by the trusted builder, so we protect artifact integrity with hashes. We can safely exchange hashes between the jobs because there is a [trusted channel](https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs) to pass them between jobs of the same workflow, using namespaces that identify the exact job that generated each value. (We could in principle use this mechanism to exchange the resulting binaries as well, but we do not because this trusted channel has size limitations.)
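
The hash hand-off described above can be sketched as follows. This is a minimal illustration and not the repository's actual builder: the job names, the artifact name `my-binary`, the output name `sha256`, and the build command are all hypothetical. The build job publishes the binary's SHA-256 digest as a job output, and the provenance job re-checks the downloaded artifact against that output before using it.

```yaml
# Hypothetical reusable workflow; names, paths and the build command are illustrative.
on: workflow_call

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Job outputs are the trusted channel: they are namespaced by the producing job.
      sha256: ${{ steps.hash.outputs.sha256 }}
    steps:
      - uses: actions/checkout@v4
      - run: go build -o my-binary .
      - id: hash
        run: echo "sha256=$(sha256sum my-binary | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"
      - uses: actions/upload-artifact@v4
        with:
          name: my-binary
          path: my-binary

  provenance:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: my-binary
      # Fail the job if the artifact was replaced between the two jobs.
      - env:
          EXPECTED: ${{ needs.build.outputs.sha256 }}
        run: echo "$EXPECTED  my-binary" | sha256sum --check --strict -
```

Only the short digest travels through the job output, while the binary itself goes through the artifact store, so the size limits of job outputs are not a concern.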

```text
┌──────────────────────┐         ┌───────────────────────────────┐
│                      │         │                               │
│  Source repository   │         │        Trusted builder        │
│  -----------------   │         │      (reusable workflow)      │
│                      │         │      -------------------      │
│ .slsa-goreleaser.yml │         │                               │
│                      ├─────────┼─────────────┐                 │
│                      │         │             │                 │
│                      │         │   ┌─────────▼─────────────┐   │
│    User workflow     │         │   │         Build         │   │
│                      │         │   └─────────┬─────────────┘   │
└──────────────────────┘         │             │                 │
                                 │   ┌─────────▼─────────────┐   │
                                 │   │  Generate provenance  │   │
                                 │   └─────────┬─────────────┘   │
                                 │             │                 │
                                 └─────────────┼─────────────────┘
                                               │
                                               │
                                 ┌─────────────▼─────────────────┐
                                 │                               │
                                 │   binary  signed provenance   │
                                 │                               │
                                 │                               │
                                 │           Artifacts           │
                                 │           ---------           │
                                 └───────────────────────────────┘
```

## Workflow identity using OIDC and keyless signing

In the previous sections, we established that it is possible to build trusted builders using the isolation provided by GitHub's jobs. The last piece of the puzzle is to identify the builder during provenance verification.

Our solution leverages the workflow identity of the GitHub runner, as follows.

OpenID Connect (OIDC) is a standard used across the web. It lets an identity provider (e.g., Google) attest to the identity of a user for a third party. GitHub recently added [support for OIDC](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect) for their workflows (an example can be found [here](https://github.com/naveensrinivasan/stunning-tribble/blob/main/.github/workflows/docker-sign.yml)). The OIDC protocol is particularly interesting because it requires _no hardcoded, long-term secrets be stored in GitHub's secrets_.
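
To make the "no stored secrets" point concrete, here is a rough sketch of a job requesting its own OIDC token. The job name and the `sigstore` audience value are assumptions made for illustration, and the job only requests the token; it does not sign anything. GitHub injects `ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN` into jobs that are granted `id-token: write`.

```yaml
# Hypothetical job: shows only the token request, not the signing flow.
on: workflow_dispatch

jobs:
  show-identity:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # needed for GitHub to expose the request URL and bearer token
    steps:
      - name: Request an OIDC token
        run: |
          # Both variables are provided by GitHub; nothing is read from repository secrets.
          jwt=$(curl -sS \
            -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sigstore" | jq -r '.value')
          # The JWT payload carries claims such as repository, sha and job_workflow_ref.
          # Never print the token itself to the logs.
          test -n "$jwt"
```

The token is short-lived and scoped to this job, which is what lets the design avoid long-term signing keys.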

Each time a workflow job is triggered on GitHub, GitHub provisions the workflow with a unique bearer token (`ACTIONS_ID_TOKEN_REQUEST_TOKEN`) that can be exchanged for a [JWT](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#understanding-the-oidc-token) which contains the caller repository name, commit hash, and trigger, and the current (reusable) workflow path and reference. Using OIDC, the workflow can prove its identity to an external party, which will be the Fulcio CA in our case. After verifying the OIDC token issued to the GitHub job, Fulcio issues a signing certificate attesting to an ephemeral signing public key and [tying it to the reusable workflow identity](https://github.com/sigstore/fulcio/blob/main/docs/oidc.md) (with extension fields for [calling repo name, commit hash, trigger events, and branch name](https://github.com/sigstore/fulcio/blob/c74e2cfb763dd32def5dc921ff49f579fa262d96/docs/oid-info.md)).

By signing the provenance using a Fulcio-authenticated signing key and using GitHub-hosted runners, we build a mechanism to verifiably attest which code is run (defined by a workflow, commit hash, and trigger): the commit hash uniquely identifies the content of the workflow run. The trusted provenance generator signs the provenance using the Fulcio-authenticated signing key. A third party can then use this attestation as a trust anchor to prove that the trusted builder created the attestation and binary from the calling repository.

### Example workflow for Go

The workflow looks like any standard workflow maintainers use today.
See the example below:

```yaml
name: Release my code
on:
  workflow_dispatch:
  push:
    tags:
      - "*"

permissions: read-all

jobs:
  build:
    permissions:
      id-token: write
      contents: read
    needs: args
    uses: yogeshkumararora/slsa-github-generator-go/.github/workflows/builder.yml@<somehash>
    with:
      go-version: 1.17

  # Maintainer can do whatever they want with the results
  # Below we upload as assets to a GitHub release.
  upload:
    permissions:
      contents: write
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/download-artifact@c850b930e6ba138125429b7e5c93fc707a7f8427 # v4.1.4
        with:
          name: ${{ needs.build.outputs.go-binary-name }}
      - uses: actions/download-artifact@c850b930e6ba138125429b7e5c93fc707a7f8427 # v4.1.4
        with:
          name: ${{ needs.build.outputs.go-binary-name }}.intoto.jsonl
      - name: Release
        uses: softprops/action-gh-release@69320dbe05506a9a39fc8ae11030b214ec2d1f87 # v2.0.5
        if: startsWith(github.ref, 'refs/tags/')
        with:
          files: |
            ${{ needs.build.outputs.go-binary-name }}
            ${{ needs.build.outputs.go-binary-name }}.intoto.jsonl
```

An example of the output provenance can be found in [README.md#example-provenance](README.md#example-provenance).

### Provenance Verification

Given an artifact and a signed provenance, a consumer must verify the authenticity, integrity, proof of service-generation, and non-forgeability of the provenance in order to make accurate risk-based assessments according to their security posture.

Authenticity and integrity come from the digital signature on the provenance, which was created using a private key accessible only to the service generating the provenance. The ephemeral key is generated and stored inside the isolated builder VM.

Moreover, the provenance is non-forgeable.
We first verify the builder identity: by verifying the signing certificate against the Fulcio root CA, we can trust that the certificate contents were populated correctly according to the OIDC token Fulcio received. The subject URI identifies the `job_workflow_ref` inside the provisioned OIDC token; this is used to establish that the trusted builder (the reusable workflow) attests to the provenance.

Because the signing key in the certificate and the OIDC token are only accessible inside this workflow, we have high confidence that the provenance was generated inside the service and that no other process could have impersonated the trusted builder. The ephemeral signing key is generated inside the workflow and does not get written to logs or leave the process. Moreover, even if the signing key were compromised, any signatures generated after the lifetime of the certificate (10 min) would be invalid, unless the attacker could retrieve a valid GitHub-provisioned OIDC token for the trusted builder. Thus, the signing key is protected by the TTL of the certificate and its ephemerality. Further improvements on the scope of the signing certificate are discussed [here](https://github.com/sigstore/fulcio/issues/475).

Non-forgeability also requires user isolation: users cannot interfere with the process inside the trusted builder, thanks to the isolation of reusable workflows on GitHub-hosted runners (assuming trust in GitHub). The user-defined build process is also isolated from the provenance signing key by job isolation.

Note that we rely on GitHub-hosted runners executing the defined code to trust that the provenance was correctly generated inside the builder and that no other process could impersonate the builder.

### Detailed Steps

Given an artifact and a signed provenance, we perform the following steps:

1. **Download the signing certificate from the Rekor log**: Search the Rekor log by artifact hash to find the entry containing the signed provenance and extract the signing certificate. (See Rekor Log RT in [Verification Latency](#verification-latency) for how this could be skipped.)

2. **Verify the signed provenance**: Verify the signature in the DSSE payload using the signing certificate, and verify the chain of the signing certificate up to the Fulcio root CA. This verifies non-forgeability of the payload and establishes trust in the contents of the certificate.

3. **Extract the builder identity from the signing certificate**: Extract the certificate information (see [here](https://github.com/sigstore/fulcio/blob/c74e2cfb763dd32def5dc921ff49f579fa262d96/docs/oid-info.md#136141572641--fulcio) for extension OIDs). Verify that the signing certificate’s subject name (`job_workflow_ref`) is the trusted builder ID at a trusted hash (calling repository SHA in the diagram below). This verifies authenticity of the provenance and guarantees the provenance was correctly populated.

   <img src="images/cert.svg" width="70%" height="70%">

4. **Verify the provenance attestation against a policy, as usual**: Parse the authenticated provenance and match the subject digest inside the provenance with the artifact digest. Additionally verify the builder ID, configSource, and other properties according to policy.

A consumer performing these steps has the guarantee that the binary they consume was produced in the trusted builder at a given commit hash attested to in the provenance.

The provenance verification demo code is hosted [here](https://github.com/slsa-framework/slsa-verifier).
An example output shows that we can retrieve the caller repository, trigger, and reference where the artifact was built, which consumers may use:

```shell
$ go run main.go --binary ~/Downloads/binary-linux-amd64 --provenance ~/Downloads/binary-linux-amd64.intoto.jsonl --source github.com/asraa/slsa-on-github-test
Verified against tlog entry 1544571
verified SLSA provenance produced at
 {
        "caller": "asraa/slsa-on-github-test",
        "commit": "0dfcd24824432c4ce587f79c918eef8fc2c44d7b",
        "job_workflow_ref": "/yogeshkumararora/slsa-github-generator-go/.github/workflows/builder.yml@refs/heads/main",
        "trigger": "workflow_dispatch",
        "issuer": "https://token.actions.githubusercontent.com"
}
successfully verified SLSA provenance
```

### Verification Latency

Verification, as described in the previous section, requires a network call to Rekor to discover the signing certificate.

Cosign needs to query the Rekor log to fetch the signing certificate and verify that the timestamps are valid. We believe this is not a show stopper and there are ways to mitigate this problem:

Cosign supports an experimental feature, [bundle](https://github.com/sigstore/cosign/blob/main/USAGE.md#verify-a-signature-was-added-to-the-transparency-log), that does not require querying the Rekor log. Instead, the Rekor log signs a "promise" to add an entry to the log. This requires trusting Rekor more, but is similar to how the web PKI works in practice: Rekor serves as the CT log and the signed promise is the SCT. This would require uploading the bundle payload, which includes the signing certificate (see the cosign [specification](https://github.com/sigstore/cosign/blob/617bc78899022a6ff266dbc095ba931d2f8786c1/specs/SIGNATURE_SPEC.md#properties) for the format), alongside the binary and signed provenance.
The bundle may also be incorporated into the DSSE payload itself (see this [issue](https://github.com/secure-systems-lab/dsse/issues/42) for certificate inclusion and custom field options).

## Threats covered

More specifically, below is a list of threats we aim to protect against:

| Threat | Builder | Verifier |
| --- | --- | --- |
| Build code from a different repo | Sigstore embeds repo name in cert | Verify cert and provenance should match |
| Build same repo different hash | Sigstore embeds hash in the cert | Verify cert and provenance should match |
| Build same repo different branch | Sigstore embeds OIDC token's `ref` in the cert | Verify cert and provenance should match |
| Build same repo different version | Tag is added to provenance. (Note: can be added to cert since info is [available in OIDC token](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#understanding-the-oidc-token)) | Verify provenance info |
| Build same repo same version but non-default branch | Branch and versions both added to provenance using GitHub's trigger payload | Verify provenance info |
| Build same repo different builder | Sigstore embeds trusted builder's path in cert | Verify cert's workflow path |
| Build same repo using user-defined workflow | Sigstore embeds builder's path in cert | Verify cert's workflow path |
| Forge valid certificate with different repo/hash/builder through GitHub token leak | Token expires when job is complete, cleared after unmarshalling | |
| Malicious env variables | Only accepts `CGO_*` and `GO*` env variables | Note: should be left to the verifier to decide |
| Script injections | Filter option names using allow-list + use execve() | Note: should be left to the verifier to decide |
| Malicious compiler options | Use allow-list | Note: should be left to the verifier to decide |

## SLSA4 requirements

Here we explain how SLSA requirements can be achieved:

### Build-level provenance

| Requirements | Fulfilled |
| --- | --- |
| Hermetic | Yes. In general, we can set up IP table rules at the start of the VM (even remove sudo access if needed). In practice, hermeticity depends on support from the compilation/packaging toolchain. The toolchain needs to support distinct steps to download the dependencies and to compile. Otherwise we can never truly achieve hermeticity. In Go, this is easy to achieve using `go mod vendor` to download dependencies and `go build -mod=vendor` to build the project. For Python and npm, pre-compilation scripts can be run, so we need support from the tooling to separate these steps from the compilation steps. |
| Parameterless | Yes, by the nature of the workflow. Note: Go accepts dynamic parameters like ldflags to pass variables to the linker. These flags often need to run scripts to be generated. An example is generating the commit hash or version of a project so that it can be displayed by the final binary. In this case, it requires running a git command to set the ldflags. |
| Isolated | Yes, by nature of GitHub jobs |
| Ephemeral | Yes, by nature of GitHub jobs |
| Scripted build | Yes, through workflow |
| Build service | Yes, on GitHub |
| Build as code | Yes, through workflow |

### Source-level provenance

Review provenance can be added as an additional isolated job within the reusable workflow. We can add review information for all commits since the last release, for example. As of December 2023, source-level requirements are being worked on by the SLSA WG: refer to [slsa-framework/slsa/issues/956](https://github.com/slsa-framework/slsa/issues/956) for additional information.