Clarify Provenance is Unforgeable requirements at Build L3 #986

Open
marcelamelara opened this issue Oct 12, 2023 · 8 comments

Comments

@marcelamelara
Contributor

marcelamelara commented Oct 12, 2023

The current Provenance is Unforgeable requirements at Build L3 include: "Every field in the provenance MUST be generated or verified by the build platform in a trusted control plane. The user-controlled build steps MUST NOT be able to inject or alter the contents, except as noted in Provenance is Authentic."

I'm looking for clarification on two fronts. Is there a restriction on the L2 exceptions to accuracy at L3? If not, does the real gain for accuracy at L3 come from requiring the signing keys for the Provenance to reside outside a tenant-controlled component? In this case, I'm not sure what the "user-controlled build steps MUST NOT be able to inject/alter the contents" requirement adds in terms of accuracy properties at L3 if a tenant is still able to modify the Provenance before it's signed by the control plane.

This leads me to my second question. If my interpretation of "provenance MUST be generated/verified by the build platform in a trusted control plane" is correct, any Provenance generated and signed by a GHA workflow (a tenant-controlled component such as the generic slsa-github-generator) should not qualify for Build L3. Is this interpretation accurate? Coming back to my first point, is the main intent of the accuracy requirements at L3 to ensure signing keys are managed by the trusted control plane?

Would appreciate hearing folks' thoughts on this, and I'd also be happy to open a PR with wording changes to reflect clarifications to the spec, if others agree that changes are warranted.

@behnazh-w

behnazh-w commented Oct 12, 2023

@marcelamelara I have opened a related issue and there is already a PR to adjust the wording.

@joshuagl
Member

Great questions, thanks @marcelamelara!

At Build L3 the tenant should only be able to provide the subject (see threat "Forge output digest of the provenance" for why this is OK), all other information in the provenance should come from the trusted control plane.

This means that:

  • At Build L3 there should be a complete record of externalParameters which is generated, or verified, by the control plane. At Build L2 this may be incomplete.
  • At Build L3 resolvedDependencies may be incomplete but MUST be generated or verified by the control plane. At Build L2 resolvedDependencies may be incomplete and may be tenant generated. This is, admittedly, unclear in the current wording of the spec and may not match everyone's interpretation.

AIUI there are multiple factors which make provenance from slsa-github-generators unforgeable, copying @ianlewis and @laurentsimon to help correct/clarify:

  1. the generators are reusable workflows, this means the tenant does not influence the logic in the generator beyond choosing to use the generator in their workflow and passing configuration inputs to the generator
  2. much of the provenance data recorded by the generators is simply persisting data from the GitHub control plane (i.e., the values of environment variables set by GitHub's control plane)
  3. for Build L3 builders, the dependencies are resolved in an isolated job/VM
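The L2/L3 split described above can be illustrated with a minimal Python sketch. It assumes the SLSA v1.0 provenance layout (an in-toto Statement wrapping a `https://slsa.dev/provenance/v1` predicate). The function name `build_provenance`, the builder ID, the build type, and all parameter values are hypothetical placeholders, not part of any real builder. The point is only that the tenant's sole input is the subject, while every other field is filled in by the control plane.

```python
import hashlib
import json

# Hypothetical identifiers that a control plane would know without asking the tenant.
TRUSTED_BUILDER_ID = "https://example.com/hypothetical-builder@v1"
TRUSTED_BUILD_TYPE = "https://example.com/hypothetical-build-type/v1"


def build_provenance(tenant_subject, control_plane_params, control_plane_deps):
    """Assemble a SLSA v1.0-shaped statement.

    Only `tenant_subject` originates from the user-controlled build steps;
    every other field is supplied by the trusted control plane.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [tenant_subject],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": TRUSTED_BUILD_TYPE,
                "externalParameters": dict(control_plane_params),
                "resolvedDependencies": list(control_plane_deps),
            },
            "runDetails": {"builder": {"id": TRUSTED_BUILDER_ID}},
        },
    }


# The tenant reports the digest of whatever it built (placeholder bytes here).
artifact = b"hypothetical artifact bytes"
subject = {
    "name": "app.tar.gz",
    "digest": {"sha256": hashlib.sha256(artifact).hexdigest()},
}

statement = build_provenance(
    subject,
    control_plane_params={
        "repository": "https://example.com/org/repo",
        "ref": "refs/heads/main",
    },
    control_plane_deps=[],  # may be incomplete at L3, but is never tenant-generated
)
print(json.dumps(statement, indent=2))
```

Signing is deliberately left out of the sketch; in a real platform the statement would also be signed with a key that the tenant's build steps cannot reach.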

@marcelamelara
Contributor Author

marcelamelara commented Oct 12, 2023

Thanks @behnazh-w and @joshuagl! I will go take a look at the linked issue and PR.

> AIUI there are multiple factors which make provenance from slsa-github-generators unforgeable, copying @ianlewis and @laurentsimon to help correct/clarify:
>
>   1. the generators are reusable workflows, this means the tenant does not influence the logic in the generator beyond choosing to use the generator in their workflow and passing configuration inputs to the generator

This first point, I think, is really the one that I was getting at, and I'm not quite certain how the control plane/build platform ensures that the tenant can't influence the logic of a reusable workflow, since tenants technically do have unfettered access to anything running within the VM. The second and third points are in line with my understanding as well, so it's possible I am just missing some context around reusable workflows.

CC @chkimes

@MarkLodato
Member

Related: #975 (comment). I think L2 vs L3 is quite murky, and I think it's worth considering moving some requirements from L3 to L2 to have a more crisp definition of L2.

> This first point, I think, is really the one that I was getting at, and I'm not quite certain how the control plane/build platform ensures that the tenant can't influence the logic of a reusable workflow, since tenants technically do have unfettered access to anything running within the VM.

A reusable workflow runs as a separate VM instance from the caller. There is documentation at https://github.com/slsa-framework/slsa-github-generator/blob/main/SPECIFICATIONS.md, but in short, the only influence that the caller has is the input parameters. Does that help?

@marcelamelara
Contributor Author

marcelamelara commented Oct 13, 2023

> the only influence that the caller has is the input parameters. Does that help?

This does help, thanks. If the main goal of L3 (as you mention in #975 (comment)) is to ensure the tenant cannot tamper with the provenance generation during the build, it seems like we should be able to relax the requirement that the build platform itself must perform the generation. In fact, the actual level definition for L3 aligns with this, but the Unforgeable description (as currently written) still requires provenance to be generated and/or verified by the platform-managed control plane. So I wonder if ensuring that the requirements.md page is better aligned with levels.md would help in addressing my comments. I'm happy to open a PR for this if we want to keep these separate, since #948 currently only provides changes to levels.md, though there seem to be some recommendations for edits to requirements.md on that PR as well.

Separately, though, I would still like to better understand how a reusable workflow (or equivalent process) that generates provenance and runs outside of the control plane (and thus the trust boundary in the model) isn't susceptible to the threats targeted by L3. In essence, even if the caller can't influence the separate workflow in the reusable workflows scenario, trust has been shifted away from the control plane/build platform to a separate SW artifact. In some ways, this also seems like a deviation from some of the SLSA guiding principles. I think it would be helpful for the documentation on the Getting Started page to describe how specific features of the tools recommended on that page meet the SLSA requirements. Again, happy to propose some possible changes here if others think that would be helpful.

@ianlewis
Member

> If the main goal of L3 (as you mention in #975 (comment)) is to ensure the tenant cannot tamper with the provenance generation during the build, it seems like we should be able to relax the requirement that the build platform itself must perform the generation

I think, at least for slsa-github-generator, we are defining the "builder" or "build platform" as the combination of reusable workflow + GitHub Actions (rather than just GitHub Actions alone). It's for this reason that we provide our own builder.id in the provenance rather than one that just indicates it's run on GHA. It does mean that you have to trust the maintainers of slsa-github-generator as well as GitHub Actions when verifying the provenance but slsa-verifier allows you to specify the expected --builder-id at verification time (example for the Node.js workflow)
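As a rough illustration of what pinning the expected builder means for a verifier, here is a minimal Python sketch. It is not how slsa-verifier is implemented; it assumes a SLSA v1.0-shaped statement in which the builder identity lives at `predicate.runDetails.builder.id`, and the helper name `builder_is_trusted` and the builder ID value are hypothetical. Signature verification, which must happen before any field is trusted, is omitted.

```python
def builder_is_trusted(statement, expected_builder_id):
    """Return True only if the provenance names exactly the expected builder."""
    builder = (
        statement.get("predicate", {})
        .get("runDetails", {})
        .get("builder", {})
    )
    return builder.get("id") == expected_builder_id


# Hypothetical builder ID: it identifies a specific reusable workflow at a
# specific ref, not merely "something that ran on GitHub Actions".
EXPECTED = (
    "https://example.com/hypothetical-org/hypothetical-generator/"
    ".github/workflows/build.yml@refs/tags/v1.0.0"
)

statement = {
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {"runDetails": {"builder": {"id": EXPECTED}}},
}

print(builder_is_trusted(statement, EXPECTED))  # True
print(builder_is_trusted(statement, "https://example.com/some-other-builder"))  # False
```

Comparing the full ID, including the workflow path and ref, is what lets a consumer decide whether to include a particular reusable workflow in their trust base alongside the hosting platform.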

Technically speaking, a CI platform (e.g. GitHub Actions) that provides build primitives (e.g. reusable workflows) with properties that allow us to meet SLSA L3+ requirements could allow us to relax the wording relating to "build platforms", but this is definitely the exception rather than the rule. I think the clarification that would be needed is probably harder to understand than just saying that the "build platform" is the reusable workflow + GitHub Actions in the case of slsa-github-generator.

> In essence, even if the caller can't influence the separate workflow in the reusable workflows scenario, trust has been shifted away from the control plane/build platform to a separate SW artifact. In some ways, this also seems like a deviation from some of the SLSA guiding principles.

Just to be clear, which principles do you think it is a deviation from?

I think that needing to trust a separate SW artifact (the reusable workflow) is an accurate assessment, and it could perhaps be interpreted as a deviation from the guiding principles if given a narrow interpretation that trusting slsa-github-generator in addition to GHA would break the principle of "Establish trust in a small number of platforms and systems".

I will say that you have access to the exact source code that the reusable workflow runs, so you do have to trust the slsa-github-generator devs but not necessarily blindly trust them. So I don't personally think it's a huge leap for an OSS or private project to include slsa-github-generator in their trusted computing base but it is perhaps under-communicated.

In many ways the project is a stop-gap until GHA itself provides some kind of native SLSA support. So rather than try to clarify it on the SLSA side would it make more sense to clarify it on the slsa-github-generator side?

/cc @laurentsimon

@arewm
Member

arewm commented Oct 16, 2023

@ianlewis, this also seems related to #966 (around self-hosted runners). I tried to address the need for clarification by adding to the FAQs: #989. Does this help the current conversation at all, or can the change be modified to further clarify this question?

@ianlewis
Member

@arewm You are correct in that it's related, as slsa-github-generator's assumptions about the builder.id and trust base are derived from the SLSA spec: https://slsa.dev/spec/v1.0/provenance#:~:text=The%20id%20MUST,by%20all%20consumers.

However, I'm not sure it helps with regard to slsa-github-generator as we rely on some of the properties of GitHub's hosted runners, but I agree some clarification could be made with regard to self-hosted runners.
