[feature] workflow for publishing source code archives as release assets #2951

Open
junyer opened this issue Nov 7, 2023 · 26 comments · May be fixed by #3587
Labels
area:generic Issue with the generic generator type:feature New feature or request

Comments

@junyer

junyer commented Nov 7, 2023

Is your feature request related to a problem? Please describe.
Bazel recommends publishing source code archives as release assets – and Bazel Central Registry verifies stability by checking for …/releases/download/… in GitHub URLs. Using gh release download and gh release upload, GitHub Actions can automate this trivially, but OpenSSF punishes projects whose release assets lack signature and provenance.

Describe the solution you'd like
SLSA should provide a workflow for publishing source code archives as release assets with signature and provenance. Ideally, any project's release workflow could include a job specifying only permissions and uses keys and get .zip, .zip.intoto.jsonl, .tar.gz and .tar.gz.intoto.jsonl files attached to the release.

Describe alternatives you've considered
Letting N different projects implement this themselves in approximately N different ways. ;)

Additional context
N/A

@junyer junyer added status:triage Issue that has not been triaged type:feature New feature or request labels Nov 7, 2023
@laurentsimon
Collaborator

Hi, thanks for the issue. You can achieve this today using the generator https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md and setting upload-assets: true. Can you confirm this works for your use case?

@junyer
Author

junyer commented Nov 8, 2023

I'm confident that it would, but the idea here is to spare every project the sha256sum … | base64 -w0 and base64-subjects dance and, moreover, to ensure that the gh release download and gh release upload dance is done by a trusted reusable workflow. It is – or it ought to be! – a common operation with approximately zero room for variations, so it should be made as convenient for every project as reasonably possible. Does that clarify the intention behind the request? :)
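For readers who have not met it, the "dance" referred to here is how callers of the generic generator prepare its base64-subjects input: the base64-encoded output of sha256sum over the artifacts. A minimal sketch, assuming GNU coreutils; the helper name and file names are hypothetical, and in a real workflow the value would be written to "$GITHUB_OUTPUT" rather than echoed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the base64-encoded `sha256sum` output for the
# given files on a single line (-w0 disables line wrapping).
subjects_for() {
  sha256sum "$@" | base64 -w0
}

# Throwaway files standing in for the real archives.
workdir="$(mktemp -d)"
printf 'zip bytes\n' > "${workdir}/example.zip"
printf 'tar bytes\n' > "${workdir}/example.tar.gz"

subjects="$(cd "${workdir}" && subjects_for example.zip example.tar.gz)"
echo "base64-subjects=${subjects}"

# Decoding recovers ordinary checksum lines that `sha256sum -c` can verify.
(cd "${workdir}" && echo "${subjects}" | base64 -d | sha256sum -c -)
```

Every project that wants provenance for its archives currently has to carry some variant of this step, which is the repetition the request aims to remove.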

@laurentsimon
Collaborator

I understand the sha256sum … | base64 -w0.
I don't fully understand the gh release download and gh release upload. The generator I linked to does the release upload when upload-assets: true is set. Can you clarify this point?

@junyer
Author

junyer commented Nov 8, 2023

Just to be clear, the idea is not to generate the source code archives manually. (GitHub already does that automatically.) The reason to do the gh release download and gh release upload dance is to take the source code archives that are available via …/archive/refs/tags/… and make them available via …/releases/download/….
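To make the two URL shapes concrete, here is a small sketch that prints both for a given repository and tag. The repository, tag and asset filename are hypothetical placeholders, and the asset URL only resolves once a file with that name has actually been uploaded to the release:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Source code archive URL: GitHub generates the archive for any tag.
# usage: archive_url OWNER/REPO TAG EXT
archive_url() {
  printf 'https://github.com/%s/archive/refs/tags/%s.%s\n' "$1" "$2" "$3"
}

# Release asset URL: serves a file that was uploaded to the release.
# usage: asset_url OWNER/REPO TAG FILENAME
asset_url() {
  printf 'https://github.com/%s/releases/download/%s/%s\n' "$1" "$2" "$3"
}

archive_url example-org/example-repo 1.2.3 zip
# prints https://github.com/example-org/example-repo/archive/refs/tags/1.2.3.zip
asset_url example-org/example-repo 1.2.3 example-repo-1.2.3.zip
# prints https://github.com/example-org/example-repo/releases/download/1.2.3/example-repo-1.2.3.zip
```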

@laurentsimon
Collaborator

laurentsimon commented Nov 9, 2023

Ah, I missed this. Thank you. I'm not super familiar with how …/archive/refs/tags/… are generated, and by whom. Are these the ones GitHub generates automatically? Are they not the same as those present in the release assets? Or are you merely saying that the API / URL to download them is not consistent with the APIs / URLs used to download other assets in the release (I've not verified if this is the case, just trying to parse your comment), and so you want to add them to the release explicitly?

@junyer
Author

junyer commented Nov 10, 2023

As linked above, https://blog.bazel.build/2023/02/15/github-archive-checksum.html describes the situation quite well, I think, with the screenshot illustrating the difference between …/archive/refs/tags/… and …/releases/download/… in terms of the release assets. The problem that the gh release download and gh release upload dance solves is one of stability.

I should just clarify that the filenames in the …/archive/refs/tags/… URLs are not the filenames that GitHub actually serves. For the 2023-11-01 release of RE2, for example, GitHub will do the following:

https://github.com/google/re2/archive/refs/tags/2023-11-01.zip
-> location: https://codeload.github.com/google/re2/zip/refs/tags/2023-11-01
-> content-disposition: attachment; filename=re2-2023-11-01.zip

Likewise, gh release download uses the "real" filename, so the workflow that I'm proposing would not have to rename files. It's just about doing the gh release download and gh release upload dance and, in the process, generating signature and provenance.
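The download-and-upload step described in this comment can be sketched roughly as follows. This is only an illustration, not the workflow eventually proposed: the helper name is hypothetical, and it assumes the gh CLI is installed and authenticated with a token that has contents: write on the repository. Signing and provenance generation are deliberately left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy GitHub's generated source code archives for a tag
# into that tag's release assets.
# usage: mirror_source_archives OWNER/REPO TAG
mirror_source_archives() {
  local repo="$1" tag="$2" dir
  dir="$(mktemp -d)"

  # Download both archive formats under the filenames GitHub serves.
  (cd "${dir}" && gh release download "${tag}" --repo "${repo}" --archive=zip)
  (cd "${dir}" && gh release download "${tag}" --repo "${repo}" --archive=tar.gz)

  # Attach whatever was downloaded to the same release as assets.
  gh release upload "${tag}" "${dir}"/* --repo "${repo}"
}

# In a workflow triggered by a tag push this might be invoked as:
#   mirror_source_archives "${GITHUB_REPOSITORY}" "${GITHUB_REF_NAME}"
```

Because the files keep the names GitHub serves, no renaming step is needed between download and upload.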

@ianlewis ianlewis added area:generic Issue with the generic generator and removed status:triage Issue that has not been triaged labels Dec 6, 2023
@ianlewis
Member

ianlewis commented Dec 6, 2023

Maybe this could be an option on the generic generator? I'm not sure we need a totally separate workflow. WDYT?

@junyer
Author

junyer commented Dec 6, 2023

I had "do one thing and do it well" in mind, I think, when I suggested another workflow. generator_generic_slsa3.yml has a lot of knobs whereas this use case needs approximately zero knobs. Reusing the generic generator makes sense, of course, but I would argue that there's value in encapsulating/hiding its complexity.

@ianlewis
Member

ianlewis commented Dec 7, 2023

Yeah. I hear that. I think 95% of the code would be the same though. The only difference would be that we could omit base64-subjects and base64-subjects-as-file. I think all the other inputs would still be relevant.
https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md#workflow-inputs

@junyer
Author

junyer commented Dec 7, 2023

Fair enough. :)

@junyer
Author

junyer commented Mar 30, 2024

In light of CVE-2024-3094, could this effort possibly be prioritised? ;)

@junyer
Author

junyer commented Apr 9, 2024

In case it helps, I wrote https://github.com/google/re2/blob/main/.github/workflows/release.yml earlier today. :)

@laurentsimon
Collaborator

laurentsimon commented Apr 18, 2024

Hey, sorry for the late reply. Thanks for putting an example together. We could support this using BYOB https://slsa.dev/blog/2023/08/bring-your-own-builder-github and https://github.com/slsa-framework/slsa-github-generator/blob/main/BYOB.md

We should be able to take your example and wrap it in a BYOB fairly easily. This would let your users (and other projects' users) use the slsa-verifier to verify, out of the box, with a common builder.

The only thing we need to change is that the command

gh release create "${GITHUB_REF_NAME}" \
  --generate-notes --latest --verify-tag \
  --repo "${GITHUB_REPOSITORY}"

would be done by repository owners, and then they'd call our tar / zip builder to create the tarball / zip and upload it. Please correct me if that's incorrect.

Would that work? Happy to help make that happen

@junyer
Author

junyer commented Apr 18, 2024

IIUC, yes, the PW would run gh release create (except using the API instead of the CLI) and then invoke the TRW, which would handle everything else. Although now I'm guessing that SLSA generating signature and provenance would make Sigstore signing superfluous. :)

@laurentsimon
Collaborator

laurentsimon commented Apr 18, 2024

> IIUC, yes, the PW would run gh release create (except using the API instead of the CLI) and then invoke the TRW, which would handle everything else. Although now I'm guessing that SLSA generating signature and provenance would make Sigstore signing superfluous. :)

You would not need the sigstore signatures, but the SLSA builders use Sigstore too :) There are 2 (related) advantages to a common builder:

  1. Other projects can use the same builder and inspect the code once
  2. No need for users to read each project's workflow code to read how the tarball / zip files are created

If you're OK with all that, I can go ahead and turn your example into a SLSA builder

@junyer
Author

junyer commented Apr 18, 2024

> If you're OK with all that, I can go ahead and turn your example into a SLSA builder

SGTM. Thanks! :D

@laurentsimon
Collaborator

laurentsimon commented Apr 23, 2024

Hey, a few months ago GitHub made (then reverted) the tarballs non-deterministic, see https://github.com/orgs/community/discussions/45830. To avoid this sort of problem in the future, the builder could create the archives itself and upload them to the release. That will also let us support other types of archive (if we need to) in the future. WDYT of this approach?

@junyer
Author

junyer commented Apr 23, 2024

Downloading the source code archives and uploading them as release assets is sufficient, AFAIK, because that's what confers stability. I reckon that creating them (i.e. explicitly) wouldn't add value... but would add complexity. If somebody downloads a slightly different one later because they clicked or pasted the wrong link, I don't think it actually matters who or what created the one at the right link. Or am I misunderstanding a risk from a trust perspective here?

@laurentsimon
Collaborator

laurentsimon commented Apr 23, 2024

> Downloading the source code archives and uploading them as release assets is sufficient, AFAIK, because that's what confers stability. I reckon that creating them (i.e. explicitly) wouldn't add value... but would add complexity.

Not super complicated I think. The BYOB framework already clones the repo, so code is available. We would just need to zip / tar it which does not seem too complicated.

> If somebody downloads a slightly different one later because they clicked or pasted the wrong link, I don't think it actually matters who or what created the one at the right link. Or am I misunderstanding a risk from a trust perspective here?

Stability requires archives downloaded from GitHub (with the same link) to be deterministic. The link above shows that GitHub generates archives on the fly when they are requested (to save storage space, I think). This means the signature check would fail if the archive is different at download time from when it was first signed. In the case linked above, GitHub had changed the compression algorithm, so there was a mismatch between the sign-time and download-time archives.
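The failure mode described here can be reproduced locally with a plain digest standing in for the signature. This is a simplified sketch, assuming GNU coreutils; the file name and contents are made up, and a real SLSA provenance check would go through slsa-verifier rather than sha256sum:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "${workdir}"

# Sign time: record the digest of the archive as it was first generated.
printf 'archive contents, first compression\n' > example-1.2.3.tar.gz
sha256sum example-1.2.3.tar.gz > SHA256SUMS

# Same bytes later: verification succeeds.
sha256sum -c --status SHA256SUMS && first_check=ok || first_check=mismatch

# Regenerated with different bytes (e.g. a changed compressor): it fails.
printf 'archive contents, second compression\n' > example-1.2.3.tar.gz
sha256sum -c --status SHA256SUMS && second_check=ok || second_check=mismatch

echo "first=${first_check} second=${second_check}"
# prints: first=ok second=mismatch
```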

Lmk what you think.

@junyer
Author

junyer commented Apr 23, 2024

I think you might be confusing source code archives and release assets. The image below (taken from the Bazel blog) hopefully clarifies:

Note well that downloading the source code archives and uploading them as release assets makes them stable as release assets, not as source code archives. That's why I'm arguing that, at release time, it doesn't matter whether the source code archives are deterministic. If somebody ends up using a copy of a nondeterministic file, then it really doesn't matter how the deterministic file was created, does it?

@laurentsimon
Collaborator

My bad, I had not seen that you're re-uploading the source archives as release assets. I thought your example workflow only signed the downloaded source archives without re-uploading them. Either re-creating the source archives or downloading the existing ones works. Is it fair to say that how the archive is created is an implementation detail you don't care too much about? Or do you care? I'll probably download them to simplify the first iteration, but would like to know if it makes a difference for your use case, in particular security-wise.

@junyer
Author

junyer commented Apr 23, 2024

To date, manual manipulation of the source code archives is the big problem. Trusting their creation to GitHub seems no more risky than trusting everything else to GitHub, honestly, so obtaining them from GitHub – as opposed to creating them explicitly – suits me just fine.

@laurentsimon laurentsimon linked a pull request Apr 24, 2024 that will close this issue
@ramonpetgrave64
Collaborator

Release artifacts are mutable. I think if we can guarantee the archives to be reproducible, we should try to do it.
https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
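As a rough illustration of what reproducible archive creation might involve, the sketch below packs a directory twice and compares digests. It assumes GNU tar 1.28 or later and gzip; the helper name and sample files are hypothetical, and the flags are only a subset of what the GNU tar manual page linked above discusses:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pack directory SRC (relative to the current directory)
# into OUT so the bytes depend only on file names, contents and modes.
# usage: reproducible_tgz SRC OUT
reproducible_tgz() {
  # Fixed member order, timestamps and ownership; `gzip -n` omits the
  # timestamp that gzip would otherwise embed in its header.
  tar --sort=name --mtime='@0' \
      --owner=0 --group=0 --numeric-owner \
      -cf - "$1" | gzip -n > "$2"
}

workdir="$(mktemp -d)"
mkdir -p "${workdir}/src"
printf 'int main(void) { return 0; }\n' > "${workdir}/src/main.c"
printf 'hello\n' > "${workdir}/src/README"

(cd "${workdir}" && reproducible_tgz src a.tar.gz)
touch "${workdir}/src/README"   # change a timestamp between the two runs
(cd "${workdir}" && reproducible_tgz src b.tar.gz)

digest_a="$(sha256sum < "${workdir}/a.tar.gz")"
digest_b="$(sha256sum < "${workdir}/b.tar.gz")"
if [ "${digest_a}" = "${digest_b}" ]; then echo "identical"; else echo "different"; fi
```

Two runs on the same machine with the same tool versions should agree; agreement across different tar or gzip versions is a separate question that this sketch does not settle.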

@laurentsimon
Collaborator

laurentsimon commented May 2, 2024

Draft PR is #3587. I think we need to tweak it to reduce permissions by using https://github.com/slsa-framework/slsa-github-generator/blob/main/.github/workflows/delegator_lowperms-generic_slsa3.yml, then we're good to go
