Support directory hashes #730

Open
laurentsimon opened this issue Jan 8, 2024 · 12 comments

Comments

@laurentsimon
Contributor

laurentsimon commented Jan 8, 2024

As part of the effort to bring SLSA to ML (https://github.com/google/model-transparency), we need to be able to sign directories. This requires defining a new "hash", i.e. a way to serialize a directory. We have a PoC for this in the repo linked above and need to implement it in slsa-verifier.

@laurentsimon
Contributor Author

/cc @mihaimaruseac @ramonpetgrave64

@laurentsimon
Contributor Author

@smeiklej

@netomi

netomi commented Jan 11, 2024

jsonnet-bundler has a small utility method to generate the hash of a directory which might be useful here as well: https://github.com/jsonnet-bundler/jsonnet-bundler/blob/master/pkg/packages.go#L351

@laurentsimon
Contributor Author

This code is not safe from a cryptographic-hash point of view: for example, you can rename files to change their meaning without changing the hash. The hash we have in the model repo also handles parallel hashing using a tree. See the comments in sigstore/model-transparency#49.

@laurentsimon
Contributor Author

An even greater problem with the hash is that it lacks delimiters between files, so the following two directories produce the same hash:
F1: "hello"
F2: "world"

and:
F1: "hell"
F2: "oworld"
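
To make the collision concrete, here is a minimal sketch in Python. It does not reproduce the jsonnet-bundler code or the model-transparency PoC; `naive_dir_hash` and `framed_dir_hash` are hypothetical names, and directories are modeled as in-memory dicts mapping a file name to its bytes. The naive variant feeds file contents into one SHA-256 with no names and no delimiters; the framed variant length-prefixes every name and every content, which is one possible way to make file boundaries unambiguous.

```python
import hashlib


def naive_dir_hash(files):
    """Hash file contents in name order, with no names and no delimiters."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(files[name])
    return h.hexdigest()


def framed_dir_hash(files):
    """Length-prefix each name and each content so boundaries are unambiguous."""
    h = hashlib.sha256()
    for name in sorted(files):
        encoded_name = name.encode("utf-8")
        content = files[name]
        # 8-byte big-endian lengths are an arbitrary choice for this sketch.
        h.update(len(encoded_name).to_bytes(8, "big"))
        h.update(encoded_name)
        h.update(len(content).to_bytes(8, "big"))
        h.update(content)
    return h.hexdigest()


dir_a = {"F1": b"hello", "F2": b"world"}
dir_b = {"F1": b"hell", "F2": b"oworld"}
dir_renamed = {"F1": b"hello", "F3": b"world"}

# The naive hash cannot tell these directories apart; the framed one can.
print(naive_dir_hash(dir_a) == naive_dir_hash(dir_b))
print(framed_dir_hash(dir_a) == framed_dir_hash(dir_b))
```

Because the framed variant also hashes the names, renaming F2 to F3 changes the result, which addresses the renaming concern raised earlier in the thread.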

@netomi

netomi commented Jan 11, 2024

OK, I did not realize that the directory hash should also take that into account.

Maybe tree hashes as calculated by git would be useful. Here is a test I performed: I created files with the same content but different filenames in different directories and checked how git calculates the hashes.

If the filenames are equal, the tree hash is the same; if the filenames differ, the tree hash differs as well.

tn@proteus:~/workspace/eclipse/EclipseFdn/tmp$ git ls-tree HEAD
040000 tree 1e6dbf97adb05c42dcb537cd717e368812dc23b5	test
040000 tree 844053933521d6c52f2f96e288dc9175a2e6aea0	test2
040000 tree 1e6dbf97adb05c42dcb537cd717e368812dc23b5	test3

tn@proteus:~/workspace/eclipse/EclipseFdn/tmp$ git ls-tree -r HEAD
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238	test/test.txt
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238	test2/test2.txt
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238	test3/test.txt

@mihaimaruseac
Contributor

This could work, but it forces the existence of a .git directory and ties us to git's hashing algorithm.

@netomi

netomi commented Jan 12, 2024

Sorry for the misunderstanding. I did not intend to suggest using git itself, but rather its mechanism for generating tree hashes.

@mihaimaruseac
Contributor

Oh, fair point. Thanks for the clarification.

@ramonpetgrave64
Contributor

Just adding to the conversation:

Merkle trees seem like they could be a good way to hash directories, and someone has tried this in Go.
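
As a rough illustration of the idea (not the Go implementation mentioned above, and not the model-transparency scheme), a Merkle root over a directory could be sketched in Python as follows. The assumptions here are mine: directories are flat dicts of relative path to bytes, leaves bind the path to the content hash, and leaf and interior nodes use different prefix bytes for domain separation, similar in spirit to RFC 6962. All function names are hypothetical.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def leaf_hash(path, content):
    """Bind a length-prefixed path to the hash of the file content."""
    encoded_path = path.encode("utf-8")
    return _h(b"\x00" + len(encoded_path).to_bytes(8, "big") + encoded_path + _h(content))


def merkle_root(files):
    """Compute a Merkle root over files sorted by relative path."""
    level = [leaf_hash(path, files[path]) for path in sorted(files)]
    if not level:
        return _h(b"")  # convention chosen here for an empty directory
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_h(b"\x01" + level[i] + level[i + 1]))
            else:
                next_level.append(level[i])  # odd node is promoted unchanged
        level = next_level
    return level[0]


root = merkle_root({"a.bin": b"hello", "b.bin": b"world", "c.bin": b"!"})
print(root.hex())
```

Each leaf depends only on its own file, so the leaf hashes could be computed in parallel before the tree is combined.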

Re: your comments, I think we could add an optional CLI switch to slsa-verifier, such as --enforce-subject-name-and-path. Then, if the slsa-github-generator doesn't already, it could put the relative paths in the subject.name.
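
A rough Python sketch of what such a check might look like is below. It is not slsa-verifier code, and the proposed switch does not exist as far as this thread shows. `subjects_for` and `verify_subjects` are hypothetical helpers, the directory is an in-memory dict of relative path to bytes, and subjects use the in-toto shape of a `name` plus a `digest` map. The digest-only mode (comparing the set of digests while ignoring names) is an assumption made purely for contrast.

```python
import hashlib


def subjects_for(files):
    """Build one subject per file, using the relative path as the name."""
    return [
        {"name": path, "digest": {"sha256": hashlib.sha256(files[path]).hexdigest()}}
        for path in sorted(files)
    ]


def verify_subjects(files, subjects, enforce_name_and_path=False):
    """Check a directory against subjects, optionally binding digests to paths."""
    expected = {s["name"]: s["digest"]["sha256"] for s in subjects}
    actual = {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}
    if enforce_name_and_path:
        # Every path must exist with exactly the recorded digest.
        return actual == expected
    # Digest-only mode: names are ignored, only the digests are compared.
    return sorted(actual.values()) == sorted(expected.values())


original = {"a/x.txt": b"1", "b/y.txt": b"2"}
swapped = {"a/y.txt": b"1", "b/x.txt": b"2"}
subjects = subjects_for(original)

print(verify_subjects(swapped, subjects))
print(verify_subjects(swapped, subjects, enforce_name_and_path=True))
```

In this sketch, files moved to different paths still pass the digest-only check but fail once names and paths are enforced.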

@mihaimaruseac
Contributor

Thank you! We're now also experimenting with a manifest file instead of a hash of everything, but this probably won't work for SLSA (sigstore/model-transparency#111). Let's continue experimenting.

@laurentsimon
Contributor Author

SLSA will replace the manifest format with a provenance format; the rest can probably remain the same.

4 participants