Using built-in Python hash for sorting goes against reproducibility #178
Labels
stale::closed
[bot] closed after being marked as stale
stale
[bot] marked as stale due to inactivity
type::feature
request for a new feature or capability
What happened?
I've looked a bit into properly (byte-for-byte) reproducing the compressed artifacts in mamba and noticed that the built-in `hash` function is used for the file sorting. I think that's not a great choice because it makes the tarball less reproducible (e.g. from other programming languages, but also across different Python versions). E.g. Python 4.x might decide to use a different string hashing algorithm.
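To make the concern concrete, here is a minimal sketch, assuming the sort key hashes path strings. In CPython, `str` hashes are salted per process unless `PYTHONHASHSEED` is fixed, so the value of `hash()` for the same string can differ between two runs of the same interpreter. The helper name and the example path are hypothetical and only serve the illustration.

```python
import os
import subprocess
import sys


def builtin_hash_in_fresh_process(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Hypothetical archive member name, hashed under two different seeds.
h1 = builtin_hash_in_fresh_process("info/index.json", "1")
h2 = builtin_hash_in_fresh_process("info/index.json", "2")
print("hashes differ between seeds:", h1 != h2)
```

Any ordering derived from these values therefore depends on the interpreter and its seed, which other tools cannot easily replicate.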
I would propose to use some easy-to-implement string hashing algorithm instead (e.g. `djb2`: http://www.cse.yorku.ca/~oz/hash.html) or do away with it for sorting.
Conda Info
No response
Conda Config
No response
Conda list
No response
Additional Context
No response
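As a rough sketch of the proposal under "What happened?", a djb2-based sort key could look like the following. It is an illustration under assumptions, not the project's code: the function names are hypothetical, paths are hashed as UTF-8 bytes, and the result is truncated to 32 bits so that other languages can reproduce the same values.

```python
from typing import Iterable, List


def djb2(data: bytes) -> int:
    """djb2 string hash (hash * 33 + c), truncated to 32 bits (an assumption)."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def reproducible_order(paths: Iterable[str]) -> List[str]:
    """Sort paths by their djb2 hash, using the path itself to break ties."""
    return sorted(paths, key=lambda p: (djb2(p.encode("utf-8")), p))


# Hypothetical member names; the order does not depend on the interpreter or its hash seed.
print(reproducible_order(["lib/b.py", "info/index.json", "lib/a.py"]))
```

If the hash-based order has no other purpose, plain `sorted(paths)` would be the simplest way to "do away with it", since string comparison is deterministic as well.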