Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify in documentation if zero can be returned #887

Open
mikesreed opened this issue Oct 5, 2023 · 2 comments
Open

Clarify in documentation if zero can be returned #887

mikesreed opened this issue Oct 5, 2023 · 2 comments
Labels

Comments

@mikesreed
Copy link

Hi,
I'm currently using SHA1 for a hash in a program but was considering xxHash instead. However, my code relies on the fact (belief?) that SHA1 can not create a hash of all zero bits. I've gone through the xxHash documentation but can't find any mention of this. I'm specifically interested in the XXH64 and XXH3_64 algorithms. But it would be nice to state clearly for all the different algos whether they can ever return zero or not.
Thanks for your time.
Regards,
Mike.

@t-mat
Copy link
Contributor

t-mat commented Oct 5, 2023

Hi @mikesreed , thank you for your opion.
I actually 👍 for better clarification in the document / code.

In short, xxHash may return 0 (all bits are 0s).
The shortest example is "BAD" seed with NULL (0 byte) input.

// example-xxh64-returns-0.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define XXH_STATIC_LINKING_ONLY
#define XXH_IMPLEMENTATION   /* access definitions */
#include "xxhash.h"

int main()
{
    uint64_t const seed = (uint64_t) (0ULL - XXH_PRIME64_5);
    XXH64_hash_t const hash = XXH64(NULL, 0, seed);

    // Converts hash value to canonical representation
    XXH64_canonical_t c;
    XXH64_canonicalFromHash(&c, hash);

    // Display the content of canonical representation
    printf("hash = 0x");
    for (size_t i = 0; i < sizeof(c.digest); ++i) {
        printf("%02x", c.digest[i]);
    }
    printf("\n");

    return EXIT_SUCCESS;
}

@easyaspi314 's nice sketch will help you to understand this code.

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 5, 2023

The 0 value has as many chances to appear as any other value, which is 1 / 2^64 for both XXH64() and XXH3_64bits(). That's extremely tiny, as in many many millions times smaller than your chance to win the powerball lottery. Still, if you were to "try your luck" many many times (as in many billion times), it's no longer insignificant.

The situation is the same for SHA1, though in this case, the probability is reduced by the larger length of SHA1, to 1 / 2^160, which is virtually the same as nothing.
This is a property shared with XXH3_128bits(), which may also output 0 but with a probability of 1 / 2^128, which is also good enough to be considered similar to nothing.

The main difference between SHA1 and XXH3_128bits() (beyond the very large speed difference) is that SHA1used to be "cryptographic", which means that manufacturing on purpose a specific return value (0 in this case) is extremely hard (but no longer impossible, that's why SHA1 lost its "cryptographic" label some years ago). XXH3_128bits() is not cryptographic, so it's easier to generate a value 0 on purpose.

That's the difference between "accidental" and "intentional" collisions.
If you use case only aims at protecting against "accidental" collision with the value 0, then XXH3_128bits() should be good enough for your use case (and much faster).

However, if producing a value 0 intentionally could lead to a security risk and the input could be manipulated by an external attacker, you should use a cryptographic algorithm, but a stronger one than SHA1. Something like SHA256 or BLAKE3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants