Clarify in documentation if zero can be returned #887

mikesreed · 2023-10-05T10:40:58Z

Hi,
I'm currently using SHA1 for a hash in a program but was considering xxHash instead. However, my code relies on the fact (belief?) that SHA1 can not create a hash of all zero bits. I've gone through the xxHash documentation but can't find any mention of this. I'm specifically interested in the XXH64 and XXH3_64 algorithms. But it would be nice to state clearly for all the different algos whether they can ever return zero or not.
Thanks for your time.
Regards,
Mike.

t-mat · 2023-10-05T13:17:35Z

Hi @mikesreed , thank you for your opion.
I actually 👍 for better clarification in the document / code.

In short, xxHash may return 0 (all bits are 0s).
The shortest example is "BAD" seed with NULL (0 byte) input.

// example-xxh64-returns-0.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define XXH_STATIC_LINKING_ONLY
#define XXH_IMPLEMENTATION   /* access definitions */
#include "xxhash.h"

int main()
{
    uint64_t const seed = (uint64_t) (0ULL - XXH_PRIME64_5);
    XXH64_hash_t const hash = XXH64(NULL, 0, seed);

    // Converts hash value to canonical representation
    XXH64_canonical_t c;
    XXH64_canonicalFromHash(&c, hash);

    // Display the content of canonical representation
    printf("hash = 0x");
    for (size_t i = 0; i < sizeof(c.digest); ++i) {
        printf("%02x", c.digest[i]);
    }
    printf("\n");

    return EXIT_SUCCESS;
}

@easyaspi314 's nice sketch will help you to understand this code.

Cyan4973 · 2023-10-05T15:55:27Z

The 0 value has as many chances to appear as any other value, which is 1 / 2^64 for both XXH64() and XXH3_64bits(). That's extremely tiny, as in many many millions times smaller than your chance to win the powerball lottery. Still, if you were to "try your luck" many many times (as in many billion times), it's no longer insignificant.

The situation is the same for SHA1, though in this case, the probability is reduced by the larger length of SHA1, to 1 / 2^160, which is virtually the same as nothing.
This is a property shared with XXH3_128bits(), which may also output 0 but with a probability of 1 / 2^128, which is also good enough to be considered similar to nothing.

The main difference between SHA1 and XXH3_128bits() (beyond the very large speed difference) is that SHA1used to be "cryptographic", which means that manufacturing on purpose a specific return value (0 in this case) is extremely hard (but no longer impossible, that's why SHA1 lost its "cryptographic" label some years ago). XXH3_128bits() is not cryptographic, so it's easier to generate a value 0 on purpose.

That's the difference between "accidental" and "intentional" collisions.
If you use case only aims at protecting against "accidental" collision with the value 0, then XXH3_128bits() should be good enough for your use case (and much faster).

However, if producing a value 0 intentionally could lead to a security risk and the input could be manipulated by an external attacker, you should use a cryptographic algorithm, but a stronger one than SHA1. Something like SHA256 or BLAKE3.

Cyan4973 added the question label Oct 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify in documentation if zero can be returned #887

Clarify in documentation if zero can be returned #887

mikesreed commented Oct 5, 2023

t-mat commented Oct 5, 2023 •

edited

Cyan4973 commented Oct 5, 2023 •

edited

Clarify in documentation if zero can be returned #887

Clarify in documentation if zero can be returned #887

Comments

mikesreed commented Oct 5, 2023

t-mat commented Oct 5, 2023 • edited

Cyan4973 commented Oct 5, 2023 • edited

t-mat commented Oct 5, 2023 •

edited

Cyan4973 commented Oct 5, 2023 •

edited