Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uneven chunk size in xxh3 stream computes wrong hash #378

Closed
WayneD opened this issue May 28, 2020 · 7 comments · Fixed by #379
Closed

Uneven chunk size in xxh3 stream computes wrong hash #378

WayneD opened this issue May 28, 2020 · 7 comments · Fixed by #379
Labels

Comments

@WayneD
Copy link
Contributor

WayneD commented May 28, 2020

I noticed a difference in the computing of an xxh3 (and xxh128) hash when the chunks of data are of uneven size. I'm testing xxh3 for eventual inclusion in rsync (once it stabilizes), and when transfer compression is used, the chunk sizes on the receiving side can vary wildly, causing the file checksum to be erroneous.

I created a simple test case while I was diagnosing the issue, but it appears to not really depend on the data (though I supplied some in the test case) but on the weird chunk sizes. Just run "make" and it will compile the simple C program and process the 2 test files through it (one with even chunks, and one with uneven chunks).

streaming-bug.tar.gz

(Note: I updated the tar file to avoid an extra 0-length update call after each chunk. The input files have a trailing newline added to each chunk's text and tester.c was not discarding it, resulting in a 0-byte chunk due to the empty line. It didn't affect the end result, but it's cleaner with the change.)

@Cyan4973
Copy link
Owner

Cyan4973 commented May 28, 2020

I confirm I can reproduce the issue with the linked source code.

Reasons for the error is still unclear.
Another test, which ingests data in segments of random length, seems to work fine so far, using exactly the same API.
Investigation is ongoing.

@easyaspi314
Copy link
Contributor

easyaspi314 commented May 28, 2020

Hmmmmmmm there's definitely more things awry here.

If I add the single shot function as well, I get a third result.

./tester normal-chunks uneven-chunks

normal-chunks
incr:   cf3093cedd619ac9
full:   cf3093cedd619ac9
single: 8ef77a435d6e1fa9

uneven-chunks
incr:   ba74115927b80788
full:   cf3093cedd619ac9
single: 8ef77a435d6e1fa9

But if I replace everything with the equivalent XXH64 functions, they all return the same result.

./tester normal-chunks uneven-chunks
                                                                                   normal-chunks
incr:   98a65b63b84f8277
full:   98a65b63b84f8277
single: 98a65b63b84f8277

uneven-chunks
incr:   98a65b63b84f8277
full:   98a65b63b84f8277
single: 98a65b63b84f8277

@Cyan4973
Copy link
Owner

OK, I think I get it.
It requires a specific sequence of sizes to show up,
but I'm nonetheless surprised this issue did not show up during tests.
Something must be improved there.

@Cyan4973 Cyan4973 added the bug label May 28, 2020
@Cyan4973
Copy link
Owner

OK, fixed it.
Indeed, it was a pretty nasty bug. Thanks @WayneD for pointing that out !

I must now think of a way to trigger that bug during tests, and make it part of the PR.

Cyan4973 added a commit that referenced this issue May 28, 2020
Streaming implementation of XXH3/XXH128 could fail, reporting an incorrect hash value,
for certain specific ranges of length, and providing input in segments of random length.
issue reported by @WayneD
fix #378
updated test case to catch this issue
@WayneD
Copy link
Contributor Author

WayneD commented May 29, 2020

Great work! I'm glad it wasn't too hard to fix it. I guess I just got "lucky" with an rsync test run combo of xxh128 checksums + a compression test then ended up with reproducible failure (and then I just had rsync write out the checksum call sequence that was failing in a simple file format).

I've verified that your "fix378" branch works fine in my rsync test run that was failing.

@WayneD
Copy link
Contributor Author

WayneD commented May 29, 2020

Now I just need to figure out why rsync is computing a different checksum value for about 1/4 of the copied files I just copied compared to what xxh128 outputs (where no such issue occurs when comparing xxh64 values).

@WayneD
Copy link
Contributor Author

WayneD commented May 29, 2020

Things are matching up correctly now. I think that I had only re-linked my code initially and I had to recompile it for it to be fully fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants