-
Notifications
You must be signed in to change notification settings - Fork 757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uneven chunk size in xxh3 stream computes wrong hash #378
Comments
I confirm I can reproduce the issue with the linked source code. Reasons for the error is still unclear. |
Hmmmmmmm there's definitely more things awry here. If I add the single shot function as well, I get a third result.
But if I replace everything with the equivalent XXH64 functions, they all return the same result.
|
OK, I think I get it. |
OK, fixed it. I must now think of a way to trigger that bug during tests, and make it part of the PR. |
Great work! I'm glad it wasn't too hard to fix it. I guess I just got "lucky" with an rsync test run combo of xxh128 checksums + a compression test then ended up with reproducible failure (and then I just had rsync write out the checksum call sequence that was failing in a simple file format). I've verified that your "fix378" branch works fine in my rsync test run that was failing. |
Now I just need to figure out why rsync is computing a different checksum value for about 1/4 of the copied files I just copied compared to what xxh128 outputs (where no such issue occurs when comparing xxh64 values). |
Things are matching up correctly now. I think that I had only re-linked my code initially and I had to recompile it for it to be fully fixed. |
I noticed a difference in the computing of an xxh3 (and xxh128) hash when the chunks of data are of uneven size. I'm testing xxh3 for eventual inclusion in rsync (once it stabilizes), and when transfer compression is used, the chunk sizes on the receiving side can vary wildly, causing the file checksum to be erroneous.
I created a simple test case while I was diagnosing the issue, but it appears to not really depend on the data (though I supplied some in the test case) but on the weird chunk sizes. Just run "make" and it will compile the simple C program and process the 2 test files through it (one with even chunks, and one with uneven chunks).
streaming-bug.tar.gz
(Note: I updated the tar file to avoid an extra 0-length update call after each chunk. The input files have a trailing newline added to each chunk's text and tester.c was not discarding it, resulting in a 0-byte chunk due to the empty line. It didn't affect the end result, but it's cleaner with the change.)
The text was updated successfully, but these errors were encountered: