Combining hashes of strings #114
Comments
There's a huge loss. Please take a look at inside We can say: Therefore, you can't rely on your assumption. Please consider using the streaming functions.
As suggested by @t-mat, the streaming functions were designed for exactly this scenario. I assume you are concerned about speed.
My goal is to provide a 'drop-in' replacement for CRC32 to improve speed, with minimal code change at the many call sites.
The streaming approach is technically not applicable, because the hash of a stream broken up into 10-byte chunks will be the same as the hash of the exact same stream broken into 100-byte chunks. The 'structure' of the discontiguous strings is relevant and should contribute to the hash result.
This is a string which has 'indirect' references to other strings embedded within it at arbitrary locations (only one level deep). I need to hash the indirect strings and the subsets of the base string (not including the pointers to the indirect strings).
I saw that XXH64 is faster on 64-bit hardware and will perform better than XXH32, taking 32 bits of the result as the hash. I'm content with a 32-bit level of hash quality, so it seemed reasonable to use the 32-bit hash of each string as the seed when incorporating the next string.
If your existing program structure allows it, prefer passing 64 bits between two consecutive hashes, and perform the 32-bit extraction only at the end. This will maximize hash quality.
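For readers following along, here is a minimal sketch of the suggestion above: carry the full 64-bit result forward as the seed of the next call and reduce to 32 bits only once, at the end. The `segment` type and the `hash_segments32` helper are hypothetical names, and `hash64_seeded` is a stand-in (a seeded FNV-1a, not xxHash) so that the sketch builds without the library. In real code that line would presumably call `XXH64(data, len, seed)` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XXH64(data, len, seed): a seeded 64-bit FNV-1a.
 * It is NOT xxHash and is here only so the sketch builds without the library. */
static uint64_t hash64_seeded(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = 0xcbf29ce484222325ULL ^ seed;  /* FNV offset basis mixed with the seed */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;                  /* FNV 64-bit prime */
    }
    return h;
}

/* Hypothetical description of one non-contiguous piece of the input. */
typedef struct {
    const void *data;
    size_t      len;
} segment;

/* Hash the segments in order, carrying the full 64-bit result forward
 * as the seed of the next call. The reduction to 32 bits happens once,
 * after the last segment. */
static uint32_t hash_segments32(const segment *segs, size_t count)
{
    uint64_t h = 0;  /* initial seed */
    for (size_t i = 0; i < count; i++) {
        assert(segs[i].data != NULL || segs[i].len == 0);
        h = hash64_seeded(segs[i].data, segs[i].len, h);
    }
    return (uint32_t)h;  /* keep the low 32 bits of the final value */
}
```

Because each call starts from a state derived from the previous result rather than continuing the same internal state, the segment boundaries contribute to the final value. With this stand-in, hashing "STRING1" then "STRING2" gives a different result than hashing "STRING1STRING2" as one buffer, which is the property asked for in this issue and the opposite of what the streaming functions provide.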
Are you suggesting modifying the code to allow a 64-bit seed? Currently, rather than simply extracting the low 32 bits for seeding the next step, I'm XOR-ing the high 32 and low 32 bits together so that all 64 bits 'contribute' to the 32-bit carry-over. Unfortunately, it is not feasible (without copying data around) to append or prepend the intermediate 64-bit value to the next string, as I don't have control over that storage; it is handed to me.
Never mind, I see now that XXH64 does allow a 64-bit seed value.
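The XOR fold described in this comment can be sketched as below, next to plain truncation for comparison. The function names `fold64_to_32` and `low32` are hypothetical, and this is only an illustration of the two reductions, not a claim about which one gives better results with XXH64.

```c
#include <stdint.h>

/* Fold a 64-bit hash to 32 bits by XOR-ing the upper half into the
 * lower half, so that every input bit influences the result. */
static uint32_t fold64_to_32(uint64_t h)
{
    return (uint32_t)(h ^ (h >> 32));
}

/* Plain truncation, for comparison: keeps only the low 32 bits. */
static uint32_t low32(uint64_t h)
{
    return (uint32_t)h;
}
```

For example, `fold64_to_32(0xFFFFFFFF00000000)` yields `0xFFFFFFFF`, while `low32` of the same value yields `0`.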
I'm using 64-bit PowerPC and I'd like a 32-bit hash result over a sequence of non-contiguous strings.
Is there any loss in hash quality if I hash a sequence of non-contiguous strings using XXH64 and simply pass the result of each hash as the seed of the next XXH64 call? Also, I would only be taking the lower 32 bits of the final result as my single 32-bit hash value representing the sequence of strings.
Subsequent hashes that are expected to be equal will be done against the exact same sequence of strings. In other words, I have no need for the final hash of the sequence "STRING1", "STRING2" to be the same as the final hash of "STRIN", "G1STRING2".
My current code uses CRC32 and does the above (passing the intermediate result into the next string's hash as a seed).
Thanks.