Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling window xxhash #193

Closed
xinglin opened this issue Apr 18, 2019 · 6 comments
Closed

rolling window xxhash #193

xinglin opened this issue Apr 18, 2019 · 6 comments
Labels

Comments

@xinglin
Copy link

xinglin commented Apr 18, 2019

Hi,

Does xxhash support rolling window based hashing? For example, assume we use a fixed-size rolling window of 48 bytes. We first calculate an xxhash for the first 48 bytes (0-47 bytes). Then, is it possible to calculate the next rolling window from offset of 1-48, by subtracting the impact from the 0th byte and add the 48th byte, without calculate an xxhash from scratch?

@Cyan4973
Copy link
Owner

Cyan4973 commented Apr 18, 2019

No, xxhash is not a rolling hash.
This would require to consume input byte after byte, and preserve perfect mathematical linearity, while xxhash ingests data by complete stripes, and breaks linearity on purpose, to improve bit distribution.

@xinglin
Copy link
Author

xinglin commented Apr 19, 2019

thanks for your explanation! I guess I have to look for other rolling hashes.

@xinglin xinglin closed this as completed Apr 19, 2019
@nh2
Copy link
Contributor

nh2 commented Oct 24, 2020

@Cyan4973 Being also in search of a very fast rolling hash function, here's a follow-up question:

Would it be possible for you to shortly outline, or link to, techniques that make xxHash fast and well-distributed, that could still apply to rolling hashes?

It would be fantastic to be able to craft a rolling hash, e.g. for rsync or in-memory data deduplication ala bup, that runs near the performance of xxHash.

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 24, 2020

The rolling hashes I know of are fundamentally different from xxhash, or any other high quality hash.
Imho, these 2 categories do not overlap.

@easyaspi314
Copy link
Contributor

easyaspi314 commented Oct 26, 2020

You are thinking too narrow, @Cyan4973.

XXH3 is definitely capable of being a rolling hash. Just not in the traditional sense.

Every aligned 1024 bytes (one block, given the default secret size) can be pushed or popped. This requires all 8 accumulators, but it is definitely doable.

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 26, 2020

Well, that's true.
It's just that all rolling hashes I know of have byte-level granularity, and applications which use them tend to need this level of granularity (finding flexible cut-points in a larger document is pretty common).

A KB-level granularity seems indeed possible with XXH3,
though it would require an application able to take advantage of this granularity level.

Also, it's not exactly as straightforward as a regular rolling hash,
where RH(p, p+N) = H(RH(p-1, p+N-1), p+N) - h(p-1).
For XXH3, since there is no linearity, it's still necessary to recombine and scramble all digested 1 KB fragments in the correct order.
That's still less work than reprocessing the entire input, but workload linearly increases with the size of segment to process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants