XXH128 #104
Comments
Yes
Great, so will you need two days or two weeks to get it done?
Probably more than that. It's difficult to commit to a date, since other activities keep getting in the way. I suppose I could find some time to work on it this coming summer.
Thanks, I'm really glad to hear that.
@Cyan4973 Any progress? :)
Heck no, my time slot for xxHash this summer vanished in the face of other priorities.
@Cyan4973 Any progress? :)
Sort of, but on the negative side. That leaves only a couple of options on the table.
Please avoid shameless plugs.
Could a well-distributed 128-bit hash just use the full states v0, v1, v2, v3? Or is there more to it?
Yes, it works. It could be this was a bad idea after all, and @JCash's approach, by virtue of being more straightforward, might prove the better one. That said, there are still a couple of ideas I want to try out before giving up on vectorization-based hashing.
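The "use the full state" idea can be sketched roughly as follows: run four XXH32-style 32-bit lanes over the input and keep each finalized lane as one quarter of the digest, instead of merging the lanes down to 32 bits. This is only an illustrative sketch, not XXH128 or any design proposed in this thread: the prime constants and lane initialization are XXH32's (seed 0), but the stripe, tail, and per-lane finalization logic is simplified and invented for the example.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U
#define PRIME32_3 3266489917U

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* XXH32's per-lane round: accumulate, rotate, multiply. */
static uint32_t lane_round(uint32_t acc, uint32_t input)
{
    acc += input * PRIME32_2;
    acc  = rotl32(acc, 13);
    acc *= PRIME32_1;
    return acc;
}

/* XXH32-style final avalanche, applied to each lane separately. */
static uint32_t avalanche32(uint32_t h)
{
    h ^= h >> 15; h *= PRIME32_2;
    h ^= h >> 13; h *= PRIME32_3;
    h ^= h >> 16;
    return h;
}

/* 128-bit digest = 4 x 32-bit lanes, each finalized independently. */
void hash128_sketch(const void *data, size_t len, uint32_t out[4])
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t v[4] = { PRIME32_1 + PRIME32_2, PRIME32_2, 0,
                      (uint32_t)0 - PRIME32_1 };
    size_t i = 0;

    /* consume 16-byte stripes: one 32-bit word per lane */
    for (; i + 16 <= len; i += 16)
        for (int l = 0; l < 4; l++) {
            uint32_t w;
            memcpy(&w, p + i + 4 * (size_t)l, 4);
            v[l] = lane_round(v[l], w);
        }

    /* fold the tail bytes and the length into every lane */
    for (int l = 0; l < 4; l++) {
        uint32_t acc = v[l] + (uint32_t)len;
        for (size_t j = i; j < len; j++)
            acc = rotl32(acc + p[j] * PRIME32_3, 11) * PRIME32_1;
        out[l] = avalanche32(acc ^ (uint32_t)l);  /* de-correlate lanes */
    }
}
```

Finalizing each lane with its lane index mixed in keeps the four outputs from being trivially identical, but a serious design would also need the lanes to interact, which is part of what the discussion here is about.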
What would the use case be for this 128-bit hash? At first I thought it was to lower the probability of collisions, but I understood that this probability is already so low for a 64-bit hash that going to 128 bits wouldn't be efficient: it takes up double the storage and the gain is very low. Am I missing something?
@Bramzor it becomes a consideration if you are hashing a large volume of traffic (billions of keys) and collision resistance is a high priority. It does not need to be expensive to compute, but storage is also a concern. The gains are larger than they look, since each increase in hash capacity grows the keyspace exponentially. That is why a 256-bit hash is considered unbreakable in crypto.
@bryc is there a comparison of collision probabilities between 64-bit and 128-bit (or larger) hashes? I'm looking at using xxHash for "fingerprinting" a very large amount of data, although false positives wouldn't be a big issue in my specific use case.
The table at https://stackoverflow.com/a/14210379/646947 is a good answer to this question. As @bryc stated, 128 bits becomes useful when there are millions of entries to store and the user has no collision-resolution mechanism in place, so they want to make the chance of a single collision ever happening vanishingly small.
Can we accept @JCash's pull request for the XXH128 code for calculating 128-bit hashes? I don't want to build on an unaccepted pull request, as that might cause issues if a different XXH128 version comes out later. But that won't happen if we accept this pull request as the valid (first) XXH128 version, right? If needed, it's always possible to add an XXH128v2 later, should a better/faster method appear.
Even if I end up selecting a solution similar to @JCash's proposal, we've learned a lot since then. There are mainly two areas of improvement; the second one is pretty important.
@Bramzor if you don't need any specific functionality of the xxHash framework, you can use one of the well-established 128-bit hashes: SpookyHash, MurmurHash3 or HighwayHash. @Cyan4973 I think that my zzHash is 80% done as an x86 hash with 128-bit output. Unfortunately, I haven't finished it. The remaining things are:
By 80% I mean 80% of the hash formula design, not counting the effort required for an actual implementation with incremental hashing and so on. zzHash's speed is 2.5 bpc, compared to 2 bpc for xxHash32 and 4 bpc for xxHash64. zzHash-wide, employing x64 instructions, runs at 5.5 bpc and can deliver up to 512-bit results, but it doesn't have any concrete design yet.
😮 How would that work? That would be very nice indeed! I have made an attempt to adapt the 32-bit XXH32 code to produce XXH128 in JavaScript (the code is C-like), and it is still fast, but very likely a low-quality hash (I can't really use SMHasher, but it seems v3-v4 have more collisions). So I would like to port that to JS too ;).
@Bulat-Ziganshin: Don't worry too much about small inputs; they are a different set of optimizations and must be dealt with in a different code path. I can take care of that.
This is a bit more fuzzy, but one could start by considering 32-bit constants instead of 64-bit ones. There are similar considerations to keep in mind when introducing
@Cyan4973 what about my idea of developing a multi-threading-friendly hash? Do you want to include that in xxHash? If you don't buy it, I can skip its development, although, being a perfectionist, I would like to see it implemented.

The idea: when the input is longer than 4 KB, it is split into 4 KB chunks, and we compute the same 160-bit raw hash value for each chunk as we do for smaller-than-4-KB inputs. Then we perform hashing over this stream of 160-bit values. This allows using up to 4K/20 = 200 threads for hashing, making the overall computation up to 200x faster. For inputs larger than 800 KB, we may add a second layer the same way, allowing up to 40K threads, which looks appropriate for modern GPUs.

Since the algorithm used to compute the raw 160-bit hash is the same for smaller inputs (below 4 KB) and for the first chunk of larger inputs, we don't need to buffer data in the incremental implementation. All we need to store (in the hash state structure) is an extra 160 bits of the current raw hash for each added hashing layer.
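The chunking scheme above can be sketched structurally. This is a toy model, not zzHash or xxHash: `CHUNK` plays the role of the 4 KB chunk size, and a 64-bit FNV-1a stand-in replaces the 160-bit raw hash; only the shape (hash chunks independently, then hash the stream of chunk digests) follows the proposal.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK 4096  /* first-layer chunk size from the proposal */

/* Stand-in for the real 160-bit raw hash: 64-bit FNV-1a. */
static uint64_t raw_hash(const uint8_t *p, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Layer 1 hashes each chunk independently (each loop iteration could
 * run on its own thread); layer 2 hashes the stream of chunk digests. */
uint64_t chunked_hash(const uint8_t *p, size_t len)
{
    if (len <= CHUNK)
        return raw_hash(p, len);   /* small inputs: single pass */

    size_t n = (len + CHUNK - 1) / CHUNK;
    uint64_t *digests = malloc(n * sizeof *digests);
    for (size_t i = 0; i < n; i++) {
        size_t off  = i * CHUNK;
        size_t part = (len - off < CHUNK) ? (len - off) : CHUNK;
        digests[i] = raw_hash(p + off, part);
    }
    uint64_t h = raw_hash((const uint8_t *)digests, n * sizeof *digests);
    free(digests);
    return h;
}
```

Error handling (e.g. a failed `malloc`) and the second 800 KB layer are omitted for brevity; an incremental implementation would keep one in-progress digest per layer instead of the full `digests` array, which is the buffer-free property claimed above.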
I like the idea. Note though that the current hash speed is already pretty high. But even a more limited 2x/4x boost would still be nice. Thread synchronization is not free, though.
Premature optimization is the root of all evil... That said, I can only speak for myself, but I think most people will share the same opinion, which is: the speed of XXH64 is more than good enough (I use it only for hashing a small string, for example), and therefore people would be happy with the same performance as a 128-bit hash. Why would people use XXH128? Because they like to use libraries they are already familiar with.

The risk at this moment is that it will take some time for XXH128 to reach all the different languages, and that might "force" people to use another hashing library. I could go ahead with XXH64 for a while, but at some point I will need 128 bits anyway, and if it isn't here anytime soon, I will have to give up on XXH (which I really like) for some other hash. If xxHash wants to be the best (fastest) 128-bit hash there is, I can respect that, but then I would advise people (and myself) to go for something "that just works" and is commonly supported. My two cents.
@Cyan4973 You are right, the goal is to make memory bandwidth the only limit, since the computations are so simple. On an Nvidia GPU, the instructions processing each 4 input bytes in zzHash (2 ADD + MUL + ROL) take about 10 cycles, resulting in a speed of 0.5-1 GB/s per thread. Even with 200 threads, we can't fill the 900 GB/s memory bandwidth of a Tesla V100, so we should either further increase the chunk size or add a second layer of chunking. With two layers we can provide work for up to 40K threads, which is 20-40x more than required to fill the memory bandwidth. As time goes on, GPUs get more ALUs, so the ratio of memory bandwidth to single-ALU speed increases, but this reserve should be enough for the next 20 years.

In order to fill the memory bandwidth of a V100 we need about 1K threads, so the hash computation becomes memory-bound starting from 4 MB inputs. On GPUs, starting one warp (32 threads) is about as expensive as calling a subroutine, i.e. it just requires copying parameters and initializing locals, so the price is negligible compared to processing 32*4K bytes. OTOH, by compressing each 4 KB of input into 20 bytes which are then hashed again, we add an extra 20B/4KB = 0.5% of work for inputs larger than 4 KB, which is again a negligible price. That's why I chose a 4 KB chunk size and, correspondingly, a 200*4KB = 800 KB second-layer chunk size.

Indeed, syncing on 4 KB chunks is inefficient on desktop CPUs/OSes, but we don't need to do that. Each thread in a CPU implementation can process larger blocks containing multiple chunks and then either 1) store the 160-bit raw hash of each chunk in memory for later processing, or 2) process 800 KB chunks entirely, computing their 160-bit hashes for the next hashing layer. OTOH, a 32/64 KB chunk size would be a bit more efficient for desktop CPUs (no 0.5% overhead, 8-16x less intermediate data to store), but it would raise the minimum efficiently-processed input size on a V100 from the above-mentioned 4 MB to 32-64 MB.
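As a quick sanity check, the key ratios in the comment above can be written out. All input figures (900 GB/s bandwidth, 0.5 GB/s per thread at the low end, 4 KB chunks, 20-byte digests) are the comment's own assumptions, not measurements:

```c
/* Back-of-the-envelope ratios from the discussion above. */
#define CHUNK_BYTES   4096.0   /* first-layer chunk size (4 KB)       */
#define DIGEST_BYTES    20.0   /* 160-bit raw chunk hash              */
#define MEM_BW_GBS     900.0   /* assumed Tesla V100 memory bandwidth */
#define THREAD_GBS       0.5   /* assumed per-thread speed, low end   */

/* extra work added by re-hashing the 20-byte chunk digests */
double extra_work_fraction(void) { return DIGEST_BYTES / CHUNK_BYTES; }

/* threads needed to saturate memory bandwidth at the low-end speed */
double threads_to_saturate(void) { return MEM_BW_GBS / THREAD_GBS; }

/* span covered by one second-layer chunk: 200 digests x 4 KB */
double second_layer_span_bytes(void) { return 200.0 * CHUNK_BYTES; }
```

This gives ~0.5% overhead, 1800 threads at the 0.5 GB/s low end (900 at the 1 GB/s high end, matching the "about 1K threads" figure), and an 800 KB second-layer span.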
I just pushed the first version of this algorithm (with only one extra layer). The key part of the main hash function: https://github.com/Bulat-Ziganshin/FARSH/blob/master/SMHasher/xxHashTest.cpp#L270
@Cyan4973
@bryc zzHash (which is a playground for one possible XXH128 implementation) has this feature too. Cyan was talking (in your citation) about another hash that processes data in 64-bit chunks, thus employing 64-bit variables but simplifying some operations in order to improve x86/SSE2 efficiency. Processing input in 64-bit chunks doubles performance on 64-bit CPUs, so you have to choose.
Yeah, that looks promising. Looking closer, it seems it can be ported to native JS since it uses pure 32-bit integers. I have tried, but my C++ ability is weak, so I can't even compile SMHasher in VS. But I like the ideas/changes in the combining/final mixing steps. Anyway, 👍 for an x86 XXH128 ;) and maybe even an x86 XXH64 with a little extra work.
Another possible solution would be relatively trivial to implement. So far, I've refrained from doing so because of the risk of confusion. It sounds like a simple problem, but it was enough of a deterrent. I nonetheless like the idea, because it keeps the XXH128 name available for a later possibility: implementing a hash function that works on units of 128 bits.
XXH32X2? A bit of a mouthful, I know.
Yeah, I'm now convinced that the confusion risk is significant enough for this not to be worth it. So no strange xxh32->64 in sight for the time being. XXH128 will be a hash producing a 128-bit output.
If you want to avoid the confusion, you could also opt for another strategy: XXH96, based on the 32-bit "engine".
In the past days I did some tests with SpookyHash (V2): the 128-bit hash version was a bit faster than xxHash64 (64-bit compilation optimized for Sandy Bridge), both close to 10 GB/s on a Sandy Bridge 2500K. I tested 4K and 256K buffers (256K cached and non-cached). At the same time, SpookyHash offers 64- and 32-bit versions of the resulting hash. For smaller buffers I would run a test first; I saw a performance increase of more than 20% going from 1K to 4K buffers. So for those who need a 128-bit hash running on 64-bit hardware, it's worthwhile to look at SpookyHash. SpookyHash was developed in 2012 by Bob Jenkins, who was already writing about hash functions in Dr. Dobb's Journal in 1997.
Patch currently under review in #174 |
Hi, do you have any plans for creating a 128-bit version (16 bytes for the resulting hash)?