New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question/advice: iterating over an a.and(b) bitmap whilst keeping ranks of elements from 'a' #425
Comments
Can you give some simple examples of what you're trying to achieve? It sounds similar to my code where one bitmap represents ranges (blocks?) and second actual values. I want to estimate how many ranges contains at least one value. Example 1
Example 2
|
@ppiotrow I doubt @touisteur has access to |
I can share, but still not sure if he has similar problem to be solved. The main hint was to consider using iterator instead of AND, OR if you iterate very long loop. We are using it on giant dataset to estimate number of users performing some activities. Each range groups activity of single user. Edit: code without partitions |
Hi thanks for all the kind and interesting answers! My use case is : I'm given a (sometimes) huge list of blocks. Each block is a variable-sized ByteBuffer that, once sloooooowly deserialized, may or may not contain a specific field. So I perform, offline, a first veeeery slow pass where I note the presence of each field in each block, and I thus end up with a roaringbitmap of all the blocks containing this field. That's my 'presence' bitmap. The 'blocks' bitmap is given by the user as a list of blocks to look-up. The result of the operation must have the same indexes/ranks than the 'blocks' input. I think the double iterator approach would fit. I was worried about its performance. Didn't know about advance methods. |
Hi sorry I just reread the blocks=ranges comment, I'm not working with ranges, just discrete indices (like a unique increasing ID in a file). As an example : I'd want to have on my final structure |
|
Oh yes thanks I understand this is not directly possible with sets. I was looking for the 'best' way to iterate over the result of the 'and' operation with the minimum overhead for the rank/select operation on blocks /for each element of blocks.and(presence)/ It seems having two running iterators might be the 'best' way to do this? Is there an example somewhere of this approach? |
I would not call this approach "minimum overhead". Using iterators brings about a lot of overhead. To get good performance, you typically want to delay using iterators as much as possible. I recommend you run benchmarks. |
It seems your code @ppiotrow does almost what i want with the advanceIfNeeded() I'll need to try that and compare with the naive blocks.and(presence).foreach (i => result.add (blocks.select(i), i) |
Indeed thanks everyone for all the insights. Will report with benchmark results when I'm done implementing both. BTW that was a very friendly and instructive and helpful interaction, thanks a lot. |
Oh thanks it took me 3 takes to understand your idea. I'll try that also. |
Hi,
Sorry if it's not the place for questions or advice.
I'm using RoaringBitmap as indexes in an internal tool, and the performance kick I get from them is amazing compared to sparse matrices or my homemade clumsy solutions...
Still there's one operation I'm doing very often that I'm not sure how to implement efficiently.
I have 2 bitmaps : blocks and index. blocks usually has very high cardinality, index usually lower, sometimes the same, sometimes higher.
I want to iterate over blocks.and(index) BUT I need to keep the positions/ranks of elements from blocks. So for each element of blocks.and(index) I'd like to have that element's rank/position in blocks.
What I do today is I iterate (forEach) over blocks.and(index) and for each element I get their rank from blocks through rank() or select(). The performance is... not great.
I'm wondering whether there'd be a better, more idiomatic way of doing this? Should I be writing a specific andIterator that would give me this info? Anything I should be aware whilst going this path? Am I completely out of RoaringBitmaps' use case?
Thanks in advance for your help.
Lionel
The text was updated successfully, but these errors were encountered: