Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RoaringBitmap add an efficient findOffset method for the index of one assigned value. #627

Open
shouriken opened this issue Jun 6, 2023 · 7 comments

Comments

@shouriken
Copy link

shouriken commented Jun 6, 2023

Hello, i want use roaring bitmap to store indices of a sparse array; in sparse array, there is a frequently-used method valueAt(int index). RoaringBitmap provide the contains method, i can use it to judge index if assigned index; if not assigned, sparse array return a default value; but if assigned, there is not method provided by finding the offset of index, now i use rank(index) - 1 to get the offset, it seems not efficient enough.
If you can provide this method?

RoaringBitmap rb = RoaringBitmap.bitmapOf(1,2,3,1000);
int offset1 = rb.findOffset(1000); // return 3
int offset2 = rb.findOffset(2); // return 1
int offset3 = rb.findOffset(999); // return -1 or other unmeaning offset

Thanks.

@richardstartin
Copy link
Member

You could try using a FastRankRoaringBitmap which pre-computes the cumulative cardinalities of the containers to speed up rank.

@blacelle
Copy link
Member

blacelle commented Jun 6, 2023

@shouriken Are you looking for better .rank performance (in which case FastRankRoaringBitmap can indeed be helpful).

Or do you look for something like java.util.Arrays.binarySearch(long[], long) which would return the insertion point in case the searched element is not present? (i.e. expecting rb.findOffset(999); to return -4)

@shouriken
Copy link
Author

@blacelle @richardstartin
Yes, i can use rank in FastRankRoaringBitmap to get better efficiency. Or is there a efficient binarySearch in bitmap to get the offset?

@richardstartin
Copy link
Member

You can't know the offset without knowing the cumulative cardinality, so a binary search approach couldn't work without either computing or looking up the cumulative cardinality. The mismatch between rank and your use case is that rank does not return a sentinel value when the value is not found, which would cost you an extra contains check (which does use binary search on the containers and then a container-specific algorithm (either binary search or random access).

These operations could be fused into a single method but I don't envisage that a fused operation implemented here would offer much advantage over doing it yourself, and the benefit of not implementing it here is that it might confuse users for what probably isn't the most common use case.

@blacelle
Copy link
Member

blacelle commented Jun 6, 2023

You can't know the offset without knowing the cumulative cardinality, so a binary search approach couldn't work without either computing or looking up the cumulative cardinality.

This is essentially the implementation of FastRankRoaringBitmap :

https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastRankRoaringBitmap.java#L233-L235


These operations could be fused into a single method but I don't envisage that a fused operation implemented here would offer much advantage over doing it yourself, and the benefit of not implementing it here is that it might confuse users for what probably isn't the most common use case.

I finally get the use-case of @shouriken . For the record, a suggested .findOffset (with .binarySearch-like output) can be implemented through:

rank = bitmap.rank(n);
if (bitmap.contains(n)) {
    return rank;
} else {
    return -rank-1;
}

@shouriken If you look for performance, try with FastRankRoaringBitmap and similar custom .offset implementation. To have it going faster, one would have to invest in enabling even faster .rank (e.g. on a per-container basis) rather than adding a new core method.

@richardstartin
Copy link
Member

You can't know the offset without knowing the cumulative cardinality, so a binary search approach couldn't work without either computing or looking up the cumulative cardinality.

This is essentially the implementation of FastRankRoaringBitmap :

FWIW I'm surprised this wasn't clear from what's written above.

https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastRankRoaringBitmap.java#L233-L235

These operations could be fused into a single method but I don't envisage that a fused operation implemented here would offer much advantage over doing it yourself, and the benefit of not implementing it here is that it might confuse users for what probably isn't the most common use case.

I finally get the use-case of @shouriken . For the record, a suggested .findOffset (with .binarySearch-like output) can be implemented through:

rank = bitmap.rank(n);
if (bitmap.contains(n)) {
    return rank;
} else {
    return -rank-1;
}

@shouriken If you look for performance, try with FastRankRoaringBitmap and similar custom .offset implementation. To have it going faster, one would have to invest in enabling even faster .rank (e.g. on a per-container basis) rather than adding a new core method.

Yes, this is what was being suggested above.

@lemire
Copy link
Member

lemire commented Jun 6, 2023

The C version of Roaring has a function called find_index:
RoaringBitmap/CRoaring#470

The problem with the following is that it entails two potentially expensive queries...

rank = bitmap.rank(n);
if (bitmap.contains(n)) {
    return rank;
} else {
    return -rank-1;
}

A single findIndex function could be faster.

I invite a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants