Improve the performance of roaring bitmap under sparse cases #319

stdpain · 2021-09-07T04:32:38Z

In a sparse roaringbitmap, the size of array_container may often be 1, 2, 3. But we can actually reuse the container pointer (sizeof(void*)) as an array of uint16_t to avoid extra memory allocation.
eg:

typedef struct roaring_array_s {
    int32_t size;
    int32_t allocation_size;
    void **containers;
    uint16_t *keys;
    uint8_t *typecodes;
    uint8_t flags;
} roaring_array_t;

roaring_array_t ra;
// some init code
ra.typecodes[i] = SINGLE_CONTAIN_TYPE + cardinality << 6;
((uint16_t*)ra.containers[i])[1] = value1;
((uint16_t*)ra.containers[i])[2] = value2;
((uint16_t*)ra.containers[i])[3] = value3;

The text was updated successfully, but these errors were encountered:

lemire · 2021-09-07T18:48:55Z

Yes. But before we add complexity, we would need some motivating benchmarks showing the the extra work is worth it.

stdpain · 2021-09-08T07:29:17Z

@lemire
Hi ,I tested this optimization briefly in my application and the improvement for deserialization is quite obvious.

This is my test data generator:

 // the same key will add to the same bitmap
 static std::default_random_engine e;
 static std::uniform_int_distribution<int64_t> u(start_range, end_range);
 // start_range = 0
 // end_range = 5000000000
 // keys == 1000
 for(int k = 0;k < keys;k++) {
 // lines == 50000
    for(int i = 0;i < lines; ++i) {
        std::cout << k << "\t" << u(e) << '\n';
    }
 }

This is the change I made to CRoaring (an earlier version):

stdpain@a0f5049

I test It In my Application

before:
mysql> select bitmap_union_count(v) from bitmap_r50E_x;
+-------------------------+
| bitmap_union_count(`v`) |
+-------------------------+
|               100000000 |
+-------------------------+
1 row in set (8.63 sec)
after:
mysql> select bitmap_union_count(v) from bitmap_r50E_x;
+-------------------------+
| bitmap_union_count(`v`) |
+-------------------------+
|               100000000 |
+-------------------------+
1 row in set (3.67 sec)

lemire · 2021-09-08T15:57:56Z

Yes. I saw this and it makes sense to me. I'd still want someone to review it independently.

Oppen · 2021-10-22T12:51:08Z

I proposed a similar concept a while ago for rle but never got around writing the benchmarks. I'd advice trying to use a union for the array_container_t and infer whether we're using the pointer as array from the capacity and measuring again.
That way we avoid adding a new container type, which would either require breaking backwards compatibility for the frozen format or losing this optimization when loading from them (a valid option, but if it's not too expensive to keep it I think it's better to do so). Besides, that goes in line with some proposed changes to representation that involve embedding the structs in the array anyway.
Other than that, it seems OK to me, some style fixes and the like would be needed for the provided patch.

Oppen · 2022-04-26T20:59:54Z

@stdpain would you mind making a PR from your changes? Otherwise, at some unspecified point in the future would you like me to create one (with proper credit)?

stdpain · 2022-05-01T17:25:01Z

@Oppen OK

Oppen · 2022-08-11T06:53:30Z

I have a very ugly suggestion but you may be interested in trying it. Since our arrays are always sorted and never* empty we could simply zero-terminate them for the low count case, gaining a few more elements or even eliminating the indirection altogether, not storing the sizes at all. If the first element is zero, then there's a zero, but every other zero signals end of container.

stdpain · 2022-08-11T08:57:57Z

I have a very ugly suggestion but you may be interested in trying it. Since our arrays are always sorted and never* empty we could simply zero-terminate them for the low count case, gaining a few more elements or even eliminating the indirection altogether, not storing the sizes at all. If the first element is zero, then there's a zero, but every other zero signals end of container.

sounds great.

Oppen · 2022-09-21T16:34:59Z

More than a month later, but I put that * and never what the clarification was. There's an exception for lazy operations that may leave such a state, but you have to call repair to restore the invariant lest you want undefined behavior.

madscientist · 2022-11-20T15:02:37Z

I would love the "small bitmap optimization" described in #319 (comment) We do something like this but with a wrapper around the type which is annoying since we have to translate all the calls. The wrapper has a union of a roaring::Roaring (this is C++) and an array of 32bit ints plus a "small count" which we overlay on the empty padding at the end of the roaring_array_t. Since sizeof(roaring_array_t) is 40 bytes and we need at least one byte for the count field, that gives us (40 - 1) / 4 = 9 values we can use to hold 32bit values, before we have to convert from the "small" bitmap to a full bitmap.

We've tested this and in our environment, at least, where we can have a wide variety of sizes of bitmap and it really helps performance in our use-cases, even with the one-time overhead of the conversion. I'm sure that if this was supported directly the overhead could be reduced further.

What we do currently is only support a simple subset of operations on this special type of bitmap. If we need to do any of the more complex operations, for example if we're not sure the result will fit into the constant size, we convert it to a normal bitmap first. What we're keeping is basically an array bitmap, but without the overhead of the size (compile-time constant) or capacity (same as the size). Maybe an embedded implementation could get some synergy there.

josh08287 · 2023-11-14T18:13:15Z

Would love to see this incorporated.
We have running into this situation a lot when we have sparse bitmaps that need to be merged together by a parallel algorithm.

Our use case is :
We have parallel lock free algorithms creating many bitmaps that may have sparse membership and non-unique membership. We chunk many very large files into multiple segments that create their own bitmaps that are merged together at the end of processing all chunks resulting in most bitmaps being sparsely populated. Profiling revealed that insertion into these as roaring bitmaps is very slow compared to a simple bitarray.

lemire · 2023-11-14T19:01:59Z

@josh08287 Can you contribute a benchmark?

See https://github.com/RoaringBitmap/CRoaring/tree/master/microbenchmarks

The difficulty with this issue is that people refer to internal use cases, but there is no corresponding benchmark we can refer to. It means that this issue is unlikely to make progress.

I would never merge an optimization without a corresponding credible benchmark. It is too easy to think we are optimizing when we are not.

lemire · 2023-11-14T19:03:58Z

The PR by @stdpain did not match the current code, and lacked a benchmark to demonstrate the benefits. We need contributions, if only benchmarks, for this issue to make progress.

stdpain · 2023-12-28T07:10:03Z

I'll start by providing a benchmark for this case

stdpain mentioned this issue May 31, 2022

[WIP] Improve the performance of roaring bitmap under sparse cases #364

Closed

stdpain mentioned this issue Dec 28, 2023

add benchmark for sparse cases #533

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the performance of roaring bitmap under sparse cases #319

Improve the performance of roaring bitmap under sparse cases #319

stdpain commented Sep 7, 2021

lemire commented Sep 7, 2021

stdpain commented Sep 8, 2021 •

edited

lemire commented Sep 8, 2021

Oppen commented Oct 22, 2021

Oppen commented Apr 26, 2022

stdpain commented May 1, 2022

Oppen commented Aug 11, 2022

stdpain commented Aug 11, 2022

Oppen commented Sep 21, 2022

madscientist commented Nov 20, 2022

josh08287 commented Nov 14, 2023

lemire commented Nov 14, 2023

lemire commented Nov 14, 2023

stdpain commented Dec 28, 2023

Improve the performance of roaring bitmap under sparse cases #319

Improve the performance of roaring bitmap under sparse cases #319

Comments

stdpain commented Sep 7, 2021

lemire commented Sep 7, 2021

stdpain commented Sep 8, 2021 • edited

lemire commented Sep 8, 2021

Oppen commented Oct 22, 2021

Oppen commented Apr 26, 2022

stdpain commented May 1, 2022

Oppen commented Aug 11, 2022

stdpain commented Aug 11, 2022

Oppen commented Sep 21, 2022

madscientist commented Nov 20, 2022

josh08287 commented Nov 14, 2023

lemire commented Nov 14, 2023

lemire commented Nov 14, 2023

stdpain commented Dec 28, 2023

stdpain commented Sep 8, 2021 •

edited