RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

Josehokec · 2022-11-16T07:33:25Z

Describe the bug
After using the appender build RangeBitmap, insert new values, then build a RangeBitmap again, there will occur unexpected results.

To Reproduce
**My test file: **

import org.roaringbitmap.RangeBitmap;
import org.roaringbitmap.RoaringBitmap;

public class RangeBitmapTest {

    public static void printRoaringBitmap(String bitmapInfo, RoaringBitmap RBM){
        System.out.printf("\n" + bitmapInfo + " roaring bitmap values: ");
        for(int i : RBM) { System.out.printf("%d ", i); }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("--------Range Bitmap Experiments1--------");
        RangeBitmap.Appender appender1 = RangeBitmap.appender(5_000L);
        appender1.add(1L);
        appender1.add(2L);
        RangeBitmap oldRangeBitmap = appender1.build();
        RoaringBitmap oldlte5Test = oldRangeBitmap.lte(5);
        printRoaringBitmap("lte5 old", oldlte5Test);

        // insert more numbers
        appender1.add(3L);
        appender1.add(4L);
        appender1.add(5L);
        RangeBitmap newRangeBitmap = appender1.build();
        RoaringBitmap newLte5Test = newRangeBitmap.lte(5);
        printRoaringBitmap("lte5 new", newLte5Test);

        RoaringBitmap newGte3Test = newRangeBitmap.gte(3);
        printRoaringBitmap("gte3 new", newGte3Test);

        RoaringBitmap newGte4Test = newRangeBitmap.gte(4);
        printRoaringBitmap("gte4 new", newGte4Test);

        System.out.printf("\n--------Range Bitmap Experiments2--------");
        RangeBitmap.Appender appender2 = RangeBitmap.appender(5_000L);
        for(long i = 1; i < 6; i++){
            appender2.add(i);
        }
        RangeBitmap test2RangeBitmap = appender2.build();
        RoaringBitmap gte4 = test2RangeBitmap.gte(4);
        printRoaringBitmap("gte4 ", gte4);

        RoaringBitmap lte5 = test2RangeBitmap.lte(5);
        printRoaringBitmap("lte5 ", lte5);
    }
}

These are the output results:

--------Range Bitmap Experiments1--------
lte5 old roaring bitmap values: 0 1
lte5 new roaring bitmap values: 0 1
gte3 new roaring bitmap values: 2 3 4
gte4 new roaring bitmap values: 2 3 4
--------Range Bitmap Experiments2--------
gte4 roaring bitmap values: 3 4
lte5 roaring bitmap values: 0 1 2 3 4

RoaringBitmap version:

0.9.35

Java version:

JDK 11

The text was updated successfully, but these errors were encountered:

richardstartin · 2022-11-16T08:08:32Z

It’s hard to say whether this is a user error or a bad API, therefore whether it’s just “not a bug” or a “feature request” but this isn’t the way the appended was expected to be used. The appended should be cleared by calling clear, and then you should put all the values back in.

However, I think your usage makes sense and could be supported without breaking existing use cases, so I will take a look at making it work the way you expect.

Josehokec · 2022-11-16T08:43:52Z

Sincere thanks for your reply, and I'm sorry for marking this question as a bug.

In many cases, there is a need to add new elements again after building RangeBitmap. The default method of building the range bitmap (clearing the appender then putting all the values back in) will have a high construction cost if the appender has previously maintained a lot of data and I just need to add a small number of new elements. Therefore, I am eagerly anticipating the addition of this feature to RangeBitmap.

Sorry to disturb you. Wish you all the best in your work.

richardstartin · 2022-11-16T09:13:27Z

Yes, I understand the motivation. It just wasn't relevant to its original use case. I should get around to it within a week or so.

richardstartin · 2022-11-19T15:06:36Z

@Josehokec can you describe a concrete use case for this functionality, something more involved than a test program? Why would you want a RangeBitmap for the first N rows as a proper subset of another RangeBitmap for those N rows and then next M rows? Understanding the use case will help motivate supporting this behaviour.

Josehokec · 2022-11-26T05:26:20Z

I'm sorry I didn't see this email until now.

Currently, I use RangeBitmap to filter some tuples according to multiple attribute range constraints to speed up the execution of SQL statements (RangeBitmap has excellent query performance). Using RangeBitmap in processing complex query statements can improve performance greatly.

For my database table, there is a need to insert new tuples (although the frequency of inserting elements is very low). However, after calling the $build()$ function, the current appender needs to re-read all tuples and build a RangeBitmap again if it inserts a new tuple. In this way, the IO cost of re-reading all tuples and building a new RangeBitmap is expensive. I need the appender to be friendly to support insert operations after the build.

I'm hoping that after calling the $build()$ function, the appender does not need to be cleared and all tuples re-read; instead, I'll use the add function to add the newly inserted elements to the appender, and then call the $appender.build()$ function again to get a new RangeBitmap.

I apologize for sending this email so late.

richardstartin · 2022-11-26T14:15:34Z

Essentially you want a way to append to an existing index when you add some rows. So long as it’s only appending, this can be made to work, but it might be better not to make the appender reusable and instead wrap the existing index, e.g.

var appender = Appender.wrap(existingIndex);
values.forEach(appender::add);
var newIndex = appender.build();

Keeping the appender around will hang on to a lot of memory and making the appender work just in case the user wants to reuse it may incur overhead/cause bugs in the existing usage in Apache Pinot.

lemire · 2023-04-21T21:43:25Z

@richardstartin I have marked this as a possible enhancement.

Josehokec added the bug label Nov 16, 2022

lemire added enhancement and removed bug labels Apr 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

Josehokec commented Nov 16, 2022

richardstartin commented Nov 16, 2022

Josehokec commented Nov 16, 2022

richardstartin commented Nov 16, 2022

richardstartin commented Nov 19, 2022

Josehokec commented Nov 26, 2022

richardstartin commented Nov 26, 2022 •

edited

lemire commented Apr 21, 2023

RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

Comments

Josehokec commented Nov 16, 2022

richardstartin commented Nov 16, 2022

Josehokec commented Nov 16, 2022

richardstartin commented Nov 16, 2022

richardstartin commented Nov 19, 2022

Josehokec commented Nov 26, 2022

richardstartin commented Nov 26, 2022 • edited

lemire commented Apr 21, 2023

richardstartin commented Nov 26, 2022 •

edited