Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RangeBitmap inserts new values that occur unexpected results when building a RangeBitamp.Appender. #601

Open
Josehokec opened this issue Nov 16, 2022 · 7 comments

Comments

@Josehokec
Copy link

Describe the bug
After using the appender build RangeBitmap, insert new values, then build a RangeBitmap again, there will occur unexpected results.

To Reproduce
**My test file: **

import org.roaringbitmap.RangeBitmap;
import org.roaringbitmap.RoaringBitmap;

public class RangeBitmapTest {

    public static void printRoaringBitmap(String bitmapInfo, RoaringBitmap RBM){
        System.out.printf("\n" + bitmapInfo + " roaring bitmap values: ");
        for(int i : RBM) { System.out.printf("%d ", i); }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("--------Range Bitmap Experiments1--------");
        RangeBitmap.Appender appender1 = RangeBitmap.appender(5_000L);
        appender1.add(1L);
        appender1.add(2L);
        RangeBitmap oldRangeBitmap = appender1.build();
        RoaringBitmap oldlte5Test = oldRangeBitmap.lte(5);
        printRoaringBitmap("lte5 old", oldlte5Test);

        // insert more numbers
        appender1.add(3L);
        appender1.add(4L);
        appender1.add(5L);
        RangeBitmap newRangeBitmap = appender1.build();
        RoaringBitmap newLte5Test = newRangeBitmap.lte(5);
        printRoaringBitmap("lte5 new", newLte5Test);

        RoaringBitmap newGte3Test = newRangeBitmap.gte(3);
        printRoaringBitmap("gte3 new", newGte3Test);

        RoaringBitmap newGte4Test = newRangeBitmap.gte(4);
        printRoaringBitmap("gte4 new", newGte4Test);

        System.out.printf("\n--------Range Bitmap Experiments2--------");
        RangeBitmap.Appender appender2 = RangeBitmap.appender(5_000L);
        for(long i = 1; i < 6; i++){
            appender2.add(i);
        }
        RangeBitmap test2RangeBitmap = appender2.build();
        RoaringBitmap gte4 = test2RangeBitmap.gte(4);
        printRoaringBitmap("gte4 ", gte4);

        RoaringBitmap lte5 = test2RangeBitmap.lte(5);
        printRoaringBitmap("lte5 ", lte5);
    }
}

These are the output results:

--------Range Bitmap Experiments1--------
lte5 old roaring bitmap values: 0 1
lte5 new roaring bitmap values: 0 1
gte3 new roaring bitmap values: 2 3 4
gte4 new roaring bitmap values: 2 3 4
--------Range Bitmap Experiments2--------
gte4 roaring bitmap values: 3 4
lte5 roaring bitmap values: 0 1 2 3 4

RoaringBitmap version:

0.9.35

Java version:

JDK 11

@Josehokec Josehokec added the bug label Nov 16, 2022
@richardstartin
Copy link
Member

It’s hard to say whether this is a user error or a bad API, therefore whether it’s just “not a bug” or a “feature request” but this isn’t the way the appended was expected to be used. The appended should be cleared by calling clear, and then you should put all the values back in.

However, I think your usage makes sense and could be supported without breaking existing use cases, so I will take a look at making it work the way you expect.

@Josehokec
Copy link
Author

Sincere thanks for your reply, and I'm sorry for marking this question as a bug.

In many cases, there is a need to add new elements again after building RangeBitmap. The default method of building the range bitmap (clearing the appender then putting all the values back in) will have a high construction cost if the appender has previously maintained a lot of data and I just need to add a small number of new elements. Therefore, I am eagerly anticipating the addition of this feature to RangeBitmap.

Sorry to disturb you. Wish you all the best in your work.

@richardstartin
Copy link
Member

Yes, I understand the motivation. It just wasn't relevant to its original use case. I should get around to it within a week or so.

@richardstartin
Copy link
Member

@Josehokec can you describe a concrete use case for this functionality, something more involved than a test program? Why would you want a RangeBitmap for the first N rows as a proper subset of another RangeBitmap for those N rows and then next M rows? Understanding the use case will help motivate supporting this behaviour.

@Josehokec
Copy link
Author

I'm sorry I didn't see this email until now.

Currently, I use RangeBitmap to filter some tuples according to multiple attribute range constraints to speed up the execution of SQL statements (RangeBitmap has excellent query performance). Using RangeBitmap in processing complex query statements can improve performance greatly.

For my database table, there is a need to insert new tuples (although the frequency of inserting elements is very low). However, after calling the $build()$ function, the current appender needs to re-read all tuples and build a RangeBitmap again if it inserts a new tuple. In this way, the IO cost of re-reading all tuples and building a new RangeBitmap is expensive. I need the appender to be friendly to support insert operations after the build.

I'm hoping that after calling the $build()$ function, the appender does not need to be cleared and all tuples re-read; instead, I'll use the add function to add the newly inserted elements to the appender, and then call the $appender.build()$ function again to get a new RangeBitmap.

I apologize for sending this email so late.

@richardstartin
Copy link
Member

richardstartin commented Nov 26, 2022

Essentially you want a way to append to an existing index when you add some rows. So long as it’s only appending, this can be made to work, but it might be better not to make the appender reusable and instead wrap the existing index, e.g.

var appender = Appender.wrap(existingIndex);
values.forEach(appender::add);
var newIndex = appender.build();

Keeping the appender around will hang on to a lot of memory and making the appender work just in case the user wants to reuse it may incur overhead/cause bugs in the existing usage in Apache Pinot.

@lemire lemire added enhancement and removed bug labels Apr 21, 2023
@lemire
Copy link
Member

lemire commented Apr 21, 2023

@richardstartin I have marked this as a possible enhancement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants