
Add the implementation of IMemoryManager to allow users to pool char[] and long[] arrays #540

Open · wants to merge 2 commits into master
Conversation

alimans3

SUMMARY

Inject the IMemoryManager instance wherever resources are created, and add close() methods to free allocated resources.
The default implementation changes nothing about how the library works, but users can implement their own manager that pools char and long arrays.

Most of the changes are simple: they just thread the memoryManager through.

Discussion Thread: https://groups.google.com/g/roaring-bitmaps/c/V4-lRjraMJ0

Automated Checks

  • I have run ./gradlew test and made sure that my PR does not break any unit test.
  • I have run ./gradlew checkstyleMain or the equivalent and corrected the formatting warnings reported.

Member

@richardstartin richardstartin left a comment

At a first glance, this makes every container and bitmap 8 bytes larger in memory. Which is a problem in itself. I want to look at this in more detail.

import org.roaringbitmap.buffer.MappeableContainer;
import org.roaringbitmap.buffer.MappeableRunContainer;
import org.roaringbitmap.memory.DefaultMemoryManager;
import org.roaringbitmap.memory.IMemoryManager;

/**
* A 2D bitmap which associates values with a row index and can perform range queries.
Member

I don’t want this class to be modified.

/**
* Closes the containers used and frees the arrays used by them
*/
public void close() {
Member

Freeing the memory seems to depend on this method being called. There is a lot of code out there which will need to be migrated to ensure the memory is freed. This is a big red flag.

Member

At least the default implementation doesn’t retain references, so this won’t cause memory leaks in existing applications.

Author

Yes, the idea is that I didn't change the public API; existing users won't need to change anything unless they want to benefit from the custom memory management.

* @param memoryManager the memory manager
*/
public RoaringBitmap(IMemoryManager memoryManager) {
highLowContainer = new RoaringArray(memoryManager);
Member

How does this get freed? Whatever the mechanism, how is this ensured for existing code?

Author

It is freed in the clear() method; maybe we can make RoaringBitmap closeable so that it frees the resources.

* Default simple implementation for IMemoryManager where the create methods create the char/long array
* and the free method does nothing.
*/
public final class DefaultMemoryManager implements IMemoryManager {
Member

This prevents leaks from occurring in existing code but referencing it throughout the data structure still imposes a footprint overhead on existing users.

Author

Is there a way to measure the footprint, other than the obvious 8 bytes per container/bitmap?

Member

JOL reports the class layout; it should be measured with and without compressed references.
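
For what it's worth, a minimal sketch of such a measurement with JOL (assumes the org.openjdk.jol:jol-core dependency; the class under test is illustrative):

  import org.openjdk.jol.info.ClassLayout;
  import org.roaringbitmap.RoaringBitmap;

  public class FootprintCheck {
    public static void main(String[] args) {
      // Prints field offsets, alignment padding, and total instance size.
      // Run once as-is and once with -XX:-UseCompressedOops to measure the
      // extra reference field in both reference-compression modes.
      System.out.println(ClassLayout.parseClass(RoaringBitmap.class).toPrintable());
    }
  }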

@richardstartin
Member

I read the email exchange. Have you considered memory mapping your bitmaps instead?

@lemire
Member

lemire commented Nov 25, 2021

It would be important to show that this change comes at no cost for people who do not make use of this feature. So it should not increase memory usage or otherwise slow the code down.

It will surely make the code harder to maintain and more error prone. To compensate for this downside, we would need to be able to show better performance in at least some instances.

If these conditions are met, then we can possibly merge the PR if there is wide agreement in the community.

However, if these conditions are not met, you can still use your approach but as part of a fork of the project. We do not encourage forks… but if the changes only benefit you, then it is the best option for everyone.

@lemire
Member

lemire commented Nov 25, 2021

It is really important for us to avoid regressions both in terms of performance and also in correctness.

@alimans3
Author

alimans3 commented Nov 25, 2021

It would be important to show that this change comes at no cost for people who do not make use of this feature. So it should not increase memory usage or otherwise slow the code down.

It will surely make the code harder to maintain and more error prone. To compensate for this downside, we would need to be able to show better performance in at least some instances.

If these conditions are met, then we can possibly merge the PR if there is wide agreement in the community.

However, if these conditions are not met, you can still use your approach but as part of a fork of the project. We do not encourage forks… but if the changes only benefit you, then it is the best option for everyone.

I would need your help with that.
How do you think we should move forward? In terms of performance, this theoretically should not cause a regression, and the same applies to correctness.
The only footprint this adds is the memory for the extra references.
How about we create a CustomMemoryManager that pools the objects, and let users decide whether they want to use it?
This should significantly reduce the temporary objects created if the user opts in to the new manager.
I can try it, but I need a way to measure the effect.
How do you usually do this?

@richardstartin
Member

richardstartin commented Nov 25, 2021

My concern with this implementation is the footprint overhead, which is up to 0.5MB for 32 bit bitmaps. This imposes bloat on every user, whether DIY memory management seems like a good idea to them or not. The cause of the bloat is the dependency injection style implementation.

What could work is an SPI style memory manager, from which all arrays are obtained. If the memory manager is stored in a static final field (with registration possible before first use) then the calls will all get inlined anyway and there will be no cost at all. Container, RoaringArray, and RoaringBitmap become AutoCloseable, and arrays are returned to the memory manager on calling close, or at resizes.
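
A minimal sketch of that SPI shape (all names are hypothetical, not the prototype's actual API):

  // Hypothetical sketch only: none of these names exist in RoaringBitmap.
  public final class ArrayAllocator {
    public interface CharArraySource {
      char[] allocate(int size);
    }

    private static CharArraySource registered; // set before first use, or left null

    public static void register(CharArraySource source) {
      registered = source;
    }

    // Resolved once when Holder is first loaded; afterwards the JIT can treat
    // SOURCE as a constant and inline allocate() calls on the default path.
    private static final class Holder {
      static final CharArraySource SOURCE = registered != null ? registered : char[]::new;
    }

    // every char[] allocation in the library would route through here
    static char[] allocateChars(int size) {
      return Holder.SOURCE.allocate(size);
    }
  }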

This has shortcomings - if two libraries in the same process want to use RoaringBitmap with different memory managers, they will not be able to: one of them will register their memory manager via the SPI first and the other will lose. This is unlikely though because the vast majority of users will not want custom memory management, and getting two which do in the same process is going to be rare… JVMs already have memory management in the form of TLABs and garbage collection, which in recent times will outperform DIY attempts. But it would be possible for a user to override the behaviour at no cost to other users this way, if they want to.

Features not imposing costs on users who don’t use the feature has always been a design principle in this library as things have been added, and I think it’s an important one.

@lemire
Member

lemire commented Nov 25, 2021

Another very important step would be to demonstrate that you can outdo the Java memory manager in a realistic setting. As I made clear, this is not at all obvious. There are many sophisticated knobs on the memory managers with respect to how you want them to work. You have to show that you can beat Java at its own game, with the best settings.

It is not at all obvious to me how you can do better by pooling char[] and long[] arrays in a realistic setting. The JVM has far more options and visibility than you do as a Java programmer. Beating it is hard. Memory allocation must also be thread safe in general, so this requires locking in many realistic applications (whereas the JVM can almost surely rely on lightweight concurrency).

This should significantly reduce the temporary objects created if the user opts in to the new manager.

It does not follow that you have better performance. You have to show that your application runs faster. You can, and probably should, write a benchmark using JMH.

Java is very well tuned to dealing with lots of temporary objects. If you reduce temporary objects while incurring no other cost, then you win, surely, but if you work hard to reduce temporary objects, you may very well end with a negative because you are competing against what the JVM does best.

Please be mindful that you should be able to win by a sizeable margin to justify the added complexity. E.g., you should be able to show that a reasonable realistic task can be achieved, say, at least 20% faster (with hard number).
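
For illustration, a minimal JMH skeleton of the kind of benchmark being asked for (class name and workload are hypothetical); running it with -prof gc on both branches would show the allocation-rate difference alongside throughput:

  import java.util.SplittableRandom;
  import java.util.concurrent.TimeUnit;
  import org.openjdk.jmh.annotations.*;
  import org.roaringbitmap.RoaringBitmap;

  @State(Scope.Benchmark)
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public class PoolingBenchmark {
    private RoaringBitmap left;
    private RoaringBitmap right;

    @Setup(Level.Trial)
    public void setup() {
      // deterministic random bitmaps so both branches see identical inputs
      SplittableRandom random = new SplittableRandom(42);
      left = new RoaringBitmap();
      right = new RoaringBitmap();
      for (int i = 0; i < 1_000_000; i++) {
        left.add(random.nextInt(1 << 26));
        right.add(random.nextInt(1 << 26));
      }
    }

    @Benchmark
    public RoaringBitmap intersection() {
      // compare this figure with and without a custom memory manager
      return RoaringBitmap.and(left, right);
    }
  }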

@@ -180,10 +187,15 @@ public static RoaringBitmap or(RoaringBitmap... bitmaps) {
* @return the symmetric difference of the bitmaps
*/
public static RoaringBitmap xor(RoaringBitmap... bitmaps) {
IMemoryManager memoryManager =
bitmaps.length == 0 ? DefaultMemoryManager.INSTANCE : bitmaps[0].memoryManager;
Member

If empty, this returns an empty bitmap (so no memory manager should be involved).
If not empty, the collector should choose the proper memory manager dynamically: the first bitmap may be a dumb/empty RoaringBitmap with DefaultMemoryManager.INSTANCE, while other non-empty bitmaps could have a custom memoryManager.
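
One hypothetical way to implement that dynamic choice (a sketch against this PR's types; the selection rule is an assumption, not PR code):

  // Sketch: prefer the first custom manager among the inputs, else the default.
  private static IMemoryManager chooseMemoryManager(RoaringBitmap... bitmaps) {
    for (RoaringBitmap bitmap : bitmaps) {
      if (bitmap.memoryManager != DefaultMemoryManager.INSTANCE) {
        return bitmap.memoryManager;
      }
    }
    return DefaultMemoryManager.INSTANCE;
  }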

@blacelle
Member

blacelle commented Nov 25, 2021

I also feel the added complexity might not outweigh the yet-to-be-confirmed benefits. However, the spirit/initial motivation looks positive, and the PR is generous. Maybe we could consider dropping some final keywords (e.g. to enable extending ArrayContainer) and wrapping some code in dedicated methods (e.g. char[] allocations). That would enable such features (and others) without impacting the core implementations.

The perf/memory impact of derivation might affect only these custom implementations (or not; e.g. I'm not claiming that wrapping char[] allocation in a non-final, non-private method is free).

One would then offer such specific features through additional classes. We may accept them, accumulate them in some extras jar (as discussed previously), park them in a dedicated module or in the examples, or suggest creating a specific library (on top of RoaringBitmap, instead of forking).

Indeed, if @alimans3 finds any concrete benefit with his approach, we could ensure it can be applied without requiring him to fork RoaringBitmap core classes. Hence, this PR may also turn into the minimal changes needed to make his approach (and others) applicable, without requiring duplication of full classes/methods.

@richardstartin
Member

I just spent a little bit of time playing with an alternative and wrote a reference counting test to make sure everything allocated is freed; making Container closeable will be extremely buggy. Containers are created in an ad hoc fashion and transitioned between in an unpredictable manner, so being able to close every created container so it can release its memory is going to be fraught... There is finalize, but let's not go there...

@lemire
Member

lemire commented Nov 25, 2021

Additional classes would be fine. I also do not discourage this PR: see my comments on the mailing list.

@richardstartin
Member

richardstartin commented Nov 25, 2021

It's also not something you can hope to do reliably without the kind of instrumentation points the GC has access to... Consider the case here where a bit is added to an ArrayContainer:

    public Container add(int begin, int end) {
    ...
       if (newcardinality > DEFAULT_MAX_SIZE) {
         BitmapContainer a = this.toBitmapContainer();
         return a.iadd(begin, end);
       }
    ...

There are a few ways you can approach this. You can call this.close() before returning, but that would be wrong if this is stored in an array somewhere after creating the new bitmap container. What you actually need to do is intercept every single reference assignment to determine whether the old referent is now unreferenced so it can be closed. This is what GC engineers call a store barrier, and this way lies madness.
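
To make the hazard concrete, here is a sketch of the naive fix and why it fails (the close() placement is hypothetical, not code from the PR):

    public Container add(int begin, int end) {
    ...
       if (newcardinality > DEFAULT_MAX_SIZE) {
         BitmapContainer a = this.toBitmapContainer();
         this.close(); // WRONG if 'this' is still referenced from a RoaringArray:
                       // its pooled arrays could be recycled and overwritten while
                       // another holder still reads them (use after free)
         return a.iadd(begin, end);
       }
    ...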

@richardstartin
Member

I also do not discourage this PR: see my comments on the mailing list.

I have discouraged this PR not because I don't appreciate the effort that has gone into it (I do); I just can't see it working, and my concerns go way deeper than the footprint overhead now.

@richardstartin
Member

If anyone is interested, my code is here https://github.com/RoaringBitmap/RoaringBitmap/tree/memory-manager-prototype

@lemire
Member

lemire commented Nov 25, 2021

@richardstartin I think @blacelle refers to specialized classes. I agree that re-engineering everything seems unwise.

@richardstartin
Member

richardstartin commented Nov 25, 2021

@richardstartin I think @blacelle refers to specialized classes. I agree that re-engineering everything seems unwise.

@lemire you still can't guarantee to close every container without intercepting reference assignment (store barriers), custom classes or not.

@richardstartin
Member

I may have overstated my case a little bit, there are relatively few container assignments in RoaringArray and it's possible to route all of them through setContainerAtIndex - see #541
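
A sketch of that routing (setContainerAtIndex is a real RoaringArray method, but the close logic shown is hypothetical):

  // If every container replacement funnels through this one method, the old
  // container can be released at a single choke point (sketch only).
  void setContainerAtIndex(int i, Container c) {
    Container previous = this.values[i];
    if (previous != null && previous != c) {
      previous.close(); // assumes Container is made AutoCloseable
    }
    this.values[i] = c;
  }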

@blacelle
Member

@richardstartin I think @blacelle refers to specialized classes.

Yep.

@alimans3
Author

I may have overstated my case a little bit, there are relatively few container assignments in RoaringArray and it's possible to route all of them through setContainerAtIndex - see #541

I took a look at your PR; the approach you took is definitely better than what I started with.
The overhead is removed, and the Allocator static methods are clearer and less invasive in the code.

Regarding the case you mentioned:

public Container add(int begin, int end) {
...
   if (newcardinality > DEFAULT_MAX_SIZE) {
     BitmapContainer a = this.toBitmapContainer();
     return a.iadd(begin, end);
   }
...

In this case the caller should be responsible for closing the old container if the new one replaces it.

@richardstartin
Member

Regarding the case you mentioned:
...
In this case the caller should be responsible for closing the old container if the new one replaces it.

The thing is, that's easier said than done; it's really hard to guarantee. There's an awful lot of code where containers do not have well-defined scopes. Take RunContainer.or and find all the places a container needs to be closed:

  @Override
  public Container or(BitmapContainer x) {
    if (isFull()) {
      return full();
    }
    // could be implemented as return toTemporaryBitmap().ior(x);
    BitmapContainer answer = x.clone();
    for (int rlepos = 0; rlepos < this.nbrruns; ++rlepos) {
      int start = (this.getValue(rlepos));
      int end = start + (this.getLength(rlepos)) + 1;
      int prevOnesInRange = answer.cardinalityInRange(start, end);
      Util.setBitmapRange(answer.bitmap, start, end);
      answer.updateCardinality(prevOnesInRange, end - start);
    }
    if (answer.isFull()) {
      // need to close answer here
      return full();
    }
    return answer;
  }

  @Override
  public Container or(RunContainer x) {
    if (isFull()) {
      return full();
    }
    if (x.isFull()) {
      return full(); // cheap case that can save a lot of computation
    }
    // we really ought to optimize the rest of the code for the frequent case where there is a
    // single run
    RunContainer answer = new RunContainer(allocateChars(2 * (this.nbrruns + x.nbrruns)), 0);
    int rlepos = 0;
    int xrlepos = 0;

    while ((xrlepos < x.nbrruns) && (rlepos < this.nbrruns)) {
      if (getValue(rlepos) - x.getValue(xrlepos) <= 0) {
        answer.smartAppend(getValue(rlepos), getLength(rlepos));
        rlepos++;
      } else {
        answer.smartAppend(x.getValue(xrlepos), x.getLength(xrlepos));
        xrlepos++;
      }
    }
    while (xrlepos < x.nbrruns) {
      answer.smartAppend(x.getValue(xrlepos), x.getLength(xrlepos));
      xrlepos++;
    }
    while (rlepos < this.nbrruns) {
      answer.smartAppend(getValue(rlepos), getLength(rlepos));
      rlepos++;
    }
    if (answer.isFull()) {
      // need to close answer here
      return full();
    }
    // may need to close answer here, because it may not reach the caller.
    return answer.toBitmapIfNeeded();
  }

I think if you add a reference counting test on this branch, you will realise how hard it is to retrofit this behaviour. The same goes for subclassing: there are a lot of places the subclass needs to be preserved through combining operations, some of which escape, some don't. The only surefire way to do this is to have the same store barriers the GC has.
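
For illustration, here is a sketch of the kind of reference-counting harness implied above (hypothetical code, not the prototype's):

  import java.util.Collections;
  import java.util.IdentityHashMap;
  import java.util.Set;

  // Tracks live arrays by identity so a test can assert that every
  // allocation is freed exactly once by the end of an operation.
  final class RefCountingPool {
    private final Set<char[]> live = Collections.newSetFromMap(new IdentityHashMap<>());

    synchronized char[] allocate(int size) {
      char[] array = new char[size];
      live.add(array);
      return array;
    }

    synchronized void free(char[] array) {
      if (!live.remove(array)) {
        throw new IllegalStateException("double free, or array not from this pool");
      }
    }

    synchronized void assertAllFreed() {
      if (!live.isEmpty()) {
        throw new AssertionError(live.size() + " arrays leaked");
      }
    }
  }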

@alimans3
Author

Regarding the case you mentioned:
...
In this case the caller should be responsible for closing the old container if the new one replaces it.

The thing is, that's easier said than done; it's really hard to guarantee. There's an awful lot of code where containers do not have well-defined scopes. [...] The only surefire way to do this is to have the same store barriers the GC has.

You're right; I think it's hard to count all these possible cases.
However, failure to close (free) a container would just result in it being collected by the GC (unless it is still referenced by the custom allocator, which would be counterintuitive), so the worst case would be the same footprint as the current implementation.
And with time, the code should evolve to cover all the missing cases once the Allocator approach is adopted.

What do you think?

@lemire
Member

lemire commented Nov 26, 2021

However, failure to close (free) a container would just result in it being collected by the GC (unless it is still referenced by the custom allocator, which would be counterintuitive), so the worst case would be the same footprint as the current implementation.

I don’t think we could merge this PR without a custom allocator that shows the utility of this strategy. And said allocator would have to be correct (no leak).

@alimans3
Author

However, failure to close (free) a container would just result in it being collected by the GC (unless it is still referenced by the custom allocator, which would be counterintuitive), so the worst case would be the same footprint as the current implementation.

I don’t think we could merge this PR without a custom allocator that shows the utility of this strategy. And said allocator would have to be correct (no leak).

I will work on a benchmark that uses a custom allocator that pools objects, to show the gain in temporary allocations, once @richardstartin finalizes the other PR, which is the dominant PR now.

@lemire
Member

lemire commented Nov 26, 2021

Let me be clearer… We need a fully worked out, tested, and benchmarked solution. We will not merge into our main branch something that is not fully motivated.

Our core code is mostly bug free and is performant. People rely on it. We will not take chances.

I am not discouraging this work but I want us to understand that experiments must be validated before they get merged and published. Engineers are relying on us for their production code.

If the code becomes more complicated then we need a tangible benefit. So we must see the code run faster!

@richardstartin
Member

@alimans3 I'm less concerned by leaks than by use-after-free scenarios, but if you can't guarantee leaks don't happen, it's unlikely you can guarantee use-after-free doesn't happen either. It's very easy to underestimate the complexity of doing this sort of thing properly, and the consequences of getting it wrong tend to be fairly severe.

@alimans3
Author

alimans3 commented Dec 1, 2021

@richardstartin @lemire @blacelle

I added a very simple, non-thread-safe implementation of the allocator just to show the potential of this development.
I don't think we should provide the custom allocator implementation; this is just to be able to demo.
In my project code, we use a similar pool design; however, we made it thread-safe in a CPU-efficient way.

I added a benchmark with some operations on random bitmaps and here are the results:
before.log
after.log

I ran the benchmarks with -prof gc to be able to get GC data, and you can see the obvious reduction in GC pause count, GC allocation rate, etc.

Let me know what you think.

@richardstartin
Member

richardstartin commented Dec 1, 2021

@alimans3 that's more than a 10% regression on intersection speed. I don't think it's a good tradeoff given that JVMs are designed to allocate quickly and GCs are getting better and better at recycling allocated memory with low pause times.

@alimans3
Author

alimans3 commented Dec 1, 2021

@alimans3 that's more than a 10% regression on intersection speed. I don't think it's a good tradeoff given that JVMs are designed to allocate quickly and GCs are getting better and better at recycling allocated memory with low pause times.

This regression is a result of the implementation of the pool, which I think is far from perfect.

I think it would be good if we gave users the ability to make this tradeoff if their application is memory-critical.

Even if the GC is becoming better, continually creating large temporary objects in heavy applications will reduce its performance.

In my application, for example, long GC pauses and even application halts were encountered, and this change massively reduces them.

@blacelle
Member

blacelle commented Dec 1, 2021

I understand the custom IMemoryManager leads to a 10% perf degradation (91 -> 100) but to better-looking GC statistics (.count, .time; @alimans3 I feel you should also focus on the gc.time.max statistics). Hence, this implementation would enable a seemingly nice trade-off, with limited perf impact.

One big question remains: how does the implementation with the default IMemoryManager behave compared to the current implementation? @alimans3, can you provide figures from your JMH run with the default/dummy allocator and with master code? (The point being to confirm or refute that the default implementation is impact-less for the vast majority of library usage.)

@alimans3
Author

alimans3 commented Dec 1, 2021

I understand the custom IMemoryManager leads to a 10% perf degradation (91 -> 100) but to better-looking GC statistics (.count, .time; @alimans3 I feel you should also focus on the gc.time.max statistics). Hence, this implementation would enable a seemingly nice trade-off, with limited perf impact.

One big question remains: how does the implementation with the default IMemoryManager behave compared to the current implementation? @alimans3, can you provide figures from your JMH run with the default/dummy allocator and with master code?

Sure, will be doing that later.

@richardstartin
Member

This regression is a result of the implementation of the pool, which I think is far from perfect.

Could you post benchmarks with a pool implementation which outperforms the GC then, please? This library is widely used, and performance regressions can't be accepted for dubious benefit.

@alimans3
Author

alimans3 commented Dec 1, 2021

This regression is a result of the implementation of the pool, which I think is far from perfect.

Could you post benchmarks with a pool implementation which outperforms the GC then, please? This library is widely used, and performance regressions can't be accepted for dubious benefit.

I am not for making this or any pool implementation the default, nor for adding it as an option.

I am only for giving users the option of doing their own thing.

The implementation is just for the sake of showing that it can be beneficial for some users to have this option. And the memory enhancements vs. the low perf impact could be a tradeoff someone is willing to take.

@alimans3
Author

alimans3 commented Dec 1, 2021

I understand the custom IMemoryManager leads to a 10% perf degradation (91 -> 100) but to better-looking GC statistics (.count, .time; @alimans3 I feel you should also focus on the gc.time.max statistics). Hence, this implementation would enable a seemingly nice trade-off, with limited perf impact.

One big question remains: how does the implementation with the default IMemoryManager behave compared to the current implementation? @alimans3, can you provide figures from your JMH run with the default/dummy allocator and with master code? (The point being to confirm or refute that the default implementation is impact-less for the vast majority of library usage.)

I attached the results of the same benchmark run on the master branch.
The results are similar in terms of performance.
For GC time, the numbers are not consistent, but the GC counts are similar.
master.log

@blacelle
Member

blacelle commented Dec 1, 2021

I don't think we should provide the custom allocator implementation; this is just to be able to demo.

This appeared legitimate to me at the beginning. However, such a custom allocator would help demonstrate the benefit of your proposition and how the community would gain from it. Otherwise, it feels like we are complexifying the code for the benefit of your own project, with no immediate benefit to others (i.e., as we all agree, writing and testing such a custom allocator is anything but trivial).

Again, we may prefer focusing on making such a development easier (e.g. removing some final keywords, wrapping array allocations in overridable methods), but not injecting the whole concept of IAllocationManager into the core classes.

Comment on lines -193 to +209
-      return or(x.not(0, endOfRange).iremove(endOfRange, 0x10000));
+      Container not = x.not(0, endOfRange);
+      Container removed = not.iremove(endOfRange, 0x10000);
+      if (not != removed) {
+        not.close();
+      }
+      Container answer = or(removed);
+      if (answer != removed) {
+        removed.close();
+      }
+      return answer;
       }
-    return or(x.not(0, 0x10000));
+    Container not = x.not(0, 0x10000);
+    Container answer = or(not);
+    if (not != answer) {
+      not.close();
+    }
+    return answer;
Member

This diff sums up the complexity tradeoff for me; it may even mess with inlining by increasing the code size (fortunately it's not a widely used method).

@blacelle
Member

blacelle commented Dec 2, 2021

I had a try with your branch, @alimans3: I forced the use of your custom allocator and ran the tests: KO. I switched some Maps to ConcurrentHashMap to help with thread-safety, but TestFastAggregation.testAndWithIterator (amongst others) seems to demonstrate some data corruption. Is your custom allocator buggy? Please demonstrate the whole thing can work by providing a non-naive allocator. Its performance may be slightly worse than with the naive allocator (which may be OK for you as long as GC pauses are shorter); that is OK as long as the naive allocator has no impact (for the 99.9% of RoaringBitmap users).

-> In addition to the performance elements discussed, please provide a rock-solid PR, with a non-naive allocator executed through all the tests, to demonstrate the whole thing is achievable.

-> I believe we may not be able to achieve a non-naive + not-very-slow + working-for-everybody allocator, but I feel you may achieve something matching your constraints and your execution paths, @alimans3

@richardstartin
Member

In addition to the performance elements discussed, please provide a rock-solid PR, with a non-naive allocator executed through all the tests, to demonstrate the whole thing is achievable.

This is obviously achievable with a new test set, but it's quite an undertaking for one simple reason: RoaringBitmap used not to be closeable, so the tests don't close the bitmaps. Without closing the bitmaps, you don't get recycling.

My tests for my prototype indicated that there were use-after-free bugs, because not all of the places a container should be closed (in thousands of lines of code written over many years) were identified, and that there were double closes. So there was an expectation on my part that, if that work got picked up, the missing/double closes would be identified as a prerequisite to consideration for merging.

@alimans3
Author

alimans3 commented Dec 2, 2021

Okay, so as a summary:

1- We need to provide a custom allocator to show the community the way to use it (TODO: enhance the allocator for thread-safety).
2- This is definitely not a rock-solid PR; it is just a demo of the value. I will be adding tests and validating that all tests can run with the custom allocator (TODO: validate all tests with the new allocator).
3- We need to identify 2 scenarios:

Scenario A:
Missing free: harmless, but we should find all the places where this is the case.

Scenario B:
Use after free: this is the buggy scenario; we need to find those places, and I think running the tests with the custom allocator will help with that.

(TODO: identify cases where these scenarios occur and fix them)

Before I go on with these TODOs, I think we need a consensus on whether we want to move forward with this option, now that its potential is clear.

@richardstartin @blacelle @lemire let me know what you think.

@lemire
Member

lemire commented Dec 2, 2021

I can maybe clarify the 'governance' part. I pretty much always take the same stance on these matters, and I think it is a view that has wide support. Hopefully, you will approve too. Overall, we take a conservative stance, so if there is going to be disagreement over a pull request, the pull request is not merged. I think that we all appreciate that you are producing high-quality work and that you have put a lot of work into the current (draft) PR. You are obviously a capable programmer. I view your proposal as a research project. There is a lot of risk involved. The project cannot take this risk: you have to carry the burden of proof. Sadly, I think it needs to be fleshed out quite a bit before people can properly assess it.

So yes, this means that you could work for weeks, build a lot of code, produce a lot of benchmarks, and we could still decide not to merge the PR. That follows because your proposal is risky.

@blacelle
Member

blacelle commented Dec 2, 2021

This is obviously achievable with a new test set,

I feel this is not obvious while also not breaking the not-very-slow constraint.

Without closing the bitmaps, you don't get recycling.

Yep. Hence, anyway, maybe we should not spend this much energy on this PR until the whole work is done. We can try discouraging @alimans3 before the work is done, or let him do the work and either accept that this is not feasible in a for-all-users way (as the simplest non-naive allocator would give poor results), or have him come to us with a good implementation, still leaving options on the table (like keeping such code in an extras module).

Missing free: harmless, but we should find all the places where this is the case.

Not harmless, as your PR invites people to use a custom allocator, and these people would encounter memory leaks/bugs.

Before I go on with these TODOs, I think we need a consensus on whether we want to move forward with this option, now that its potential is clear.

The consensus seems to be that we're not convinced, for various reasons.
