GH-2998 native store memory overflow performance #4974
base: develop
Conversation
```diff
@@ -68,52 +69,22 @@ class LmdbSailStore implements SailStore {
 	private final ValueStore valueStore;

 	private final ExecutorService tripleStoreExecutor = Executors.newCachedThreadPool();
-	private final CircularBuffer<Operation> opQueue = new CircularBuffer<>(1024);
+	private final ArrayBlockingQueue<Operation> opQueue = new ArrayBlockingQueue<>(1024);
```
@kenwenzel I found out that the ArrayBlockingQueue supports all the features that you need here and is more performant with a lot less busy waiting.
That would be wonderful. Do you have a benchmark for comparing this change?
I've switched to the custom buffer as it was much faster than blocking queues - at least in our write benchmarks.
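For reference, this is roughly the handoff pattern that `ArrayBlockingQueue` provides out of the box: `put` blocks when the queue is full and `take` blocks when it is empty, parking the thread instead of busy waiting. A minimal standalone sketch (class and operation names here are illustrative, not the actual `LmdbSailStore` code):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class OpQueueSketch {
	// Bounded queue: put() parks the producer when full, take() parks the consumer when empty.
	static final ArrayBlockingQueue<String> opQueue = new ArrayBlockingQueue<>(1024);

	public static void main(String[] args) {
		Thread consumer = new Thread(() -> {
			try {
				for (int i = 0; i < 3; i++) {
					// Blocks without spinning until an operation is available.
					System.out.println("processed " + opQueue.take());
				}
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		});
		consumer.start();
		try {
			for (int i = 0; i < 3; i++) {
				opQueue.put("op-" + i); // blocks only if the queue is full
			}
			consumer.join();
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
	}
}
```

Whether this beats a hand-rolled ring buffer depends on contention and on how much the custom buffer relies on spinning, which is why the benchmarks below matter.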
[benchmark charts: LMDB, before vs. after]
That means we should just stay with the CircularBuffer?
Ahh. That's ops per second and not ms/op which I use in my benchmarks.
@kenwenzel IntelliJ is saying that there is a non-atomic update to an atomic field. In general I'm a bit sceptical whether this is even thread safe. If two threads both call add and proceed to `elements[tail] = element` then one thread will overwrite the other. Is it supposed to be thread safe? In this specific case
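To illustrate the concern: the usual way to make a multi-producer buffer safe is to claim the slot index atomically before writing, so that no two threads can ever write the same slot. A hedged sketch of that pattern (not the actual CircularBuffer implementation; the wrap-around and head tracking of a real ring buffer are omitted):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SlotClaimSketch {
	final Object[] elements;
	final AtomicInteger tail = new AtomicInteger();

	SlotClaimSketch(int capacity) {
		elements = new Object[capacity];
	}

	// getAndIncrement() hands every caller a distinct index, so two
	// concurrent add() calls can never write to the same slot. By
	// contrast, a non-atomic "elements[tail] = e; tail++" lets both
	// threads observe the same tail and overwrite each other.
	boolean add(Object e) {
		int slot = tail.getAndIncrement();
		if (slot >= elements.length) {
			return false; // full; a real ring buffer would wrap and check the head
		}
		elements[slot] = e;
		return true;
	}
}
```

If the buffer is only ever filled from a single producer thread, the race does not arise, which may be the implicit assumption here.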
Force-pushed from 72de6f1 to 6668015
@hmottestad I would propose to keep the CircularBuffer for the LmdbStore.
Yeah. I thought the same. I reverted it. I also reverted my attempt at throttling new transactions while a lot of write transactions are running. I'll try running the benchmarks again and see if things are better. The biggest improvement is still included: supporting addAll through the entire stack, from the Changeset through to the store.
Force-pushed from 4b15ecd to e60d672
[benchmark charts: LMDB, before vs. after]
@kenwenzel That's much better, isn't it? Can you review my proposed changes?
Looks impressive, especially for larger transactions. Cool :-)
@kenwenzel I managed to get this error when running one of my benchmarks.
I'll check this. Maybe gcIds is called from the wrong thread.
OK, I think I found it. The method resizeMap always expects a surrounding readTransaction, which is not the case for gcIds.
I've tried about every permutation of the following: moving the write lock into the first "if" clause, using == instead of !=, and all combinations of those. Either I end up with a deadlock or I end up with an exception.
I suppose you also did the same for re-enabling the read-lock some lines below?
I've pushed a commit that hopefully fixes the problem.
Force-pushed from 689c0bb to 7694fc9
I've managed to trigger the following error:
I've also seen that the first overflow to disk for NativeStore starts after the last output of
Bummer. I had seen that before but thought I had fixed it. Probably need to figure out how to reproduce it a bit more reliably. As for the point where things overflow: what I've noticed is that when there are a lot of transactions that add and remove data, the transaction isolation accumulates layers of Changeset objects. Once the last transaction finishes we get a cascade of Changesets that compress by merging the top Changeset into the one below it in a loop, until the final Changeset flushes to the native store. During the compression the models within the Changesets grow larger and larger with each merge, so it quickly leads to a memory overflow.
I'm also experimenting with using only the GC load to find the right point for overflowing to disk. This also works somehow, but sometimes the trigger comes too late and the GC takes a very long time.
That's really cool. I didn't know about the notification system for GCs. Maybe you can try to accumulate how much time is spent in garbage collection over the past 10 seconds and treat it as high memory pressure if it's more than five seconds in total. G1 prefers frequent and short GCs rather than long ones, so that might be why it's sometimes triggering too late.
@hmottestad Maybe you can test the following branch by running it. It includes your above suggestion for computing the GC load and also the proactive closing of models. All optimisations are useless if we can't close the temporary models early. The running time on my computer is much slower than yours:
I've also ported the logic on top of your work within the following branch. The following is an excerpt:

```java
List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();
RuntimeMXBean runtimeMXBean = ManagementFactory.getRuntimeMXBean();
for (GarbageCollectorMXBean gcBean : gcBeans) {
	NotificationEmitter emitter = (NotificationEmitter) gcBean;
	emitter.addNotificationListener((notification, o) -> {
		long uptimeInMillis = runtimeMXBean.getUptime();
		while (!gcInfos.isEmpty()) {
			if (uptimeInMillis - gcInfos.peek().getEndTime() > 5000) {
				gcSum -= gcInfos.poll().getDuration();
			} else {
				break;
			}
		}
		// extract garbage collection information from notification.
		GarbageCollectionNotificationInfo gcNotificationInfo = GarbageCollectionNotificationInfo
				.from((CompositeData) notification.getUserData());
		GcInfo gcInfo = gcNotificationInfo.getGcInfo();
		gcInfos.add(gcInfo);
		gcSum += gcInfo.getDuration();
		if (System.currentTimeMillis() - lastGcUpdate > 10000) {
			overflow = false;
		}
		if (!overflow) {
			// maximum heap size the JVM can allocate
			long maxMemory = RUNTIME.maxMemory();
			// total currently allocated JVM memory
			long totalMemory = RUNTIME.totalMemory();
			// amount of memory free in the currently allocated JVM memory
			long freeMemory = RUNTIME.freeMemory();
			// estimated memory used
			long used = totalMemory - freeMemory;
			// amount of memory the JVM can still allocate from the OS (upper boundary is the max heap)
			long freeToAllocateMemory = maxMemory - used;
			// try to prevent OOM
			overflow = freeToAllocateMemory < MIN_AVAILABLE_MEM_BEFORE_OVERFLOWING;
			lastGcUpdate = System.currentTimeMillis();
		}
		if (!overflow && gcSum > 2500) {
			overflow = true;
			lastGcUpdate = System.currentTimeMillis();
		}
	}, null, null);
}
```

It uses a combination of available memory and GC load and it also seems to work without explicitly closing the temporary models.
I couldn't get your branch to work, but I made some modifications:
Essentially I'm tracking the total time spent in GC over the past second in a sliding window made of 100 buckets of 10 ms each. That brings it down to 18785.157 ms/op, but it's really inconsistent. From one benchmark run to another it'll be fine, and then suddenly it starts running out of memory. I also still get the inconsistent state exception, even though everything is wrapped in synchronized. That one is really bizarre.
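The sliding-window bookkeeping described above can be sketched as a small standalone class. This is my own illustration of the idea, assuming a GC notification listener feeds it durations in milliseconds; names and the synchronization granularity are placeholders, not the actual branch code:

```java
import java.util.Arrays;

public class GcTimeWindow {
	private static final int BUCKETS = 100;   // 100 buckets...
	private static final long BUCKET_MS = 10; // ...of 10 ms each = 1 s window

	private final long[] gcMillis = new long[BUCKETS];
	private long lastBucket = -1;

	// record GC time observed at wall-clock time nowMillis
	public synchronized void record(long nowMillis, long durationMillis) {
		advance(nowMillis);
		gcMillis[(int) ((nowMillis / BUCKET_MS) % BUCKETS)] += durationMillis;
	}

	// total GC time that fell within the last second
	public synchronized long totalMillis(long nowMillis) {
		advance(nowMillis);
		long sum = 0;
		for (long v : gcMillis) {
			sum += v;
		}
		return sum;
	}

	// zero out buckets that have rotated out of the 1 s window
	private void advance(long nowMillis) {
		long bucket = nowMillis / BUCKET_MS;
		if (lastBucket < 0 || bucket - lastBucket >= BUCKETS) {
			Arrays.fill(gcMillis, 0); // whole window has expired
		} else {
			for (long b = lastBucket + 1; b <= bucket; b++) {
				gcMillis[(int) (b % BUCKETS)] = 0;
			}
		}
		lastBucket = bucket;
	}
}
```

A listener would call `record` from each GC notification and compare `totalMillis` against a threshold to decide when to overflow to disk.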
Maybe you can also try the other branch from my previous message. It combines the GC load with the memory usage to find a good moment for overflowing.
Btw. The running time will heavily depend on the speed of your SSD and the operating system that you are running. I'm on a MacBook and macOS is notorious for ignoring fsync requests. If you are using Linux then it's much more likely that it will actually wait for fsync to complete.
It's Ubuntu in a VirtualBox under Windows 🙈
I assume that VirtualBox is probably limiting your IO speed.
Btw. I'm getting this exception here every once in a while:
I'm also getting a GC overhead exception even though there is plenty of free memory:
I think this is the same error as the weird problem with the NativeStore. IO operations fail for some reason but are somehow swallowed in NativeStore.
It seems to happen inside the new ID GC code where you fixed the lock issue by adding a read transaction.
Yes, that's correct. The question is: does it always happen when resizeMap is required within gcIds, or only from time to time?
@hmottestad I can't reproduce it with my branch. Maybe there is some LMDB modification operation that results in an error like
I reverted the changes you did to fix the read lock issue and added a try/catch rollback to my benchmark code instead. I'm still running into this MDB_BAD_TXN issue. There also doesn't seem to be any way to handle MDB_BAD_TXN, since even calling a rollback afterwards causes the entire JVM to exit with code 134 (SIGABRT).
@kenwenzel I have come to the conclusion that ParallelGC is very hard to deal with when trying to predict that we are running low on memory. When I switched to G1GC things became a lot more consistent and I'm able to consistently overflow to disk before running out of memory. I also tried increasing the number of transactions in my benchmark, and the performance scaled quite well without running out of memory.
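In case anyone wants to reproduce the difference: the collector can be pinned explicitly on the benchmark JVM. The flags below are standard HotSpot options; the jar name and heap size are placeholders:

```shell
# run with G1 (overflowed to disk consistently in these tests)
java -XX:+UseG1GC -Xmx4g -jar benchmarks.jar

# same benchmark with ParallelGC for comparison
java -XX:+UseParallelGC -Xmx4g -jar benchmarks.jar
```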
The wrapping I've seen some places where Furthermore, we could also move
I wrapped everything I could in E(...) and now the MDB_BAD_TXN doesn't happen anymore :) I probably wrapped more than is needed, but at least we now know that this is the problem.
Thank you! A small note from my side: I would propose not to wrap the _get methods, as it will have a performance impact and the return values are already tested. But checking modifications (put, delete) is really good!
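For context, a hedged sketch of what such an E(...) return-code check typically looks like. The constants match the values in lmdb.h, but this standalone version is illustrative and not the actual helper in the LMDB SAIL:

```java
public class LmdbRc {
	static final int MDB_SUCCESS = 0;
	static final int MDB_NOTFOUND = -30798; // "key/data pair not found" in lmdb.h

	// Throws on any unexpected LMDB return code. MDB_NOTFOUND is passed
	// through because callers legitimately test for it (e.g. after mdb_get).
	static int E(int rc) {
		if (rc != MDB_SUCCESS && rc != MDB_NOTFOUND) {
			throw new IllegalStateException("LMDB error code: " + rc);
		}
		return rc;
	}
}
```

This is why wrapping only the modifying calls (put, delete) catches errors like MDB_BAD_TXN early without adding a branch to every hot-path read.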
Force-pushed from 4dfd13b to c9734dc
```diff
@@ -953,8 +958,10 @@ protected void deleteValueToIdMappings(MemoryStack stack, long txn, Collection<L
 		for (Long id : ids) {
 			idVal.mv_data(id2data(idBb.clear(), id).flip());
 			// id must not have a reference count and must have an existing value
-			if (mdb_get(writeTxn, refCountsDbi, idVal, ignoreVal) != 0 &&
-					mdb_get(txn, dbi, idVal, dataVal) == 0) {
+			int a = E(mdb_get(writeTxn, refCountsDbi, idVal, ignoreVal)); // this is where I get MDB_BAD_TXN
```
@kenwenzel I fairly consistently get MDB_BAD_TXN here
I think it could be due to a previous map full. Can you try to move resizeMap into the inner loop of gcIds?
GitHub issue resolved: #2998
Briefly describe the changes proposed in this PR:
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources (to format from the command line)