Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure CME does not happen when (un)registering commands on Paper during server runtime #501

Draft
wants to merge 2 commits into
base: dev/dev
Choose a base branch
from

Conversation

willkroboth
Copy link
Collaborator

@willkroboth willkroboth commented Oct 21, 2023

This PR fixes #494

PaperMC/Paper#3116 added a patch to Paper that causes the Commands packet to be built asynchronously. Specifically, they added the following ThreadPoolExector to the net.minecraft.commands.Commands class and used it to move some of the work off the main thread.

public static final java.util.concurrent.ThreadPoolExecutor COMMAND_SENDING_POOL = new java.util.concurrent.ThreadPoolExecutor(
    0, 2, 60L, java.util.concurrent.TimeUnit.SECONDS,
    new java.util.concurrent.LinkedBlockingQueue<>(),
    new com.google.common.util.concurrent.ThreadFactoryBuilder()
        .setNameFormat("Paper Async Command Builder Thread Pool - %1$d")
        .setUncaughtExceptionHandler(new net.minecraft.DefaultUncaughtExceptionHandlerWithName(net.minecraft.server.MinecraftServer.LOGGER))
        .build(),
    new java.util.concurrent.ThreadPoolExecutor.DiscardPolicy()
);

Previously, if this thread pool happened to be looping through the Brigadier Vanilla or Resources CommandDispatcher and the CommandAPI registered or unregistered a command, a ConcurrentModificationException could occur.

This PR fixes this by adding a modifyCommandTreesSafely method to PaperImplementations. Anytime CommandAPIBukkit needs to modify the command trees, it calls this method. If the COMMAND_SENDING_POOL is present, the modification task will be submitted to the pool, and the current thread will be blocked until the task is completed. If the thread pool is not present, the task is run immediately like before.


I chose this solution because I noticed that the COMMAND_SENDING_POOL seemed to only run one task at a time. So, if you put the modify task into the pool, you can ensure that the pool isn't running another task, and no CME occurs.

However, the fact that the pool only ran one task at a time struck me as odd. The javadocs for the ThreadPoolExecutor constructor indicate the second parameter — 2 — is the maximum number of threads allowed in the pool. Therefore, I expected that at most 2 tasks could be run at the same time in this pool, potentially allowing a modify task and a command build task to run at the same time, causing a CME.

In practice, I only saw this pool running at most 1 task. I don't really understand what's going on, so it seems possible that this PR could actually not solve the problem.

I think a better solution would be for PaperImplementations#modifyCommandTreesSafely to block the thread until the COMMAND_SENDING_POOL has finished all its tasks. However, I didn't see any way to do this without controlling how tasks are submitted to the pool.

Anyway, I am definitely not familiar with multithreaded code, so I have no idea if what I've done here is a 'good' solution. I'm not sure if I'm submitting tasks correctly or handling stuff like InterruptedException safely. Feedback is greatly appreciated.


So far, I've tested this manually on Paper and Spigot 1.20.2. It seems to resolve all the issues mentioned in #494. I plan to test this on other versions as well.

TODO:
[ ] Test on real servers (Paper+Spigot)

I tested it, and it didn't work. See #501 (comment).

@JorelAli
Copy link
Owner

(For what it's worth, my multi-threaded programming knowledge is very limited, so I have no comments to make about the implementation from this PR)

@willkroboth
Copy link
Collaborator Author

Ah, I think I figured out why the pool only uses one thread. From the javadocs:

Queuing
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:

  • If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
  • If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
  • If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.

There are three general strategies for queuing:

  1. Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
  2. Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
  3. Bounded queues. A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with finite maximumPoolSizes, but can be more difficult to tune and control. Queue sizes and maximum pool sizes may be traded off for each other: Using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput. If tasks frequently block (for example if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow. Use of small queues generally requires larger pool sizes, which keeps CPUs busier but may encounter unacceptable scheduling overhead, which also decreases throughput.

So, looking at Paper's ThreadPoolExecutor:

public static final java.util.concurrent.ThreadPoolExecutor COMMAND_SENDING_POOL = new java.util.concurrent.ThreadPoolExecutor(
    0, 2, 60L, java.util.concurrent.TimeUnit.SECONDS,
    new java.util.concurrent.LinkedBlockingQueue<>(),
    new com.google.common.util.concurrent.ThreadFactoryBuilder()
        .setNameFormat("Paper Async Command Builder Thread Pool - %1$d")
        .setUncaughtExceptionHandler(new net.minecraft.DefaultUncaughtExceptionHandlerWithName(net.minecraft.server.MinecraftServer.LOGGER))
        .build(),
    new java.util.concurrent.ThreadPoolExecutor.DiscardPolicy()
);

They're using an unbounded LinkedBlockingQueue, so that falls into number 2. As the javadocs explain, a thread pool will prefer to queue a task rather than create a new thread once corePoolSize threads have been created. Since the queue is unbounded, it can always queue a task and will never create more than corePoolSize threads, making maximumPoolSize irrelevant. Since corePoolSize is 0 here, once it makes its first thread, it has more threads than corePoolSize, so it won't make another thread. Hence, this thread pool will only use one thread.


So, that means this PR should work. Submitting the CommandAPI's register and unregister tasks to the thread pool will ensure that no CMEs happen. The pool only uses one thread at a time, so its either reading from the dispatchers to make a Commands packet or modifying the dispatchers as the CommandAPI requests.

However, this might be relying on unintended behavior. Given that maximumPoolSize is set to 2, Paper might have expected 2 threads to be handling Commands packet building. So, Paper might change this behavior in the future, breaking the changes here. That's just speculation though, it's hard to tell what Paper was thinking. PaperMC/Paper#8170 created the COMMAND_SENDING_POOL and PaperMC/Paper#8272 tweaked the configuration. Neither seem to give any reasoning why maximumPoolSize is set to 2. Intrestingingly, in some discord messages discussing the changes, maximumPoolSize is set to 1. So, maybe the value of 2 is unintended ¯\(ツ)/¯.


Anyway, the implementation for PaperImplementations#modifyCommandTreesSafely seems like it should work. I'm going to read through the javadocs for ThreadPoolExecutor some more to see if I can learn how I should be submitting tasks and handling threading exceptions.

@willkroboth
Copy link
Collaborator Author

Hm, unfortunately, this fix does not solve the problem on all versions of Paper. The COMMAND_SENDING_POOL was only added by PaperMC/Paper#8170 (build paper-1.19.2-117), so this PR only fixes the problem for builds after that PR. Earlier Paper builds can still have the CME (for example, see this log on 1.19.2-116).

Specifically, the original PR (PaperMC/Paper#3116, build paper-1.15.2-177) submits its task to the common ForkJoinPool (¯\(ツ)/¯ whatever that is)

java.util.concurrent.ForkJoinPool.commonPool().execute(() -> {
    sendAsync(entityplayer);
});

A random commit later on (PaperMC/Paper@7323594, build paper-1.18-31) changed this to use some sort of MCUtil task executor:

net.minecraft.server.MCUtil.scheduleAsyncTask(() -> this.sendAsync(player));

For reference, the current system (PaperMC/Paper#8170, build paper-1.19.2-117) changed it to this:

COMMAND_SENDING_POOL.execute(() -> {
        this.sendAsync(player);
});

So, there needs to be more work to fix this on older Paper versions.

@willkroboth willkroboth changed the title Run command tree modify tasks with Commands.COMMAND_SENDING_POOL on Paper Ensure CME does not happen when (un)registering commands on Paper Oct 30, 2023
@willkroboth willkroboth changed the title Ensure CME does not happen when (un)registering commands on Paper Ensure CME does not happen when (un)registering commands on Paper during server runtime Oct 30, 2023
Use some sneaky reflection to intercept `net.minecraft.server.CommandDispatcher`'s task scheduling and enforce our own read-write access

Updates #501
Fixes #494 and #503
@willkroboth
Copy link
Collaborator Author

willkroboth commented Dec 19, 2023

Alright, I basically rewrote this whole thing with the new research in mind.

Now, the CommandAPI intercepts net.minecraft.server.CommandDispatcher's attempts to schedule its read tasks and uses a ReentrantReadWriteLock to control when the tasks are allowed to proceed. When the CommandAPI tries to write to the Brigadier CommandDispatchers, it acquires the lock, blocking new read tasks from continuing and blocking the write task until all the read tasks that are currently running have stopped.

I haven't tested this yet, but I think it should work. It also shouldn't interfere with anything on Spigot.

TODO:

Test on Spigot
  • 1.16.5
  • 1.17
  • 1.17.1
  • 1.18
  • 1.18.1
  • 1.18.2
  • 1.19
  • 1.19.1
  • 1.19.2
  • 1.19.3
  • 1.19.4
  • 1.20, 1.20.1
  • 1.20.2
  • 1.20.3, 1.20.4
Test on Paper
  • 1.15, 1.15.1
  • 1.15.2
    • paper-1.15.2-176
    • paper-1.15.2-177
  • 1.16.1
  • 1.16.2, 1.16.3
  • 1.16.4
  • 1.16.5
  • 1.17
  • 1.17.1
  • 1.18
    • paper-1.18-30
    • paper-1.18-31
  • 1.18.1
  • 1.18.2
  • 1.19
  • 1.19.1
  • 1.19.2
    • paper-1.19.2-116
    • paper-1.19.2-117
  • 1.19.3
  • 1.19.4
  • 1.20, 1.20.1
  • 1.20.2
  • 1.20.3, 1.20.4
  • Remove debug messages

@willkroboth
Copy link
Collaborator Author

Hmm, the current solution doesn't quite work. For versions ~1.15 to 1.18 net.minecraft.server.CommandDispatcher submits tasks using ForkJoinPool#commonPool, which references the field ForkJoinPool.common. To intercept the submitted tasks, I tried using reflection to replace that field with a CommandAPI-controlled object. Unfortunately, reflection is not allowed to access this field:

java.lang.reflect.InaccessibleObjectException: 
  Unable to make field static final java.util.concurrent.ForkJoinPool 
    java.util.concurrent.ForkJoinPool.common accessible: 
  module java.base does not "opens java.util.concurrent" to unnamed module @14aca5ab

I'll need to find something else for this again

@willkroboth willkroboth marked this pull request as draft December 29, 2023 03:19
@willkroboth
Copy link
Collaborator Author

On pause until #517. That PR splits up the Spigot and Paper modules, which will be helpful since this PR won't have to do separate Paper detection that works across all versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants