Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to run concurrent asyncio processes #396

Open
ifeldshteyn-emc opened this issue Apr 25, 2023 · 5 comments
Open

Unable to run concurrent asyncio processes #396

ifeldshteyn-emc opened this issue Apr 25, 2023 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@ifeldshteyn-emc
Copy link

ifeldshteyn-emc commented Apr 25, 2023

Hi,

The bot is unable to handle asyncio concurrent messagefunction calls. If you add sleep to any listen function - the bot will be unable to handle any other messages until the sleep is released.

Here is an example. If you type !test to the bot it will enter the function test and sleep for 5 seconds. If during those 5 seconds you tell the bot !hi it won't reply to you until the !test function is released.

Expected functionality would be ability to handle multiple requests concurrently

    @listen_to("^!test")
    async def test(self, message: Message):
        self.driver.reply_to(message, "Start Sleep")
        time.sleep(5)
        self.driver.reply_to(message, "End Sleep")

    @listen_to("^!hi")
    async def hi(self, message: Message):
        self.driver.reply_to(message, "Hello")
@unode
Copy link
Collaborator

unode commented May 2, 2023

This needs further looking into but we have two execution paths.

Regular (non-async) functions are handled by a thread pool, currently hardcoded to handle up to 10 concurrent requests.
Requests run outside the MainThread and don't block its execution.

async functions are currently handled by the MainThread. They are expected to await as soon as possible to release the work to other tasks.
This means that in the above code, where time.sleep(5) is used, await asyncio.sleep(5) should have been used instead.

However this is rather unintuitive as it blocks new requests.
The source of the issue is here.

This needs further reflection as I'm not sure threading the work is safe. @jneeven 's advice would be welcome here as this looks like a deliberate design choice.

@attzonko attzonko added the bug Something isn't working label May 8, 2023
@attzonko attzonko added the help wanted Extra attention is needed label Oct 27, 2023
@fmarotta
Copy link

fmarotta commented Nov 1, 2023

I share the concern that the current implementation is potentially unsafe. Both async and regular functions could introduce race conditions. The matter becomes more complex due to the two scenarios in which race conditions can arise. For each new message in Mattermost, the bot iterates through all registered listeners and, if the message's content matches a registered pattern, it invokes the corresponding plugin's method. One way to have a race condition is if two or more listeners, possibly from different plugins, react to the same trigger text: those functions will be running concurrently. The other way to have a race condition is when multiple messages triggering the same listener arrive in a short time frame, so that the same function is invoked on different messages but running concurrently. Race conditions could naturally occur among non-async plugin methods, which run in a thread pool by default as pointed out by @unode, but even async methods, which run as coroutines in the MainThread, are not immune to the problem, as anything that happens after an await is unpredictable.*

One might suggest that, when there is a race condition risk, plugin developers should avoid using async functions and employ thread locks within their methods. However, I think this approach would not solve the problem. If a message triggers both f1() and f2(), they are scheduled almost simultaneously in the thread pool, but even if both functions attempt to acquire a lock, there is no guarantee that f1() will manage to acquire it before f2() all the time, resulting in unpredictable execution order. Additionally, if message1 and message2 arrive in the chat almost simultaneously, f(message2), although scheduled slightly after f(message1), could conceivably manage to acquire the lock before f(message1). So, if I am not mistaken, nothing that the user can do would fix the race conditions (although locks would at least greatly reduce their probability).

The only solution that I could come up with would be to run async methods in a thread pool, and non-async methods sequentially in the MainThread, in a blocking manner. async functions are not supposed to be safe, so plugin developers should only use them when they are sure that there are no side effects or that the order of execution does not matter. In this case, those methods may as well run in different threads (or even in parallel), so as not to block the MainThread. One could maybe use asyncio.loop.run_in_executor() to achieve this result. With this approach, the code example posted above by @ifeldshteyn-emc would work as the OP expected, because the time.sleep() call would block an external thread, not the main one. A range of related potential problems would also be solved; in the current implementation, if an async plugin method does not use await, that method is effectively blocking the MainThread, exactly as in the OP's example. On the other hand, regular non-async functions would be executed in the MainThread and delay everything else that is not async, avoiding race conditions. The regular functions would always run in the same, predictable order, and f(message1) would return before f(message2) is called. If necessary, plugin developers could still use their own thread pool for launching long-running tasks from their non-async plugin methods.

In practical terms, my proposal would be to modify this logic so that all async plugin functions are scheduled in a thread pool first, followed by the sequential execution of non-async functions in the MainThread. I am not super familiar with asyncio.loop.run_in_executor(), but I could look into it. If this could be implemented, existing threaded code would suddenly become blocking, so it may be called a breaking change... However, if it proves effective, it would be extremely useful for plugins where ensuring that messages are processed in the correct order is important, like our own trivia quiz plugin.

* For example, consider a plugin that listens to a download <file> trigger, and runs an async method that awaits fetching the file and sends it to the chat as an attachment. Someone could type in Mattermost download A and then download B, in this order, but file B is much smaller than file A, so the method for file B returns earlier. In other words, after an await, there is no way to preserve the original order of the messages. This may not matter for a plugin which just downloads files, but in our use-case of a plugin that runs a competitive game in the Mattermost chat, knowing which message came first is crucial.

@unode
Copy link
Collaborator

unode commented Nov 1, 2023

I second @fmarotta's interpretation, however I propose a slight extension to what is mentioned in his post.
To begin, I would move both async and sync tasks out of MainThread.

Two motivations:

  • If a plugin implements blocking code, the main thread will block and no other plugin will fire until that task completes. This will make the bot feel laggy and/or unresponsive and create competition between different plugins.
  • Two function calls in the same plugin (same function f1(); f1() or different functions f1(); f2() in same plugin) may want to wait for each other or respect trigger order (as in the trivia plugin case mentioned).

To address both, I suggest that:

  1. we create a separate thread/process pool (default size = 1) for every plugin
  2. we add an argument to @listen that exposes an execution context (possibly including size). Two functions or plugins with the same execution context share the same thread/process pool.

The above would ensure two calls to the same function in the same plugin are queued together. This should hopefully make it more predictable for the developer of the plugin, while at the same time avoiding interference from other plugins.

As for async vs sync, there's still some misunderstanding in the original question. async does not mean that blocking code will run in parallel. You still need to use workarounds.
The part of the MainThread not handling further replies (and reason for the bug label) should hopefully be addressed by moving things out of the MainThread.

This would then mean that:

  1. Both sync and async code would get a plugin specific thread/process pool.
  2. Code would run "in parallel" within said pool if size > 1 otherwise, blocking/non-blocking execution applies.

This however glances over thread safety, which we will need to assess separately. It may very well be that Python's GIL keep us safe but we'll need to see how it scales and what context needs to be shared/passed.

@attzonko
Copy link
Owner

attzonko commented Nov 2, 2023

I agree with the direction this discussion is headed. Making either of the changes described should be considered "breaking" changes and would warrant stepping the major version of mmpy_bot. I really like the idea of preventing individual plugins from blocking the MainThread

@fmarotta
Copy link

fmarotta commented Nov 2, 2023

Thanks @unode and @attzonko. I also like the idea of having a plugin-specific pool combined with an execution context, it should indeed keep the MainThread free and help prevent race conditions. I think it's fair to assume that most conflicts will arise from within the same plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants