
BrokenProcessPool on run.cpu_bound #2774

Open
gotev opened this issue Mar 27, 2024 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@gotev
Contributor

gotev commented Mar 27, 2024

Description

I'm trying to execute multiple parallel tasks which perform CPU-bound work, so I'm using run.cpu_bound.

I expect a failed task not to cause the others to fail as well. What happens instead is that if a task launched inside a worker causes the process to crash (not infrequent when you launch C/C++ apps that error out or run out of memory), a BrokenProcessPool exception is raised for all subsequent tasks. The pool becomes unusable for the rest of the NiceGUI app's execution, until the next restart. That's just how Python's standard ProcessPoolExecutor works. Excerpt from the docs: https://docs.python.org/3/library/concurrent.futures.html

initializer is an optional callable that is called at the start of each worker process; initargs is a tuple of arguments passed to the initializer. Should initializer raise an exception, all currently pending jobs will raise a BrokenProcessPool, as well as any attempt to submit more jobs to the pool.
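For reference, the same breakage can be reproduced with the standard library alone, with NiceGUI out of the picture entirely (a minimal sketch; the function names are illustrative):

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash() -> None:
    os._exit(1)  # kill the worker abruptly, bypassing Python cleanup


def fine() -> str:
    return 'ok'


if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        for task in (crash, fine):
            try:
                print(pool.submit(task).result())
            except BrokenProcessPool as e:
                # the first crash breaks the pool; the second, perfectly
                # healthy task then fails with BrokenProcessPool as well
                print(f'{task.__name__}: {e}')
```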

from nicegui import ui, run
import os

def crash_process():
    # Cause the process to exit with a non-zero exit status
    os._exit(1)

def fine_process():
    print('Hey this should be printed')

async def on_long_operations():
    try:
        print('First task')
        await run.cpu_bound(crash_process)
    except Exception as e:
        print(f'First task error: {e}')

    try:
        print('Second task')
        await run.cpu_bound(fine_process)
    except Exception as e:
        print(f'Second task error: {e}')

    try:
        print('Third task')
        await run.cpu_bound(crash_process)
    except Exception as e:
        print(f'Third task error: {e}')

ui.button('Do the Job', on_click=on_long_operations)

ui.run()

which outputs:

NiceGUI ready to go on http://localhost:8080
First task
First task error: A process in the process pool was terminated abruptly while the future was running or pending.
Second task
Second task error: A child process terminated abruptly, the process pool is not usable anymore
Third task
Third task error: A child process terminated abruptly, the process pool is not usable anymore

By searching a bit, I've seen some projects employ custom logic and others completely re-implement the pool.

  • One tactic is to intercept the BrokenProcessPool exception and then restart the pool. It's more of a workaround, and there are edge cases to handle, like tasks launched while the pool is restarting and tasks already submitted and running or pending when one of the processes crashes.
  • Another one is to use a library like deadpool.
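The first tactic could be sketched roughly like this (SelfHealingPool is a hypothetical name, and its blocking run() stands in for the async run.cpu_bound; this deliberately skips the edge cases mentioned above, so a real implementation would need locking around the restart):

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash_process() -> None:
    os._exit(1)  # simulate an abrupt worker death


def fine_process() -> str:
    return 'ok'


class SelfHealingPool:
    """Recreate the executor whenever it breaks.

    Sketch only: tasks submitted while the pool is restarting and tasks
    already in flight when a worker crashes are not handled.
    """

    def __init__(self, max_workers: int = 2) -> None:
        self._max_workers = max_workers
        self._pool = ProcessPoolExecutor(max_workers)

    def run(self, fn, *args):
        try:
            return self._pool.submit(fn, *args).result()
        except BrokenProcessPool:
            self._restart()  # later tasks get a fresh, working pool
            raise

    def _restart(self) -> None:
        self._pool.shutdown(wait=False)
        self._pool = ProcessPoolExecutor(self._max_workers)
```

With this wrapper, the failed task still raises, but subsequent tasks succeed instead of inheriting the broken pool.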

Note: One can always employ a custom solution in a case like this, but I thought it would be useful to share this problem with the community and decide what to do about scenarios like this. The framework is pretty solid and the process pools are well integrated with the app lifecycle, so IMHO finding a solution to this can only improve the quality of the framework and the apps built with it.

@rodja rodja added the help wanted Extra attention is needed label Mar 28, 2024
@rodja
Member

rodja commented Mar 28, 2024

Yes, a more robust solution would be awesome. But I'm not sure what the best way forward is...

@gotev
Contributor Author

gotev commented Mar 28, 2024

@rodja it's a tough one, so I propose to start by reasoning about the use cases and think about a solution only after we are confident about the cases we want to cover.

Some thoughts and ideas to start with:

  • when a single running process crashes, it should not crash or stop the other ones, whether scheduled or running, and should not prevent new ones from being scheduled
  • a way to cancel a single running process, i.e. getting a "handle" back when a cpu_bound process is scheduled
  • a way to define a group of processes and be able to cancel the whole group at once. One option could be an additional parameter, like run.cpu_bound('groupId', function, args)
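The grouping idea could look roughly like this as a plain ProcessPoolExecutor wrapper (GroupedPool and its methods are hypothetical names, not an existing NiceGUI API; note that Future.cancel() only stops tasks that haven't started yet, so actually killing a running worker would need something like deadpool):

```python
from collections import defaultdict
from concurrent.futures import Future, ProcessPoolExecutor


class GroupedPool:
    """Hypothetical sketch of the proposed grouping API."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ProcessPoolExecutor(max_workers)
        self._groups: dict[str, list[Future]] = defaultdict(list)

    def cpu_bound(self, group_id: str, fn, *args) -> Future:
        future = self._pool.submit(fn, *args)
        self._groups[group_id].append(future)  # the per-task "handle"
        return future

    def cancel_group(self, group_id: str) -> int:
        """Cancel every not-yet-started task in the group; return the count."""
        return sum(f.cancel() for f in self._groups.pop(group_id, []))
```

Cancelling a single task would then just be future.cancel() on the returned handle, with the same caveat that running tasks are unaffected.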

@rodja
Member

rodja commented Apr 8, 2024

Sounds like a good approach. I think we should first get #2234 to work, so we have a basis for testing and verifying that everything works.
