
Persistent resource tracker after shutdown of ProcessPoolExecutor #118918

Open
rayosborn opened this issue May 10, 2024 · 2 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

rayosborn commented May 10, 2024

Bug report

Bug description:

After submitting jobs to a ProcessPoolExecutor instance and shutting the instance down once they are complete, a subprocess launched by the executor to hold a semaphore lock is not shut down. This appears to be the reason why some jobs submitted to the batch queue of a distributed cluster are never terminated and have to be deleted manually.

I have confirmed that the process persists using the following code on macOS with Python 3.11 and on Linux with Python 3.9.

import time

from concurrent.futures import ProcessPoolExecutor, as_completed

def test_function(i):
    time.sleep(20)
    return i

def test_pool():
    with ProcessPoolExecutor(max_workers=6) as executor:
        futures = []
        result = []
        for i in range(6):
            futures.append(executor.submit(test_function, i))
        for future in as_completed(futures):
            result.append(future.result())

    print(len(result))

if __name__ == '__main__':
    test_pool()

The complication is that, if you run this from the command line, the code completes as expected. To see the problem, you have to put the functions in an importable module and run the test_pool function in an interactive session or debugger. I ran it in an interactive IPython shell within the NeXpy application.
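For reference, reproducing it interactively looks roughly like this, assuming the code above is saved as a module (the name pool_test is only an illustration):

import pool_test                # hypothetical module containing the example above
pool_test.test_pool()           # prints 6, but a resource tracker process is left running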

Here are screenshots taken in VS Code, showing the processes before, during, and after running test_pool.

[screenshot: concurrent]

I believe the additional process (PID 9812 in the example above) should be shut down when the executor's shutdown method is called. I have tried modifying the standard shutdown method to join and close executor._call_queue, and tried releasing executor._mp_context._rlock, which I think is what launches the additional process, but neither shuts it down.

CPython versions tested on:

3.9, 3.11

Operating systems tested on:

Linux, macOS

@rayosborn rayosborn added the type-bug An unexpected behavior, bug, or error label May 10, 2024

rayosborn commented May 11, 2024

I have just discovered that the additional process is not created when mp_context is forced to be 'fork' instead of 'spawn'. I thought that 'fork' was the default, but that is apparently not the case here. The process is a ResourceTracker process, so if this is a bug, the question is whether the shutdown of the ProcessPoolExecutor should include a call to stop this process when running with 'spawn'.
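For reference, forcing the start method in my tests looks like the following minimal sketch ('fork' is only available on POSIX platforms):

import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Forcing 'fork' avoids the extra resource tracker process in my tests.
executor = ProcessPoolExecutor(max_workers=6, mp_context=mp.get_context('fork'))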

@rayosborn rayosborn changed the title Persistent lock process after shutdown of ProcessPoolExecutor Persistent resource tracker after shutdown of ProcessPoolExecutor May 11, 2024
rayosborn commented

I posted a related question on Stack Overflow, where I was informed that there is a private _stop function in the resource_tracker module that will shut down the tracker. I have confirmed that this works as I had hoped. So my question now is whether there is a reason why this cannot be added to the ProcessPoolExecutor code, at least as an option.
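For anyone else hitting this, the workaround looks roughly like the sketch below; note that _resource_tracker._stop() is a private, undocumented API and could change between Python versions:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import resource_tracker

with ProcessPoolExecutor(max_workers=6) as executor:
    ...  # submit jobs as in the example above

# Stop the lingering resource tracker process left behind with 'spawn'.
resource_tracker._resource_tracker._stop()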

For example, we could add another keyword argument to ProcessPoolExecutor.__init__, shutdown_tracker=False, creating a new private attribute, _shutdown_tracker, and then add the following to the end of the shutdown method.

        self._executor_manager_thread_wakeup = None
        if self._shutdown_tracker and self._mp_context.get_start_method(allow_none=False) != 'fork':
            from multiprocessing import resource_tracker
            # _stop() is private, but cleanly terminates the resource tracker process
            resource_tracker._resource_tracker._stop()

Are there good reasons not to make this an option? I am happy to submit a PR if there are no strong objections, because this is preventing us from using the 'spawn' method when submitting jobs to a distributed cluster.
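For clarity, usage under this proposal would look like the sketch below; shutdown_tracker is hypothetical and does not exist in the current ProcessPoolExecutor API:

from concurrent.futures import ProcessPoolExecutor

# Hypothetical keyword from the proposal above; not part of the current API.
with ProcessPoolExecutor(max_workers=6, shutdown_tracker=True) as executor:
    futures = [executor.submit(print, i) for i in range(6)]
# On exit, shutdown() would also stop the resource tracker when the start method is not 'fork'.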
