The joblib opened too many files while running. #1581

Open
esse-byte opened this issue May 6, 2024 · 0 comments

esse-byte commented May 6, 2024

I have also asked this question on Stack Overflow, with slight differences.

A simplified case:

# lsof.py, Python 3.11, joblib 1.4 (also tested with 1.4.2)
from joblib import Parallel, delayed
import time
import sys
import pandas as pd


class Tasker:
    def __init__(self):
        self.data = pd.Series([])

    def run(self):
        time.sleep(10)
        return 1.0

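def get_num_of_opened_files() -> tuple[int, int]:
    """Return system-wide lsof line counts: (all entries, entries ending in .so)."""
    from subprocess import run
    # The grep pattern is quoted so the shell passes the backslash through to grep.
    return int(run('lsof | wc -l', shell=True, capture_output=True, text=True).stdout.strip()), \
           int(run("lsof | grep '\\.so$' | wc -l", shell=True, capture_output=True, text=True).stdout.strip())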


tasker = Tasker()
f0, s0 = get_num_of_opened_files()
xs = Parallel(n_jobs=32, return_as='generator')(delayed(tasker.run)() for _ in range(32))
time.sleep(2)
f1, s1 = get_num_of_opened_files()

print(f'Opened files: before {f0}, after {f1}, delta all {f1 - f0}, delta so: {s1 - s0}', flush=True)
print(sum(xs))

Running the script above produces something like:

>> python lsof.py
>> Opened files: before 13924, after 77428, delta all 63504, delta so: 40012
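
Since `lsof` lists memory-mapped files such as shared libraries alongside real file descriptors, the system-wide line count may overstate how many descriptors are actually open. A minimal sketch for breaking the number down per process, assuming Linux with `/proc` mounted; the helper names `count_open_fds` and `count_mapped_shared_objects` are hypothetical and not part of joblib:

```python
import os
from pathlib import Path


def count_open_fds(pid: int) -> int:
    """Count file descriptors held by one process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def count_mapped_shared_objects(maps_text: str) -> int:
    """Count distinct shared-object files named in the text of /proc/<pid>/maps."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # A file-backed mapping carries its path as the sixth field.
        if len(fields) == 6:
            path = fields[5].strip()
            name = os.path.basename(path)
            # Match both "libfoo.so" and versioned names like "libfoo.so.1".
            if name.endswith(".so") or ".so." in name:
                paths.add(path)
    return len(paths)


if __name__ == "__main__":
    pid = os.getpid()
    maps_file = Path(f"/proc/{pid}/maps")
    if maps_file.exists():  # skip quietly on systems without /proc
        print(f"pid {pid}: {count_open_fds(pid)} descriptors, "
              f"{count_mapped_shared_objects(maps_file.read_text())} mapped shared objects")
```

Running this against the PID of each worker process would show whether the growth comes from descriptors or from mapped libraries.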

joblib opened about 60,000 files!
And if I run 10 programs like this at the same time, joblib warns:
UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.

**Or it even raises an error (my server has 2 TB of free memory):**
A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
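
Whether these terminations are related to the number of open files is not established here. One thing that could be checked is the per-process descriptor limit, which the standard-library `resource` module exposes on Unix. A minimal sketch; the helper name `nofile_limits` is hypothetical:

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")
```

Comparing the soft limit with a per-process descriptor count would indicate whether a single process is anywhere near it.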
