Memory not released after parallel job #1577

Open · azylbertal opened this issue May 1, 2024 · 0 comments

I'm seeing something very similar to this old issue, even though I'm using recent versions with the loky backend.
Running the following on Ubuntu 22.04, with joblib 1.4.0, Python 3.9, NumPy 1.26.4, and psutil 5.9.0:

import numpy as np
import psutil, gc
from joblib import Parallel, delayed

def mem():
    # Resident set size of the current process, in MiB.
    return psutil.Process().memory_info().rss / (2**20)

def proc():
    # Each task returns a ~400 kB float64 array.
    return np.ones(50000)

print(f'Initial memory usage: {mem()}')
for round in range(5):
    res = Parallel(n_jobs=6, verbose=1)(delayed(proc)() for _ in range(10000))
    print(f'After round {round}: {mem()}')
    del res
    print(f'After deleting res: {mem()}')

gc.collect()
print(f'After gc.collect: {mem()}')

Output:

Initial memory usage: 60.234375
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 100 tasks      | elapsed:    0.3s
[Parallel(n_jobs=6)]: Done 4468 tasks      | elapsed:    3.7s
[Parallel(n_jobs=6)]: Done 10000 out of 10000 | elapsed:    7.8s finished
After round 0: 3952.2890625
After deleting res: 3300.3046875
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 100 tasks      | elapsed:    0.1s
[Parallel(n_jobs=6)]: Done 8180 tasks      | elapsed:    4.3s
[Parallel(n_jobs=6)]: Done 10000 out of 10000 | elapsed:    5.5s finished
After round 1: 3966.59375
After deleting res: 3843.8359375
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 104 tasks      | elapsed:    0.1s
[Parallel(n_jobs=6)]: Done 8180 tasks      | elapsed:    4.4s
[Parallel(n_jobs=6)]: Done 10000 out of 10000 | elapsed:    5.6s finished
After round 2: 3951.92578125
After deleting res: 3693.2734375
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 100 tasks      | elapsed:    0.1s
[Parallel(n_jobs=6)]: Done 8180 tasks      | elapsed:    4.3s
[Parallel(n_jobs=6)]: Done 10000 out of 10000 | elapsed:    5.4s finished
After round 3: 3952.421875
After deleting res: 3671.5078125
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 100 tasks      | elapsed:    0.1s
[Parallel(n_jobs=6)]: Done 8180 tasks      | elapsed:    4.4s
[Parallel(n_jobs=6)]: Done 10000 out of 10000 | elapsed:    5.5s finished
After round 4: 3952.58203125
After deleting res: 3694.0078125
After gc.collect: 3694.0078125
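
(For scale: 10000 results × 50000 float64 values × 8 bytes ≈ 3815 MiB, which is roughly the growth from the 60 MiB baseline to the ~3950 MiB plateau after each round. Deleting res releases only a few hundred MiB of it.)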

With joblib 1.2.0 it is worse: the memory that remains locked accumulates over repeated runs. I was also able to reproduce this on two Linux-based clusters (RHEL Server release 7.8 and CentOS Linux 7) with a Python environment similar to my Ubuntu desktop's. It only happens when the task returns large arrays, not when it merely defines them locally (see the sketch below).
On Windows 11 it doesn't seem to happen at all with joblib 1.4.0, but it does with 1.2.0 (although there it is solved by explicit calls to gc.collect, which don't help on Linux).
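
For contrast, here is a minimal sketch of the non-leaking variant described above (the name proc_local is mine, for illustration): the worker allocates the same array locally but returns only a scalar, so the large buffers never travel back to the parent process.

import numpy as np
from joblib import Parallel, delayed

def proc_local():
    arr = np.ones(50000)  # same allocation, but it stays inside the worker
    return arr.sum()      # only a small scalar is returned to the parent

# Per the observation above, this variant does not leave memory
# retained in the parent process after the rounds complete.
res = Parallel(n_jobs=6, verbose=1)(delayed(proc_local)() for _ in range(10000))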
