billiard.Pool slower than multiprocessing.Pool #336

Open
Abhinav-Aidash opened this issue Aug 23, 2021 · 3 comments

@Abhinav-Aidash

import pandas as pd
import time
import numpy as np


def aa(df):
    df_c = df.copy()
    df_c['C'] = df_c['A'] + df_c['B']
    return df_c


def using_multiprocessing(df, func, n_cores=4):
    from multiprocessing import Pool
    df_split = np.array_split(df, n_cores)

    pool = Pool(n_cores)
    start_time = time.time()
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    print(f"Using multiprocessing {time.time() - start_time}")
    return df


def using_billiard(df, func, n_cores=4):
    import billiard
    df_split = np.array_split(df, n_cores)

    pool = billiard.Pool(n_cores)
    start_time = time.time()
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    print(f"Using billiard {time.time() - start_time}")
    return df


inp = pd.DataFrame(np.random.randint(0,1000000,size=(1000000, 2)), columns=list('AB'))

df_1 = using_multiprocessing(inp, aa, 5)
df_2 = using_billiard(inp, aa, 5)

print(df_1.equals(df_2))

I got output:

Using multiprocessing 0.2115309238433838
Using billiard 31.323524951934814
True

@pint-drinker

I'm seeing very similar behavior in a Django application within a Celery task. After adding some time.time() calls, I can see that the functions running in the process pool finish in about 2 seconds, but pool.join() does not return for another 30 seconds. With multiprocessing, the join completes almost instantly.
I am using billiard because multiprocessing does not work within Celery tasks.
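
For anyone trying to narrow this down, here is a minimal sketch of timing each pool phase separately, so it is visible whether the delay sits in map(), close(), or join(). The helper name time_pool_phases and its parameters are hypothetical (not part of billiard or multiprocessing); it only assumes the pool class can be called with a worker count and offers map(), close(), and join(), as both billiard.Pool and multiprocessing.Pool do in the snippet above.

```python
import time


def time_pool_phases(pool_factory, func, chunks, n_workers=4):
    """Run func over chunks in a pool and time each phase on its own.

    pool_factory is any callable that takes a worker count and returns
    an object with map(), close() and join(), for example
    multiprocessing.Pool or billiard.Pool (hypothetical helper).
    """
    timings = {}

    # Phase 1: starting the worker processes.
    start = time.perf_counter()
    pool = pool_factory(n_workers)
    timings["create"] = time.perf_counter() - start

    # Phase 2: sending the work out and collecting the results.
    start = time.perf_counter()
    results = pool.map(func, chunks)
    timings["map"] = time.perf_counter() - start

    # Phase 3: telling the pool that no more work will arrive.
    start = time.perf_counter()
    pool.close()
    timings["close"] = time.perf_counter() - start

    # Phase 4: waiting for the workers to exit.
    start = time.perf_counter()
    pool.join()
    timings["join"] = time.perf_counter() - start

    return results, timings
```

Calling it once as time_pool_phases(multiprocessing.Pool, aa, df_split) and once as time_pool_phases(billiard.Pool, aa, df_split) from the reproducer above, and printing the two timings dicts, should show which phase accounts for the difference.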

@pint-drinker

I found a similar problem described in #340, and implementing things the way shown there fixed my issue.

@auvipy
Member

auvipy commented Aug 2, 2023

Also, I think we should sync the billiard code with the CPython main branch.
