Changing parameters didn't make me perform the task any faster #127

Open
fz-gaojian opened this issue Nov 7, 2021 · 1 comment

fz-gaojian commented Nov 7, 2021

Description

Without changing the parameters:

import asyncio
import os
import time

from aiomultiprocess import Pool


async def run_cmd(cmd):
    await asyncio.subprocess.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )


async def main():
    urls = ["sleep 0.000001" for _ in range(1000)]
    # async with Pool(queuecount=os.cpu_count(), childconcurrency=32) as pool:
    async with Pool() as pool:
        async for result in pool.map(run_cmd, urls):
            ...


if __name__ == '__main__':
    print(f"cpu_count: {os.cpu_count()}")
    a = time.time()
    asyncio.run(main())
    print(time.time() - a)

The output is:
cpu_count: 8
5.748602628707886

Changing parameters:

import asyncio
import os
import time

from aiomultiprocess import Pool


async def run_cmd(cmd):
    await asyncio.subprocess.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )


async def main():
    urls = ["sleep 0.000001" for _ in range(1000)]
    async with Pool(queuecount=os.cpu_count(), childconcurrency=32) as pool:
    # async with Pool() as pool:
        async for result in pool.map(run_cmd, urls):
            ...


if __name__ == '__main__':
    print(f"cpu_count: {os.cpu_count()}")
    a = time.time()
    asyncio.run(main())
    print(time.time() - a)

The output is:
cpu_count: 8
6.246175050735474

I have carefully read the user guide for these parameters, but changing them does not make the task finish any faster. How should I configure the parameters so that the task completes as quickly as possible? Thank you very much!

Details

  • OS: Deepin 20.2.4
  • Python version: Python 3.7.3
  • aiomultiprocess version: aiomultiprocess 0.9.0
  • Can you repro on master? yes
  • Can you repro in a clean virtualenv? yes

trifle commented Apr 14, 2022

@fz-gaojian Both of your pools run the same number of OS processes: you never set the process count, so it should default to os.cpu_count() (that is at least the default for multiprocessing.Pool). The coroutines you run then each create another OS process to execute the sleep command.

My intuition would be that you're probably hitting a bottleneck in your OS. Spawning OS processes (subprocesses) is not cheap, especially since you're also starting a full shell.
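One rough way to check this intuition is to time the shell spawns on their own, outside any pool. The sketch below uses only plain asyncio; the helper name time_shell_spawns and the run count are made up for illustration, and it waits for each process to exit, which the original run_cmd does not do.

```python
import asyncio
import time


async def time_shell_spawns(cmd, n=20):
    """Spawn `cmd` through a shell n times, one after another, and time it.

    Hypothetical helper for illustration; not part of aiomultiprocess.
    """
    start = time.perf_counter()
    codes = []
    for _ in range(n):
        proc = await asyncio.create_subprocess_shell(
            cmd,
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.DEVNULL,
        )
        # Wait for the shell to exit so the full spawn-and-reap cost is measured.
        codes.append(await proc.wait())
    elapsed = time.perf_counter() - start
    return elapsed, codes


if __name__ == "__main__":
    elapsed, codes = asyncio.run(time_shell_spawns("sleep 0.000001"))
    per_spawn_ms = elapsed / len(codes) * 1000
    print(f"{len(codes)} spawns took {elapsed:.3f}s ({per_spawn_ms:.2f} ms each)")
```

If the per-spawn time multiplied by 1000 is already in the range of the totals reported above, the pool parameters are unlikely to be the limiting factor.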

If you want to explore this further, perhaps try a longer sleep time and/or see what happens when you place semaphores at key choke points.
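A minimal sketch of the semaphore idea, assuming plain asyncio in a single process: run_all and its limit parameter are hypothetical names, and inside an aiomultiprocess pool each child process has its own event loop, so a semaphore like this would only cap concurrency per child.

```python
import asyncio


async def run_all(cmds, limit=8):
    """Run shell commands with at most `limit` subprocesses alive at once.

    Returns the exit codes and the peak number of concurrent subprocesses.
    """
    sem = asyncio.Semaphore(limit)  # created inside the running event loop
    active = 0
    peak = 0

    async def run_one(cmd):
        nonlocal active, peak
        async with sem:  # the choke point: blocks while `limit` are running
            active += 1
            peak = max(peak, active)
            try:
                proc = await asyncio.create_subprocess_shell(
                    cmd,
                    stdout=asyncio.subprocess.DEVNULL,
                    stderr=asyncio.subprocess.DEVNULL,
                )
                return await proc.wait()
            finally:
                active -= 1

    codes = await asyncio.gather(*(run_one(c) for c in cmds))
    return codes, peak


if __name__ == "__main__":
    codes, peak = asyncio.run(run_all(["sleep 0.1"] * 40, limit=8))
    print(f"ran {len(codes)} commands, peak concurrency {peak}")
```

Timing this with different limit values and a longer sleep should show whether more concurrency actually helps or whether process creation itself is the ceiling.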
