This repository has been archived by the owner on May 3, 2024. It is now read-only.

Parallel processing in subsample.py #197

Open
wants to merge 17 commits into
base: master

Conversation

SichongP

This PR parallelizes the subsampling in subsample.py, because the script currently takes too long to run.

I tested the new script with 10,000 total reads, a step size of 100 reads, and 100 iterations:

With the original script:

35.3 s ± 70.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With the parallel script (5 threads):

12.8 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is roughly a 2.8x speedup (35.3 s / 12.8 s). The improvement should be more pronounced on real samples, where the multiprocessing overhead becomes negligible relative to the work done per task.
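For readers without the diff at hand, here is a minimal sketch of the general approach, not the actual code in this PR. It assumes each subsampling depth can be processed independently and farms the depths out to a `multiprocessing.Pool`. The names `subsample_depth` and `parallel_subsample`, the "mean number of unique items" statistic, and the toy input are all hypothetical and chosen only for illustration.

```python
import random
from multiprocessing import Pool


def subsample_depth(args):
    """Draw `iterations` random subsamples of size `depth` from `reads`
    and return (depth, mean number of unique items observed)."""
    reads, depth, iterations, seed = args
    rng = random.Random(seed)  # per-task RNG keeps results reproducible
    total_unique = 0
    for _ in range(iterations):
        total_unique += len(set(rng.sample(reads, depth)))
    return depth, total_unique / iterations


def parallel_subsample(reads, step, iterations, processes=5, seed=0):
    """Evaluate every depth (step, 2*step, ...) in a pool of worker processes."""
    depths = range(step, len(reads) + 1, step)
    # One task per depth; each task gets its own seed (an assumption of this sketch).
    tasks = [(reads, d, iterations, seed + i) for i, d in enumerate(depths)]
    with Pool(processes=processes) as pool:
        # pool.map preserves task order, so results come back sorted by depth.
        return pool.map(subsample_depth, tasks)


if __name__ == "__main__":
    # Toy input, much smaller than the benchmark described above.
    reads = [f"gene{i % 50}" for i in range(1_000)]
    curve = parallel_subsample(reads, step=100, iterations=10)
    print(curve[:3])
```

Each task here carries a full copy of `reads` to the worker, which is part of the multiprocessing overhead mentioned above. With larger inputs and more iterations per depth, that fixed cost is spread over more computation.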


4 participants