Add parallel support AddModuleScore #6369
Open
Hi Seurat Team,
Just a PR based on the discussion in the previous PR #6348, adding parallel processing support to `AddModuleScore`. My solution uses the future/future.apply packages, so there are no additional dependencies.
Quick single test (I can run a more realistic benchmark with the bench package, but I don't feel it's really necessary): adding 100 scores of 100 genes each to an object with ~47,000 nuclei and ~28,000 features was 1.7 times faster in parallel with 4 cores than sequentially.
One thing I did debate, and it's up to you, is whether to add a function parameter that enables parallel processing and have the internal function check it, something like this:
The reason is that the gains from parallel processing with future in this function are most useful with large numbers of gene lists. If you are adding just a single gene list, or a couple, it's probably slightly faster to run sequentially. I left this out of the PR to keep everything the same, but if you think it would be helpful I can easily add it.
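To make the idea concrete, here is a minimal sketch of how such a check might look. It is an illustration under stated assumptions, not the code in this PR: `score_gene_lists` and its `parallel` argument are hypothetical names, and the per-cell mean is a stand-in for the real module score calculation. The only real calls used are `future::plan`, `future::nbrOfWorkers` and `future.apply::future_lapply`.

```r
library(future)
library(future.apply)

# Hypothetical helper (not part of Seurat): one score per cell for each gene list.
# `parallel` is an assumed parameter name used only for this illustration.
score_gene_lists <- function(expr, gene_lists, parallel = FALSE) {
  # Go parallel only when requested AND the current plan has more than one worker
  use_future <- parallel && future::nbrOfWorkers() > 1
  apply_fun <- if (use_future) future.apply::future_lapply else lapply

  scores <- apply_fun(gene_lists, function(genes) {
    # Stand-in for the real scoring: mean expression of the listed genes per cell
    colMeans(expr[genes, , drop = FALSE])
  })

  # Cells in rows, one column per gene list
  do.call(cbind, scores)
}

# Toy data: 6 genes x 4 cells
set.seed(1)
expr <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("g", 1:6), paste0("cell", 1:4))
)
gene_lists <- list(listA = c("g1", "g2", "g3"), listB = c("g4", "g5"))

# Sequential path (default)
seq_scores <- score_gene_lists(expr, gene_lists)

# Parallel path: the user chooses the plan, the function only checks it
plan(multisession, workers = 2)
par_scores <- score_gene_lists(expr, gene_lists, parallel = TRUE)
plan(sequential)
```

With this design the default behaviour stays sequential, so adding one or two gene lists does not pay the overhead of dispatching futures, and parallel execution is opt-in for users with many gene lists.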
Thanks!
Sam
P.S. Tagging the author of the original PR here so he can follow this: @scottgigante