Add parallel support AddModuleScore #6369

samuel-marsh · 2022-08-31T14:52:14Z

Hi Seurat Team,

This PR follows the discussion in the previous PR #6348 and adds parallel processing support to AddModuleScore. My solution uses the future/future.apply packages, so there are no additional dependencies.

Quick single test (I can run a more realistic benchmark with the bench package, but I don't feel it's really necessary): adding 100 scores of 100 genes each to an object with ~47,000 nuclei and ~28,000 features was about 1.7 times faster in parallel with 4 cores than sequentially.

library(tidyverse)
library(Seurat)
library(scCustomize)
library(qs)
library(tictoc)
library(future)
library(future.apply)

test <- qread("marsh.qs")

# Extract all gene names from the object
all_genes_marsh <- rownames(test@assays$RNA)

# Create 100 random gene lists of 100 genes each
random_gene_sets_micro <- lapply(seq_len(100), function(i) sample(all_genes_marsh, 100))

tic()
test <- AddModuleScore(object = test, features = random_gene_sets_micro)
toc()
429.236 sec elapsed

# restart R

library(tidyverse)
library(Seurat)
library(scCustomize)
library(qs)
library(tictoc)
library(future)
library(future.apply)

plan("multisession", workers = 4)
options(future.globals.maxSize = 3000 * 1024^2)

test <- qread("marsh.qs")

# Extract all gene names from the object
all_genes_marsh <- rownames(test@assays$RNA)

# Create 100 random gene lists of 100 genes each
random_gene_sets_micro <- lapply(seq_len(100), function(i) sample(all_genes_marsh, 100))

tic()
test <- AddModuleScore(object = test, features = random_gene_sets_micro)
toc()
251.93 sec elapsed
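
As a quick check on the two timings above, the speedup can be computed directly from the elapsed times. This is only arithmetic on the single run reported here; the variable names are illustrative, and the efficiency figure simply assumes the 4 workers set in the plan.

```r
# Elapsed times reported by toc() in the two runs above (seconds)
sequential_sec <- 429.236
parallel_sec   <- 251.93
workers        <- 4  # from plan("multisession", workers = 4)

# Speedup: how many times faster the parallel run was
speedup <- sequential_sec / parallel_sec

# Parallel efficiency: fraction of the ideal 4x speedup achieved
efficiency <- speedup / workers

print(round(speedup, 2))
print(round(efficiency, 2))
```

The speedup comes out at roughly 1.7, well short of 4, which is in line with the fixed overhead of sending the object to each worker.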

One thing I did debate, and it's up to you, is whether to add an additional function parameter specifying parallel processing and have the internal function check something like this:

 if (nbrOfWorkers() > 1 && isTRUE(parallel))

The reason is that the gains from parallel processing with future are largest for this function when adding a large number of gene lists. If you are adding just a single gene list or a couple, it's probably slightly faster to run normally. I left this out of the PR to keep everything the same, but if you think it would be helpful I can easily add it.
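
For illustration, the check above could sit in a small dispatch helper along these lines. This is only a sketch: `score_apply` and its `parallel` argument are hypothetical names, not part of Seurat or of this PR, and the only real calls used are `future::nbrOfWorkers()` and `future.apply::future_lapply()`.

```r
library(future)
library(future.apply)

# Hypothetical helper (not Seurat's implementation): choose between
# future_lapply() and lapply() based on the active plan and a user flag.
score_apply <- function(X, FUN, parallel = TRUE, ...) {
  if (nbrOfWorkers() > 1 && isTRUE(parallel)) {
    # More than one worker is available and the user opted in
    future_lapply(X = X, FUN = FUN, ...)
  } else {
    # Fall back to ordinary sequential processing
    lapply(X = X, FUN = FUN, ...)
  }
}

# With a sequential plan there is one worker, so lapply() is used
plan(sequential)
res <- score_apply(1:3, function(i) i^2)
```

With `parallel = FALSE` the helper always takes the `lapply()` branch, even if a multi-worker plan was set earlier in the script.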

Thanks!
Sam

p.s. tagging author of original PR here so he can follow this @scottgigante

scottgigante · 2022-08-31T15:11:02Z

Thanks @samuel-marsh for the quick work! I don't think parallel=TRUE is necessary because a user could always use plan(sequential).

samuel-marsh · 2022-08-31T16:00:15Z

Agreed, though some people set the plan once at the top of a script and forget about it. Overall, I also lean towards not adding an extra param.
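
For the set-and-forget case, one option without a new parameter is to switch the plan temporarily around a small call and then restore it. A minimal sketch, assuming a script that set a multisession plan at the top; the worker count of 2 and the variable names are illustrative, and the AddModuleScore call is left as a comment so the snippet runs without Seurat.

```r
library(future)

# Set once at the top of the script
plan(multisession, workers = 2)

# Temporarily switch to sequential; plan() returns the previous plan
old_plan <- plan(sequential)
n_during <- nbrOfWorkers()

# A call with only one or two gene lists would go here, e.g.
# obj <- AddModuleScore(object = obj, features = small_gene_list)

# Restore the original parallel plan for the rest of the script
plan(old_plan)
n_after <- nbrOfWorkers()
```

Here `n_during` reflects the single sequential worker, and `n_after` reflects the restored multisession plan.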

Add parallel support AddModuleScore

ac56662

dcollins15 force-pushed the develop branch from a87fd5f to 41d19a8 · November 28, 2023 20:43
