threading question #85

Open
rsettlage opened this issue Mar 5, 2021 · 7 comments

@rsettlage

Hi, thanks for this pipeline, I'm loving it. But I am not yet a Snakemake guru.

I have a question regarding optimizing the compute. It seems like I can run the pipeline two ways:

  1. start each sample independently
  2. start the batch

If I do (1), I am starting vpipe with the --cores 128 option (AMD server with 128 physical cores), but it seems to use only 4 threads for those sub-programs that can use them. In the vpipe config files I see the threads option, but that seems to be set to 1. So where did it get the 4, and is there an easy way to change that globally? Something like --threads=128?

If I do (2), is there a way to specify the number of samples that should be processed simultaneously and, similar to the above, the threads to use for each process? Something like: process 8 samples at a time using 16 threads each.

Thanks
Bob

@DrYak DrYak assigned DrYak and kpj Mar 5, 2021
@rsettlage
Author

FYI, I do see the section in the docs that says the default is 4; I am more curious where that is set, since the values I see in the config file suggest 1.

@DrYak
Member

DrYak commented Mar 6, 2021

First, regarding (1) vs (2):

Snakemake doesn't really have an internal notion of samples; it only considers jobs. It builds a DAG of all jobs that need to be run, and then runs each job as soon as all of its dependencies are met (e.g. SNV calling first needs an alignment and won't start before one exists) and as soon as enough resources are free (e.g. enough threads are available).

Currently, it traverses the DAG breadth-first, so it will tend to run most of the samples in parallel (i.e. the alignment jobs will tend to all be scheduled before the SNV-calling jobs).

So if you want each sample to be processed separately, you would need to run a whole snakemake separately for each.
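As a minimal sketch of that per-sample approach (the runs/sample_* directory layout is a hypothetical setup where each directory holds its own config and samples file; the snakemake flags are illustrative, adjust to yours). The commands are echoed rather than executed; drop the echo to actually launch the runs:

```shell
# Demo directories standing in for one V-pipe working directory per sample.
mkdir -p runs/sample_A runs/sample_B
# Launch an independent snakemake in each working directory.
for dir in runs/sample_*/; do
  echo "(cd $dir && snakemake -s vpipe.snake --cores 16)"
done
```

Appending & inside the loop (plus a final wait) would run the per-sample pipelines concurrently instead of one after another.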

@DrYak
Member

DrYak commented Mar 6, 2021

Now for your questions:

regarding threads:

  • each rule takes its number of threads (snakemake's threads: directive) from the threads= parameter in the corresponding section of the config file.
  • the defaults are currently in the file rules/config_default.smk
  • by default (threads=0 for a specific rule) it falls back to the global threads= setting in the [general] section (see here), which is 4 by default.
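Concretely, that fallback could look like this in the config file (the numbers here are purely illustrative, not recommendations):

```ini
[general]
# global default, used whenever a rule's own threads is 0
threads=16

[bwa_align]
# 0 = fall back to the global [general] threads (16 here)
threads=0
```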

@DrYak
Member

DrYak commented Mar 6, 2021

Now regarding fine-tuning your configuration file:

Calling SNVs is done by default using ShoRAH, which works in independent local windows. It is thus an embarrassingly parallel problem and can scale to many threads (currently we run 64 concurrent threads on our Threadrippers), requesting 1 GiB of RAM per thread on average, which works most of the time.

[snv]
consensus=false
time=240
threads=64
mem=1024
localscratch=$TMPDIR

bwa (the default aligner for SARS-CoV-2) works in batches of ~1 million reads.
According to the literature, it is able to scale up to 8-16 threads before contention diminishes any further parallelization. In our case, a very large proportion (3/4) of the samples are processed in 6 batches, so it's not worth requesting more threads:

[bwa_align]
mem=2048
threads=6

@DrYak
Member

DrYak commented Mar 6, 2021

For running specifically 8 samples in parallel and allocating exactly 16 threads to each:

It's not easily done the way V-pipe is currently written.
It might be possible to experiment with snakemake's --batch parameter, but I lack experience with it.

Another approach would be to split your samples file into batches of 8 samples and run them separately. But in that case, you should use the consensus=false option mentioned above, so that SNVs are called against the reference (e.g. for SARS-CoV-2 that would be NC_045512) and not against each batch's consensus (which would make it very difficult to compare results between batches).
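A minimal sketch of such a split using coreutils (the samples.tsv name and its headerless two-column layout are assumptions; the first command just generates demo data):

```shell
# Generate a demo headerless samples TSV with 20 entries (assumption:
# one sample per line, tab-separated sample name and date).
printf 'sample%02d\t20210305\n' $(seq 1 20) > samples.tsv
# Split it into batches of 8 samples each (batch_00, batch_01, batch_02).
split -l 8 -d samples.tsv batch_
wc -l batch_*
```

Each batch_* file would then serve as the samples file for its own V-pipe run (with consensus=false as noted above).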

@DrYak
Member

DrYak commented Mar 6, 2021

Last, a different approach, if you run on an HPC cluster (and not a single 128-core workstation), would be to let snakemake dispatch jobs on the cluster using its --cluster option.
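A sketch of what that could look like with Slurm (the sbatch wrapper string is an illustration; adapt its flags to your site). The command is only stored and echoed here; run it with eval "$cmd":

```shell
# snakemake substitutes each rule's threads: value into {threads}, and
# --jobs caps how many cluster jobs are submitted/active at once.
cmd="snakemake -s vpipe.snake --jobs 100 --cluster 'sbatch --cpus-per-task={threads}'"
echo "$cmd"
```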

@rsettlage
Author

Thanks, awesome information. I am indeed on an HPC system. I think I have settled on running each sample independently, and am hoping the cluster option scales the various steps (jobs) according to the threads used. For samples that seem to be more diverse, the last step of making the JSON file is painfully slow. I noticed that just prior to that there are 10 partitioned VCF files; would it be computationally (time) more efficient to process those individually and then combine the sub-JSONs?
