Skip to content

Commit

Permalink
fix: Bwa mem threads (#1743)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

Currently,
[master/bio/bwa-mem2/mem](https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa-mem2/mem.html)
uses more threads than the number requested by user: for `x` threads
requested by user, `x` is used in bwa-mem. Then either `x` are used by
samtools (view or sort), or `1` by Picard. So for `x` threads requested
by user, from `x+1` to `2x` threads are used.

This leads to bwa-mem jobs using official wrappers being killed by
cluster manager for not respecting fair use rules where I work.
Coworkers asked for a fix.

In this PR, I introduce a function to split the total number of threads
between bwa-mem in one hand, and samtools/picard in the other hand.

Ultimately, the introduced changes also allow an uncompressed SAM
output.

Tests have been modified, since this wrapper now requires at least two
threads when samtools/picard are used. No more when sam output is
requested.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [X] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
  • Loading branch information
8 people committed Aug 25, 2023
1 parent 2f76671 commit e35e312
Show file tree
Hide file tree
Showing 4 changed files with 69 additions and 16 deletions.
5 changes: 3 additions & 2 deletions bio/bwa-mem2/mem/meta.yaml
Expand Up @@ -5,9 +5,10 @@ authors:
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
- Thibault Dayris
input:
- FASTQ file(s)
- reference genome
- reads: List of path(s) to FASTQ file(s)
- idx: List of paths to indexed reference genome files
output:
- SAM/BAM/CRAM file
notes: |
Expand Down
23 changes: 21 additions & 2 deletions bio/bwa-mem2/mem/test/Snakefile
Expand Up @@ -9,9 +9,28 @@ rule bwa_mem2_mem:
"logs/bwa_mem2/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools' or 'picard'.
sort="none", # Can be 'none', 'samtools', or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools/picard.
sort_extra="", # Extra args for samtools/picard sorts.
threads: 8
wrapper:
"master/bio/bwa-mem2/mem"


rule bwa_mem2_mem_sam:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
output:
"mapped/{sample}.sam",
log:
"logs/bwa_mem2/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools', or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools/picard sorts.
threads: 8
wrapper:
"master/bio/bwa-mem2/mem"
47 changes: 38 additions & 9 deletions bio/bwa-mem2/mem/wrapper.py
Expand Up @@ -19,9 +19,13 @@
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
samtools_opts = get_samtools_opts(snakemake, param_name="sort_extra")
samtools_opts = get_samtools_opts(
snakemake, parse_threads=False, param_name="sort_extra"
)
java_opts = get_java_opts(snakemake)

bwa_threads = snakemake.threads
samtools_threads = snakemake.threads - 1

index = snakemake.input.get("index", "")
if isinstance(index, str):
Expand All @@ -42,31 +46,56 @@

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts}"
# Correctly assign number of threads according to user request
if samtools_threads >= 1:
samtools_opts += f" --threads {samtools_threads} "

if str(snakemake.output[0]).lower().endswith(("bam", "cram")):
# Simply convert to bam using samtools view.
pipe_cmd = " | samtools view {samtools_opts} > {snakemake.output[0]}"
else:
# Do not perform any sort nor compression, output raw sam
pipe_cmd = " > {snakemake.output[0]} "


elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -T {tmpdir}"
# Correctly assign number of threads according to user request
if samtools_threads >= 1:
samtools_opts += f" --threads {samtools_threads} "

# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"

# Sort alignments using samtools sort.
pipe_cmd = " | samtools sort {samtools_opts} {sort_extra} -T {tmpdir} > {snakemake.output[0]}"

elif sort == "picard":
# Correctly assign number of threads according to user request
bwa_threads = bwa_threads - 1
if bwa_threads <= 0:
raise ValueError(
"Not enough threads requested. This wrapper requires exactly one more."
)

# Sort alignments using picard SortSam.
pipe_cmd = "picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin --TMP_DIR {tmpdir} --SORT_ORDER {sort_order} --OUTPUT {snakemake.output[0]}"
pipe_cmd = (
" | picard SortSam {java_opts} {sort_extra} "
"--INPUT /dev/stdin "
"--TMP_DIR {tmpdir} "
"--SORT_ORDER {sort_order} "
"--OUTPUT {snakemake.output[0]}"
)

else:
raise ValueError(f"Unexpected value for params.sort ({sort})")


with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" -t {bwa_threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
" " + pipe_cmd + " ) {log}"
)
10 changes: 7 additions & 3 deletions test.py
Expand Up @@ -2160,7 +2160,11 @@ def test_bwa_samse_sort_picard():
def test_bwa_mem2_mem():
run(
"bio/bwa-mem2/mem",
["snakemake", "--cores", "1", "mapped/a.bam", "--use-conda", "-F"],
["snakemake", "--cores", "2", "mapped/a.bam", "--use-conda", "-F"],
)
run(
"bio/bwa-mem2/mem",
["snakemake", "--cores", "2", "mapped/a.sam", "--use-conda", "-F"],
)


Expand All @@ -2171,7 +2175,7 @@ def test_bwa_mem2_sort_samtools():
[
"snakemake",
"--cores",
"1",
"2",
"mapped/a.bam",
"--use-conda",
"-F",
Expand Down Expand Up @@ -2203,7 +2207,7 @@ def test_bwa_mem2_sort_picard():
[
"snakemake",
"--cores",
"1",
"2",
"mapped/a.bam",
"--use-conda",
"-F",
Expand Down

0 comments on commit e35e312

Please sign in to comment.