Skip to content

Commit

Permalink
fix: fgbio annotatebamwithumis allow multiple umi fastq files (#1726)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

Currently, the wrapper tests that `input: umi:` is exactly one file
specified as string, but [`fgbio AnnotateBamWithUmis` allows for
multiple fastqs
specified](https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html),
which is needed when UMIs are present in multiple reads of the same
fragment. Thus, this PR relaxes the testing and tests the scenario with
another example rule specifying two fastqs.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).
  • Loading branch information
dlaehnemann committed Aug 7, 2023
1 parent 0e35b52 commit 101744d
Show file tree
Hide file tree
Showing 3 changed files with 64 additions and 16 deletions.
29 changes: 26 additions & 3 deletions bio/fgbio/annotatebamwithumis/test/Snakefile
@@ -1,16 +1,39 @@
rule AnnotateBam:
rule annotate_bam_single_fastq:
input:
bam="mapped/{sample}.bam",
umi="umi/{sample}.fastq",
output:
"mapped/{sample}.annotated.bam",
params: ""
params:
"",
resources:
# suggestion assuming unsorted input, so that memory should
# be proportional to input size:
# https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html
mem_mb=lambda wildcards, input: max([input.size_mb * 1.3, 200])
mem_mb=lambda wildcards, input: max([input.size_mb * 1.3, 200]),
log:
"logs/fgbio/annotate_bam/{sample}.log",
wrapper:
"master/bio/fgbio/annotatebamwithumis"


rule annotate_bam_multiple_fastqs:
input:
bam="mapped/{sample}.bam",
umi=[
"umi/{sample}.fastq",
"umi/{sample}.fastq",
],
output:
"mapped/{sample}-{sample}.annotated.bam",
params:
"",
resources:
# suggestion assuming unsorted input, so that memory should
# be proportional to input size:
# https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html
mem_mb=lambda wildcards, input: max([input.size_mb * 1.3, 200]),
log:
"logs/fgbio/annotate_bam/{sample}-{sample}.log",
wrapper:
"master/bio/fgbio/annotatebamwithumis"
9 changes: 7 additions & 2 deletions bio/fgbio/annotatebamwithumis/wrapper.py
Expand Up @@ -5,6 +5,7 @@


from snakemake.shell import shell
from snakemake.io import Namedlist
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
Expand All @@ -22,8 +23,12 @@

if umi_input is None:
raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")
elif not (isinstance(umi_input, str) or isinstance(umi_input, Namedlist)):
raise ValueError(
"Input UMIs-file should be a string or a snakemake io list: "
+ str(umi_input)
+ "!"
)

if not len(snakemake.output) == 1:
raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
Expand Down
42 changes: 31 additions & 11 deletions test.py
Expand Up @@ -170,9 +170,9 @@ def test_nonpareil_plot():
"--use-conda",
"-F",
"results/a.pdf",
# Test disabled due to bug in nonpareil (#62)
# "results/a.nomodel.pdf",
]
# Test disabled due to bug in nonpareil (#62)
# "results/a.nomodel.pdf",
],
)


Expand Down Expand Up @@ -1080,10 +1080,7 @@ def test_deseq2_deseqdataset():

@skip_if_not_modified
def test_deseq2_wald():
run(
"bio/deseq2/wald",
["snakemake", "--cores", "1", "--use-conda", "dge.tsv"]
)
run("bio/deseq2/wald", ["snakemake", "--cores", "1", "--use-conda", "dge.tsv"])


@skip_if_not_modified
Expand Down Expand Up @@ -2653,7 +2650,29 @@ def test_fasttree():
def test_fgbio_annotate():
run(
"bio/fgbio/annotatebamwithumis",
["snakemake", "--cores", "1", "mapped/a.annotated.bam", "--use-conda", "-F"],
[
"snakemake",
"--cores",
"1",
"mapped/a.annotated.bam",
"--use-conda",
"-F",
],
)


@skip_if_not_modified
def test_fgbio_annotate_two_umi_fastqs():
run(
"bio/fgbio/annotatebamwithumis",
[
"snakemake",
"--cores",
"1",
"mapped/a-a.annotated.bam",
"--use-conda",
"-F",
],
)


Expand Down Expand Up @@ -5688,7 +5707,6 @@ def test_calc_consensus_reads():
)



@skip_if_not_modified
def test_bowtie2_sambamba_meta():
run(
Expand All @@ -5699,8 +5717,8 @@ def test_bowtie2_sambamba_meta():
"2",
"--use-conda",
"-F",
"mapped/Sample1.rmdup.bam.bai"
]
"mapped/Sample1.rmdup.bam.bai",
],
)


Expand All @@ -5726,6 +5744,7 @@ def test_bazam_separated():
],
)


@skip_if_not_modified
def test_ragtag_correction():
run(
Expand Down Expand Up @@ -5785,6 +5804,7 @@ def test_ragtag_merge():
],
)


@skip_if_not_modified
def test_barrnap():
run(
Expand Down

0 comments on commit 101744d

Please sign in to comment.