Skip to content

Commit

Permalink
feat: unaligned bam input support for minimap2 alignment (#1863)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

This requests adds the possibility to provide an unaligned BAM file as
input, which will trigger the usage of samtools to convert it into
suitable input for minimap2.

Some Oxford Nanopore Technologies basecallers (e.g.
[dorado](https://github.com/nanoporetech/dorado)) output uBAM by
default, so the possiblity to feed them directly to minimap2 would
prevent the need to add an intermediate rule for processing this kind of
files.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).
  • Loading branch information
tdido committed Oct 30, 2023
1 parent 45860bf commit 76280a5
Show file tree
Hide file tree
Showing 6 changed files with 136 additions and 6 deletions.
2 changes: 1 addition & 1 deletion bio/minimap2/aligner/environment.yaml
Expand Up @@ -4,5 +4,5 @@ channels:
- nodefaults
dependencies:
- minimap2 =2.26
- samtools
- samtools =1.18
- snakemake-wrapper-utils =0.6.2
4 changes: 2 additions & 2 deletions bio/minimap2/aligner/meta.yaml
Expand Up @@ -6,11 +6,11 @@ authors:
- Michael Hall
- Filipe G. Vieira
input:
- FASTQ file(s)
- FASTQ file(s) or unaligned BAM file
- reference genome
output:
- SAM/BAM/CRAM file
notes: |
* The `extra` param allows for additional arguments for minimap2.
* The `sort` param allows to enable sorting (if output not PAF), and can be either 'none', 'queryname' or 'coordinate'.
* The `sort_extra` allows for extra arguments for samtools/picard
* The `sort_extra` allows for extra arguments for samtools/picard
67 changes: 67 additions & 0 deletions bio/minimap2/aligner/test/Snakefile
Expand Up @@ -64,3 +64,70 @@ rule minimap2_bam_sorted:
threads: 3
wrapper:
"master/bio/minimap2/aligner"

rule minimap2_ubam_paf:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query="query/reads.bam",
output:
"aligned/{input1}_aln.ubam.paf",
log:
"logs/minimap2/{input1}.ubam.log",
params:
extra="-x map-pb", # optional
sorting="coordinate", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"master/bio/minimap2/aligner"


rule minimap2_ubam_sam:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query="query/reads.bam",
output:
"aligned/{input1}_aln.ubam.sam",
log:
"logs/minimap2/{input1}.ubam.log",
params:
extra="-x map-pb", # optional
sorting="none", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"master/bio/minimap2/aligner"


rule minimap2_ubam_sam_sorted:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query="query/reads.bam",
output:
"aligned/{input1}_aln.sorted.ubam.sam",
log:
"logs/minimap2/{input1}.ubam.log",
params:
extra="-x map-pb", # optional
sorting="queryname", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"master/bio/minimap2/aligner"


rule minimap2_ubam_bam_sorted:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query="query/reads.bam",
output:
"aligned/{input1}_aln.sorted.ubam.bam",
log:
"logs/minimap2/{input1}.ubam.log",
params:
extra="-x map-pb", # optional
sorting="coordinate", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"master/bio/minimap2/aligner"
Binary file added bio/minimap2/aligner/test/query/reads.bam
Binary file not shown.
24 changes: 21 additions & 3 deletions bio/minimap2/aligner/wrapper.py
Expand Up @@ -18,6 +18,24 @@
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

if isinstance(snakemake.input.query, list):
in_ext = infer_out_format(snakemake.input.query[0])
if in_ext == "BAM" and len(snakemake.input.query) > 1:
raise ValueError(f"uBAM input mode only supports a single uBAM file")
else:
in_ext = infer_out_format(snakemake.input.query)

pre_cmd = ""
query = ""
if in_ext == "BAM":
# convert uBAM to fastq keeping all tags
pre_cmd = f'samtools fastq -T "*" {snakemake.input.query} |'
# tell minimap2 to parse tags from fastq header
extra += " -y"
query = "-"
else:
query = snakemake.input.query

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
Expand All @@ -42,13 +60,13 @@
else:
raise ValueError(f"Unexpected value for params.sort: {sort}")


shell(
"(minimap2"
"({pre_cmd}"
" minimap2"
" -t {snakemake.threads}"
" {extra} "
" {snakemake.input.target}"
" {snakemake.input.query}"
" {query}"
" {pipe_cmd}"
" > {snakemake.output[0]}"
") {log}"
Expand Down
45 changes: 45 additions & 0 deletions test.py
Expand Up @@ -3333,6 +3333,51 @@ def test_minimap2_aligner_bam_sorted():
],
)

@skip_if_not_modified
def test_minimap2_aligner_ubam_paf():
run(
"bio/minimap2/aligner",
["snakemake", "--cores", "1", "aligned/genome_aln.ubam.paf", "--use-conda", "-F"],
)


@skip_if_not_modified
def test_minimap2_aligner_ubam_sam():
run(
"bio/minimap2/aligner",
["snakemake", "--cores", "1", "aligned/genome_aln.ubam.sam", "--use-conda", "-F"],
)


@skip_if_not_modified
def test_minimap2_aligner_ubam_sam_sorted():
run(
"bio/minimap2/aligner",
[
"snakemake",
"--cores",
"1",
"aligned/genome_aln.sorted.ubam.sam",
"--use-conda",
"-F",
],
)


@skip_if_not_modified
def test_minimap2_aligner_ubam_bam_sorted():
run(
"bio/minimap2/aligner",
[
"snakemake",
"--cores",
"1",
"aligned/genome_aln.sorted.ubam.bam",
"--use-conda",
"-F",
],
)


@skip_if_not_modified
def test_minimap2_index():
Expand Down

0 comments on commit 76280a5

Please sign in to comment.