Skip to content

Commit

Permalink
feat: wrapper for cnvkit batch (#1877)
Browse files Browse the repository at this point in the history
### Description

Wrapper for cnvkit batch which supports calling and reference creation 

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
  • Loading branch information
Smeds and fgvieira committed Oct 23, 2023
1 parent db073db commit 40bfc5d
Show file tree
Hide file tree
Showing 13 changed files with 1,118 additions and 0 deletions.
6 changes: 6 additions & 0 deletions bio/cnvkit/batch/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- cnvkit =0.9.10
23 changes: 23 additions & 0 deletions bio/cnvkit/batch/meta.yaml
@@ -0,0 +1,23 @@
name: cnvkit batch
description: >
Run the CNVkit pipeline on one or more BAM files to call cnv alterations.
Or use it to generate a new reference file
url: https://cnvkit.readthedocs.io/en/stable/pipeline.html
authors:
- Patrik Smeds
input:
- bam: one or more bam files
- reference: copy reference file, when calling samples
- fasta: reference gneome, when building new reference
- antitarget: bed antitarget file, when building new reference
- target: bed target file, when building new reference
- mappability: mappability file, when building new reference
output:
- >
a set of cnvkit generated files, multiple types of cns, cnr and cnn files
- or a new reference file
params:
method: >
Optional, if no value is set cnvkits default value will be use
extra: Any additional arguments to pass
notes:
40 changes: 40 additions & 0 deletions bio/cnvkit/batch/test/Snakefile
@@ -0,0 +1,40 @@
rule cnvkit_batch_build_reference:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
fasta="ref/ref.fa",
antitarget="ref/a.antitarget.bed",
target="ref/a.target.bed",
output:
reference="cnvkit/reference.cnn",
log:
"cnvkit/a.reference.cnn.log",
params:
method='hybrid', # optional
extra="--target-avg-size 10", # optional
resources:
mem_mb=1024,
wrapper:
"master/bio/cnvkit/batch"

rule cnvkit_batch:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
reference="ref/reference.cnn",
output:
antitarget_coverage="cnvkit/a.antitargetcoverage.cnn",
bins="cnvkit/a.bintest.cns",
regions="cnvkit/a.cnr",
segments="cnvkit/a.cns",
segments_called="cnvkit/a.call.cns",
target_coverage="cnvkit/a.targetcoverage.cnn",
log:
"cnvkit/a.antitargetcoverage.cnn.log",
params:
method='hybrid', # optional
extra="", # optional
resources:
mem_mb=1024,
wrapper:
"master/bio/cnvkit/batch"
Binary file added bio/cnvkit/batch/test/mapped/a.bam
Binary file not shown.
Binary file added bio/cnvkit/batch/test/mapped/a.bam.bai
Binary file not shown.
Empty file.
4 changes: 4 additions & 0 deletions bio/cnvkit/batch/test/ref/a.target.bed
@@ -0,0 +1,4 @@
chrM 251 277 LSU-rRNA_Hsa 1000 .
chrY 11170 11213 ALR/Alpha 1000 .
chrY 14339 14355 LSU-rRNA_Hsa 1000 .
chrY 14360 14378 BSR/Beta 1000 .
821 changes: 821 additions & 0 deletions bio/cnvkit/batch/test/ref/ref.fa

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions bio/cnvkit/batch/test/ref/ref.fa.fai
@@ -0,0 +1,2 @@
chrM 16571 6 50 51
chrY 24316 16915 50 51
100 changes: 100 additions & 0 deletions bio/cnvkit/batch/test/ref/ref.txt

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions bio/cnvkit/batch/test/ref/reference.cnn
@@ -0,0 +1,12 @@
chromosome start end gene log2 depth gc rmask spread
chrY 11170 11180 ALR/Alpha -0.70752 3 0.7 0.433631
chrY 11180 11191 ALR/Alpha -0.562765 3 0.545455 0.648245
chrY 11191 11202 ALR/Alpha -0.5 3 0.454545 0.7413
chrY 11202 11213 ALR/Alpha -0.644755 3.09091 0.454545 0.526686
chrY 14339 14347 LSU-rRNA_Hsa -0.5 4 0.625 0.7413
chrY 14347 14355 LSU-rRNA_Hsa -0.314015 4 0.125 1.01704
chrY 14360 14369 BSR/Beta -0.5 3.77778 0.222222 0.7413
chrY 14369 14378 BSR/Beta -0.5 4 1 0.7413
chrM 251 259 LSU-rRNA_Hsa 0.185575 91.75 0.5 0.275133
chrM 259 268 LSU-rRNA_Hsa 2.17584 81.6667 0.666667 3.2259
chrM 268 277 LSU-rRNA_Hsa 2.11547 75.1111 0.555556 3.1364
79 changes: 79 additions & 0 deletions bio/cnvkit/batch/wrapper.py
@@ -0,0 +1,79 @@
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"

import logging
from os import listdir
from os.path import basename
from os.path import dirname
from os.path import join
from tempfile import TemporaryDirectory
import shutil
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

input_bam_files = f"{snakemake.input.bam}"

target = ""
antitarget = ""
ref_fasta = ""
reference_cnv = ""
mappability = ""

create_reference = snakemake.output.get("reference", False)
if create_reference:
input_bam_files = f"-n {input_bam_files}"
ref_fasta = f"-f {snakemake.input.fasta}"
target = f"-t {snakemake.input.target}"
antitarget = f"-a {snakemake.input.antitarget}"

if "mappability" in snakemake.input:
mappability = f"-g {snakemake.input.mappability}"

if len(snakemake.output) > 1:
exception_message = (
"Only one output expected when creating a reference, multiple output "
f"files defined as output: [{snakemake.output}]"
)
raise Exception(exception_message)
else:
reference_cnv = f"-r {snakemake.input.reference}"

method = snakemake.params.get("method", "hybrid")
if method:
method = f"-m {method}"

extra = snakemake.params.get("extra", "")

with TemporaryDirectory() as tmpdirname:
if create_reference:
output = f"--output-reference {join(tmpdirname, 'reference.cnn')}"
output_list = [snakemake.output.reference]
else:
output_list = [value for key, value in snakemake.output.items()]
output = f"-d {tmpdirname} "
shell(
"(cnvkit.py batch {input_bam_files} "
"{reference_cnv} "
"{method} "
"{ref_fasta} "
"{target} "
"{antitarget} "
"{output} "
"{extra}) {log}"
)

for output_file in output_list:
filename = basename(output_file)
parent = dirname(output_file)

try:
shutil.copy2(join(tmpdirname, filename), output_file)
except FileNotFoundError as e:
temp_files = listdir(tmpdirname)
logging.error(
f"Couldn't locate file {basename} possible files are {[basename(f) for f in temp_files]}"
)
raise e
31 changes: 31 additions & 0 deletions test.py
Expand Up @@ -1180,6 +1180,19 @@ def test_bwa_mapping_meta():
],
)

@skip_if_not_modified
def test_cnvkit_batch_create_reference():
run(
"bio/cnvkit/batch",
[
"snakemake",
"--cores",
"1",
"--use-conda",
"cnvkit/reference.cnn",
],
)

@skip_if_not_modified
def test_cnvkit_call():
run(
Expand Down Expand Up @@ -1249,6 +1262,24 @@ def test_cnvkit_antitarget():
],
)

@skip_if_not_modified
def test_cnvkit_batch():
run(
"bio/cnvkit/batch",
[
"snakemake",
"--cores",
"1",
"--use-conda",
"cnvkit/a.antitargetcoverage.cnn",
"cnvkit/a.bintest.cns",
"cnvkit/a.cnr",
"cnvkit/a.cns",
"cnvkit/a.call.cns",
"cnvkit/a.targetcoverage.cnn",
],
)

@skip_if_not_modified
def test_cnvkit_target():
run(
Expand Down

0 comments on commit 40bfc5d

Please sign in to comment.