Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: wrapper for cnvkit batch (#1877)
### Description Wrapper for cnvkit batch which supports calling and reference creation ### QC <!-- Make sure that you can tick the boxes below. --> * [x] I confirm that: For all wrappers added by this PR, * there is a test case which covers any introduced changes, * `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, * either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, * rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), * all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), * wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), * all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), * `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, * temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), * the `meta.yaml` contains a link to the documentation of the respective tool or command, * `Snakefile`s pass the linting (`snakemake --lint`), * `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), * Python wrapper scripts are formatted with [black](https://black.readthedocs.io). * Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). --------- Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
- Loading branch information
Showing
13 changed files
with
1,118 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- nodefaults | ||
dependencies: | ||
- cnvkit =0.9.10 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
name: cnvkit batch | ||
description: > | ||
Run the CNVkit pipeline on one or more BAM files to call cnv alterations. | ||
Or use it to generate a new reference file | ||
url: https://cnvkit.readthedocs.io/en/stable/pipeline.html | ||
authors: | ||
- Patrik Smeds | ||
input: | ||
- bam: one or more bam files | ||
- reference: copy reference file, when calling samples | ||
- fasta: reference gneome, when building new reference | ||
- antitarget: bed antitarget file, when building new reference | ||
- target: bed target file, when building new reference | ||
- mappability: mappability file, when building new reference | ||
output: | ||
- > | ||
a set of cnvkit generated files, multiple types of cns, cnr and cnn files | ||
- or a new reference file | ||
params: | ||
method: > | ||
Optional, if no value is set cnvkits default value will be use | ||
extra: Any additional arguments to pass | ||
notes: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
rule cnvkit_batch_build_reference: | ||
input: | ||
bam=["mapped/a.bam"], | ||
bai=["mapped/a.bam.bai"], | ||
fasta="ref/ref.fa", | ||
antitarget="ref/a.antitarget.bed", | ||
target="ref/a.target.bed", | ||
output: | ||
reference="cnvkit/reference.cnn", | ||
log: | ||
"cnvkit/a.reference.cnn.log", | ||
params: | ||
method='hybrid', # optional | ||
extra="--target-avg-size 10", # optional | ||
resources: | ||
mem_mb=1024, | ||
wrapper: | ||
"master/bio/cnvkit/batch" | ||
|
||
rule cnvkit_batch: | ||
input: | ||
bam=["mapped/a.bam"], | ||
bai=["mapped/a.bam.bai"], | ||
reference="ref/reference.cnn", | ||
output: | ||
antitarget_coverage="cnvkit/a.antitargetcoverage.cnn", | ||
bins="cnvkit/a.bintest.cns", | ||
regions="cnvkit/a.cnr", | ||
segments="cnvkit/a.cns", | ||
segments_called="cnvkit/a.call.cns", | ||
target_coverage="cnvkit/a.targetcoverage.cnn", | ||
log: | ||
"cnvkit/a.antitargetcoverage.cnn.log", | ||
params: | ||
method='hybrid', # optional | ||
extra="", # optional | ||
resources: | ||
mem_mb=1024, | ||
wrapper: | ||
"master/bio/cnvkit/batch" |
Binary file not shown.
Binary file not shown.
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
chrM 251 277 LSU-rRNA_Hsa 1000 . | ||
chrY 11170 11213 ALR/Alpha 1000 . | ||
chrY 14339 14355 LSU-rRNA_Hsa 1000 . | ||
chrY 14360 14378 BSR/Beta 1000 . |
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
chrM 16571 6 50 51 | ||
chrY 24316 16915 50 51 |
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
chromosome start end gene log2 depth gc rmask spread | ||
chrY 11170 11180 ALR/Alpha -0.70752 3 0.7 0.433631 | ||
chrY 11180 11191 ALR/Alpha -0.562765 3 0.545455 0.648245 | ||
chrY 11191 11202 ALR/Alpha -0.5 3 0.454545 0.7413 | ||
chrY 11202 11213 ALR/Alpha -0.644755 3.09091 0.454545 0.526686 | ||
chrY 14339 14347 LSU-rRNA_Hsa -0.5 4 0.625 0.7413 | ||
chrY 14347 14355 LSU-rRNA_Hsa -0.314015 4 0.125 1.01704 | ||
chrY 14360 14369 BSR/Beta -0.5 3.77778 0.222222 0.7413 | ||
chrY 14369 14378 BSR/Beta -0.5 4 1 0.7413 | ||
chrM 251 259 LSU-rRNA_Hsa 0.185575 91.75 0.5 0.275133 | ||
chrM 259 268 LSU-rRNA_Hsa 2.17584 81.6667 0.666667 3.2259 | ||
chrM 268 277 LSU-rRNA_Hsa 2.11547 75.1111 0.555556 3.1364 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
__author__ = "Patrik Smeds" | ||
__copyright__ = "Copyright 2023, Patrik Smeds" | ||
__email__ = "patrik.smeds@scilifelab.uu.se" | ||
__license__ = "MIT" | ||
|
||
import logging | ||
from os import listdir | ||
from os.path import basename | ||
from os.path import dirname | ||
from os.path import join | ||
from tempfile import TemporaryDirectory | ||
import shutil | ||
from snakemake.shell import shell | ||
|
||
log = snakemake.log_fmt_shell(stdout=False, stderr=True) | ||
|
||
input_bam_files = f"{snakemake.input.bam}" | ||
|
||
target = "" | ||
antitarget = "" | ||
ref_fasta = "" | ||
reference_cnv = "" | ||
mappability = "" | ||
|
||
create_reference = snakemake.output.get("reference", False) | ||
if create_reference: | ||
input_bam_files = f"-n {input_bam_files}" | ||
ref_fasta = f"-f {snakemake.input.fasta}" | ||
target = f"-t {snakemake.input.target}" | ||
antitarget = f"-a {snakemake.input.antitarget}" | ||
|
||
if "mappability" in snakemake.input: | ||
mappability = f"-g {snakemake.input.mappability}" | ||
|
||
if len(snakemake.output) > 1: | ||
exception_message = ( | ||
"Only one output expected when creating a reference, multiple output " | ||
f"files defined as output: [{snakemake.output}]" | ||
) | ||
raise Exception(exception_message) | ||
else: | ||
reference_cnv = f"-r {snakemake.input.reference}" | ||
|
||
method = snakemake.params.get("method", "hybrid") | ||
if method: | ||
method = f"-m {method}" | ||
|
||
extra = snakemake.params.get("extra", "") | ||
|
||
with TemporaryDirectory() as tmpdirname: | ||
if create_reference: | ||
output = f"--output-reference {join(tmpdirname, 'reference.cnn')}" | ||
output_list = [snakemake.output.reference] | ||
else: | ||
output_list = [value for key, value in snakemake.output.items()] | ||
output = f"-d {tmpdirname} " | ||
shell( | ||
"(cnvkit.py batch {input_bam_files} " | ||
"{reference_cnv} " | ||
"{method} " | ||
"{ref_fasta} " | ||
"{target} " | ||
"{antitarget} " | ||
"{output} " | ||
"{extra}) {log}" | ||
) | ||
|
||
for output_file in output_list: | ||
filename = basename(output_file) | ||
parent = dirname(output_file) | ||
|
||
try: | ||
shutil.copy2(join(tmpdirname, filename), output_file) | ||
except FileNotFoundError as e: | ||
temp_files = listdir(tmpdirname) | ||
logging.error( | ||
f"Couldn't locate file {basename} possible files are {[basename(f) for f in temp_files]}" | ||
) | ||
raise e |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters