feat: pyTMB (#2739)

This PR adds [`pyTMB`](https://github.com/bioinfo-pf-curie/TMB) to the
list of available wrappers.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [X] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tool's purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running
`snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal number of channels, in the recommended
order. E.g. for bioconda, use `conda-forge`, `bioconda`, `nodefaults`
(conda-forge should have the highest priority, and the defaults channel is
usually not needed because most packages are on conda-forge nowadays).

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
8 people committed Mar 27, 2024
1 parent af63b5b commit 0ffbed9
Showing 10 changed files with 312 additions and 0 deletions.
63 changes: 63 additions & 0 deletions bio/tmb/pytmb/environment.linux-64.pin.txt
@@ -0,0 +1,63 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2024.2.2-hbcca054_0.conda#2f4327a1cbe7f022401b236e915a5fef
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda#7aca3059a1729aa76c597603f10b0dd3
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-13.2.0-h7e041cc_5.conda#f6f6600d18a4047b54f803cf708b868a
https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.10-4_cp310.conda#26322ec5d7712c3ded99dd656142b8ce
https://conda.anaconda.org/conda-forge/noarch/tzdata-2024a-h0c530f3_0.conda#161081fc7cec0bfda0d86d7cb595f8d8
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h807b86a_5.conda#d211c42b9ce49aee3734fdc828731689
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h807b86a_5.conda#d4ff227c46917d3b4565302a2bbb276b
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-hd590300_5.conda#69b8b6202a07720f448be700e300ccf4
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.27.0-hd590300_0.conda#f6afff0e9ee08d2f1b897881a4f38cdb
https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2#30186d27e2c9fa62b45fb1476b7200e3
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.18-h0b41bf4_0.conda#6aa9c9de5542ecb07fdda9ca626252d8
https://conda.anaconda.org/conda-forge/linux-64/libev-4.33-hd590300_2.conda#172bf1cd1ff8629f2b1179945ed45055
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2#d645c6d2ac96843a2bfaccd2d62b3ac3
https://conda.anaconda.org/conda-forge/linux-64/libgfortran5-13.2.0-ha4646dd_5.conda#7a6bd7a12a4bd359e2afe6c0fa1acace
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hd590300_0.conda#30fd6e37fe21f86f4bd26d6ee73eeec7
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda#40b61aab5c7ba9ff276c41cfffe6b80b
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.13-hd590300_5.conda#f36c115f1ee199da648e0597ec2047ad
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.4-h59595ed_2.conda#7dbaa197d7ba6032caf7ae7f32c1efa0
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.2.1-hd590300_0.conda#51a753e64a3027bd7e23a189b1f6e91e
https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2#2161070d867d1b1204ea749c8eec4ef0
https://conda.anaconda.org/conda-forge/linux-64/yaml-0.2.5-h7f98852_2.tar.bz2#4cb3ad778ec2d5a7acbdf254eb1c42ae
https://conda.anaconda.org/bioconda/linux-64/bedtools-2.31.1-hf5e1c6e_1.tar.bz2#2066287e826a2ff469fa0b62b24b6059
https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2#4d331e44109e3f0e19b4cb8f9b82f3e1
https://conda.anaconda.org/conda-forge/linux-64/libgfortran-ng-13.2.0-h69a702a_5.conda#e73e9cfd1191783392131e6238bdb3e9
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.58.0-h47da74e_1.conda#700ac6ea6d53d5510591c4344d5c989a
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.45.1-h2797004_0.conda#fc4ccadfbf6d4784de88c41704792562
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.0-h0841786_0.conda#1f5a58e686b13bcfde88b93f547d23fe
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8228510_1.conda#47d31b792659ce70f470b5c82fdfb7a4
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.5-hfc55251_0.conda#04b88013080254850d6c01ed54810589
https://conda.anaconda.org/conda-forge/linux-64/krb5-1.21.2-h659d440_0.conda#cd95826dbd331ed1be26bdf401432844
https://conda.anaconda.org/conda-forge/linux-64/libopenblas-0.3.26-pthreads_h413a1c8_0.conda#760ae35415f5ba8b15d09df5afe8b23a
https://conda.anaconda.org/conda-forge/linux-64/python-3.10.13-hd12c33a_1_cpython.conda#ed38140af93f81319ebc472fbcf16cca
https://conda.anaconda.org/conda-forge/noarch/click-8.1.7-unix_pyh707e725_0.conda#f3ad426304898027fc619827ff428eca
https://conda.anaconda.org/conda-forge/noarch/humanfriendly-10.0-pyhd8ed1ab_6.conda#2ed1fe4b9079da97c44cfe9c2e5078fd
https://conda.anaconda.org/conda-forge/linux-64/libblas-3.9.0-21_linux64_openblas.conda#0ac9f44fc096772b0aa092119b00c3ca
https://conda.anaconda.org/conda-forge/linux-64/libcurl-8.5.0-hca28451_0.conda#7144d5a828e2cae218e0e3c98d8a0aeb
https://conda.anaconda.org/conda-forge/noarch/python-tzdata-2024.1-pyhd8ed1ab_0.conda#98206ea9954216ee7540f0c773f2104d
https://conda.anaconda.org/conda-forge/noarch/pytz-2024.1-pyhd8ed1ab_0.conda#3eeeeb9e4827ace8c0c1419c85d590ad
https://conda.anaconda.org/conda-forge/linux-64/pyyaml-6.0.1-py310h2372a71_1.conda#bb010e368de4940771368bc3dc4c63e7
https://conda.anaconda.org/conda-forge/noarch/setuptools-69.1.1-pyhd8ed1ab_0.conda#576de899521b7d43674ba3ef6eae9142
https://conda.anaconda.org/conda-forge/noarch/six-1.16.0-pyh6c4a22f_0.tar.bz2#e5f25f8dbc060e9a8d912e432202afc2
https://conda.anaconda.org/conda-forge/noarch/wheel-0.42.0-pyhd8ed1ab_0.conda#1cdea58981c5cbc17b51973bcaddcea7
https://conda.anaconda.org/conda-forge/noarch/coloredlogs-15.0.1-pyhd8ed1ab_3.tar.bz2#7b4fc18b7f66382257c45424eaf81935
https://conda.anaconda.org/bioconda/linux-64/htslib-1.19.1-h81da01d_2.tar.bz2#ad57eedd99d6722b2f00a8f7d0d71e2a
https://conda.anaconda.org/conda-forge/linux-64/libcblas-3.9.0-21_linux64_openblas.conda#4a3816d06451c4946e2db26b86472cb6
https://conda.anaconda.org/conda-forge/linux-64/liblapack-3.9.0-21_linux64_openblas.conda#1a42f305615c3867684e049e85927531
https://conda.anaconda.org/conda-forge/noarch/pip-24.0-pyhd8ed1ab_0.conda#f586ac1e56c8638b64f9c8122a7b8a67
https://conda.anaconda.org/bioconda/linux-64/pysam-0.22.0-py310h41dec4a_1.tar.bz2#19fdb9301a6debbb7fe9836670e3feb7
https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.9.0-pyhd8ed1ab_0.conda#2cf4264fffb9e6eff6031c5b6884d61c
https://conda.anaconda.org/bioconda/linux-64/mosdepth-0.3.6-hd299d5a_0.tar.bz2#d600959c8132348d3a6994e2aa3a2134
https://conda.anaconda.org/conda-forge/linux-64/numpy-1.26.4-py310hb13e2d6_0.conda#6593de64c935768b6bad3e19b3e978be
https://conda.anaconda.org/bioconda/linux-64/cyvcf2-0.30.28-py310hcf1fb4a_0.tar.bz2#232a76b24d3c3b44aa4e88d84a73872e
https://conda.anaconda.org/conda-forge/linux-64/pandas-2.2.1-py310hcc13569_0.conda#cf5d315e3601a6a2931f63aa9a84dc40
https://conda.anaconda.org/bioconda/linux-64/pybedtools-0.9.1-py310h2b6aa90_0.tar.bz2#e561264a083c7b5a2b2290008460c9dd
https://conda.anaconda.org/bioconda/noarch/tmb-1.3.0-pyh5e36f6f_0.tar.bz2#ef5e806d5a3f48d4568870df9c6ae7e1
6 changes: 6 additions & 0 deletions bio/tmb/pytmb/environment.yaml
@@ -0,0 +1,6 @@
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - tmb=1.3.0
18 changes: 18 additions & 0 deletions bio/tmb/pytmb/meta.yaml
@@ -0,0 +1,18 @@
name: "pyTMB.py"
description: Calculate a Tumor Mutational Burden (TMB) score from a VCF file
url: "https://github.com/bioinfo-pf-curie/TMB?tab=readme-ov-file#tumor-mutational-burden"
authors:
  - "Thibault Dayris"
input:
  - vcf: Path to input variants (`vcf`, `vcf.gz`, or `bcf` formatted)
  - db_config: Path to database config file (`yaml` formatted)
  - var_config: Path to variant config file (`yaml` formatted)
  - bed: Path to intervals file to compute effective genome size (`bed` formatted)
output:
  - res: Path to TMB results
  - vcf: Optional path to variants considered for TMB calculation
params:
  - extra: Optional parameters provided to `pyTMB.py`, besides `-i`, `--dbConfig`, `--varConfig`, `--bed`, or `--export`
note: |
  This wrapper executes the whole command in a temporary directory. Using the `shadow` directive
  in the Snakemake rule would therefore be redundant.
15 changes: 15 additions & 0 deletions bio/tmb/pytmb/test/Snakefile
@@ -0,0 +1,15 @@
rule test_pytmb:
    input:
        vcf="sample.bcf",
        db_config="dbconfig.yaml",
        var_config="varconfig.yaml",
        bed="regions.bed",
    output:
        res="tmb.txt",
        vcf="tmp.vcf",
    log:
        "pytmb.log",
    params:
        extra="--verbose",
    wrapper:
        "master/bio/tmb/pytmb"
115 changes: 115 additions & 0 deletions bio/tmb/pytmb/test/dbconfig.yaml
@@ -0,0 +1,115 @@
## Describe the fields
## For definition, provide the expected key:values
## Note that several keys/values can be defined

###############################################
## SnpEff Parsing

## Tags
tag: 'ANN'
sep: '&'

## Annotation flags

isCoding:
  1:
    - chromosome_number_variation
    - coding_sequence_variant
    - conservative_inframe_deletion
    - conservative_inframe_insertion
    - disruptive_inframe_deletion
    - disruptive_inframe_insertion
    - exon_loss
    - exon_loss_variant
    - exon_variant
    - frameshift_variant
    - gene_variant
    - initiator_codon_variant
    - missense_variant
    - rare_amino_acid_variant
    - splice_acceptor_variant
    - splice_donor_variant
    - splice_region_variant
    - start_lost
    - start_retained
    - stop_gained
    - stop_lost
    - stop_retained_variant
    - synonymous_variant
    - transcript_ablation
    - transcript_amplification
    - transcript_variant

isNonCoding:
  1:
    - 3_prime_UTR_truncation
    - 3_prime_UTR_variant
    - 5_prime_UTR_premature_start_codon_gain_variant
    - 5_prime_UTR_truncation
    - 5_prime_UTR_variant
    - conserved_intergenic_variant
    - conserved_intron_variant
    - downstream_gene_variant
    - feature_elongation
    - feature_truncation
    - intergenic_region
    - intragenic_variant
    - intron_variant
    - mature_miRNA_variant
    - miRNA
    - NMD_transcript_variant
    - non_coding_transcript_exon_variant
    - non_coding_transcript_variant
    - regulatory_region_ablation
    - regulatory_region_amplification
    - regulatory_region_variant
    - TF_binding_site_variant
    - TFBS_ablation
    - TFBS_amplification
    - upstream_gene_variant

isSplicing:
  1:
    - splice_donor_variant
    - splice_acceptor_variant
    - splice_region_variant

isSynonymous:
  1:
    - start_retained_variant
    - stop_retained_variant
    - synonymous_variant

isNonSynonymous:
  1:
    - frameshift_variant
    - missense_variant
    - rare_amino_acid_variant
    - splice_acceptor_variant
    - splice_donor_variant
    - splice_region_variant
    - start_lost
    - stop_gained
    - stop_lost

## Databases
cancerDb:
  cosmic:
    - cosmic_coding_ID
    - cosmic_noncoding_ID

polymDb:
  1k:
    - kg_AMR_AF
    - kg_AFR_AF
    - kg_EAS_AF
    - kg_EUR_AF
    - kg_SAS_AF
    - KG_AF_GLOBAL

  gnomad:
    - gnomAD_genomes_AF
    - AF

  esp:
    - ESP_AF_GLOBAL
1 change: 1 addition & 0 deletions bio/tmb/pytmb/test/regions.bed
@@ -0,0 +1 @@
18 0 80373285
Binary file added bio/tmb/pytmb/test/sample.bcf
Binary file not shown.
10 changes: 10 additions & 0 deletions bio/tmb/pytmb/test/varconfig.yaml
@@ -0,0 +1,10 @@
## Describe the fields
## For definition, provide the expected key:values
## Note that several keys/values can be defined
##
###############################################

freq: 'AF'
depth: 'DP'
altDepth: 'AD'
maxVaf: '1'
77 changes: 77 additions & 0 deletions bio/tmb/pytmb/wrapper.py
@@ -0,0 +1,77 @@
# coding: utf-8

"""Snakemake wrapper for pyTMB.py"""

__author__ = "Thibault Dayris"
__mail__ = "thibault.dayris@gustaveroussy.fr"
__copyright__ = "Copyright 2024, Thibault Dayris"
__license__ = "MIT"

from os.path import basename
from re import sub
from snakemake import shell
from tempfile import TemporaryDirectory


extra = snakemake.params.get("extra", "")
ln_extra = "--symbolic --force --relative --verbose"

out_vcf = snakemake.output.get("vcf", "")
if out_vcf:
    extra += " --export"

# pyTMB creates an exported VCF file whose name/prefix
# is predictable, but not editable. It is based on the
# input VCF file name.
# This is handled in the wrapper itself, rather than
# expecting the user to define a `shadow` directive
# in the Snakemake rule.
with TemporaryDirectory() as tempdir:
    # Linking all input files into the created temporary directory
    vcf_link_path = f"{tempdir}/{basename(snakemake.input.vcf)}"
    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell("ln {ln_extra} {snakemake.input.vcf} {vcf_link_path} {log}")

    db_config = snakemake.input.get("db_config", "")
    if db_config:
        db_link_path = f"{tempdir}/{basename(db_config)}"
        shell("ln {ln_extra} {db_config} {db_link_path} {log}")
        db_config = f"--dbConfig {db_link_path}"

    var_config = snakemake.input.get("var_config", "")
    if var_config:
        var_link_path = f"{tempdir}/{basename(var_config)}"
        shell("ln {ln_extra} {var_config} {var_link_path} {log}")
        var_config = f"--varConfig {var_link_path}"

    bed = snakemake.input.get("bed", "")
    if bed:
        bed_link_path = f"{tempdir}/{basename(bed)}"
        shell("ln {ln_extra} {bed} {bed_link_path} {log}")
        bed = f"--bed {bed_link_path}"

    res_link_name = f"{tempdir}/{basename(snakemake.output.res)}"

    # Running pyTMB on the symlinked files, after moving
    # into the temporary directory, because pyTMB writes
    # the exported VCF file to its working directory.
    log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
    shell(
        "cd {tempdir} && "
        "pyTMB.py {extra} "
        "{db_config} {var_config} {bed} "
        "--vcf {vcf_link_path} "
        "> {res_link_name} "
        "{log} && "
        "cd - "
    )

    # Moving the main result file
    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell("mv --verbose {res_link_name} {snakemake.output.res} {log}")

    # Moving the optional exported VCF file
    if out_vcf:
        # Raw string, so that `\.` is a regex escape and not an
        # invalid Python string escape; the dot before `gz` is literal.
        prefix = sub(r"\.(v|b)cf(\.gz)?", "", vcf_link_path)
        shell("mv --verbose {prefix}_export.vcf {out_vcf} {log}")
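
The wrapper above locates the exported VCF by stripping the `.vcf`, `.vcf.gz`, or `.bcf` extension from the linked input path and appending `_export.vcf`. A minimal standalone sketch of that derivation follows. The helper name `exported_vcf_path` is hypothetical, and anchoring the pattern at the end of the path with `$` is an assumption of this sketch rather than something the wrapper does.

```python
import re


def exported_vcf_path(vcf_path: str) -> str:
    """Return the path where the exported VCF is expected to appear.

    Hypothetical helper: mirrors the wrapper's prefix logic, with an
    end-of-string anchor added as an assumption of this sketch.
    """
    # Strip a trailing .vcf, .bcf, .vcf.gz or .bcf.gz extension
    prefix = re.sub(r"\.(v|b)cf(\.gz)?$", "", vcf_path)
    return f"{prefix}_export.vcf"


print(exported_vcf_path("/tmp/work/sample.bcf"))  # /tmp/work/sample_export.vcf
print(exported_vcf_path("sample.vcf.gz"))  # sample_export.vcf
```

Keeping this logic in one small function would make it easy to check against the file name pyTMB actually produces if the tool's naming scheme ever changes.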
7 changes: 7 additions & 0 deletions test.py
@@ -6432,3 +6432,10 @@ def test_sortmerna_se():
            "-F",
        ],
    )

@skip_if_not_modified
def test_tmb_pytmb():
    run(
        "bio/tmb/pytmb",
        ["snakemake", "--cores", "1", "--use-conda", "-F"],
    )
