Commit
feat: Pyroe make-spliced+unspliced (#1290)
### Description

This PR adds the pyroe make-spliced+unspliced tool.

### QC

* [X] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal number of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
1 parent 30c2a72, commit 96a2cbb
Showing 10 changed files with 160 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- nodefaults | ||
dependencies: | ||
- pyroe=0.9.1 | ||
- bedtools=2.30.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
name: pyroe make-spliced+unspliced | ||
url: https://pyroe.readthedocs.io/en/latest/building_splici_index.html#preparing-a-spliced-unspliced-transcriptome-reference | ||
description: > | ||
  Build spliceu reference files for Alevin-fry. The spliceu (spliced + unspliced) transcriptome reference is one in which the unspliced transcript of each gene represents the entire genomic interval of that gene. | ||
author: | ||
  - Thibault Dayris | ||
input: | ||
  - gtf: Path to the genome annotation (GTF formatted) | ||
  - fasta: Path to the genome sequence (Fasta formatted) | ||
  - spliced: Optional path to additional spliced sequences (Fasta formatted) | ||
  - unspliced: Optional path to additional unspliced sequences (Fasta formatted) | ||
output: | ||
  - fasta: Path to spliced+unspliced sequences (Fasta formatted) | ||
  - gene_id_to_name: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence | ||
  - t2g_3col: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence | ||
  - t2g: Path to a TSV formatted text file containing the transcript_id <-> gene_name | ||
  - g2g: Path to a TSV formatted text file containing the gene_id <-> gene_name | ||
params: | ||
  - extra: Optional parameters to be passed to pyroe |
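
As a rough illustration of how the `t2g_3col` output described above might be consumed downstream, the sketch below groups transcripts by gene and splicing status. The sample rows, the `S`/`U` status labels, and the helper name `group_by_gene` are assumptions made for illustration; they are not taken from this commit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column rows: transcript, gene, splicing status.
# The identifiers and the "S"/"U" labels are illustrative assumptions.
SAMPLE_T2G_3COL = "tx1.1\tg1\tS\ntx1.2\tg1\tS\ng1-U\tg1\tU\n"


def group_by_gene(handle):
    """Group transcript ids by gene, then by splicing status."""
    genes = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        transcript, gene, status = row
        genes[gene][status].append(transcript)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {gene: dict(by_status) for gene, by_status in genes.items()}


grouped = group_by_gene(io.StringIO(SAMPLE_T2G_3COL))
print(grouped)
```

In a real workflow the handle would be the opened file that the rule declares as `output.t2g_3col`.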
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
|
||
rule test_pyroe_makesplicedunspliced: | ||
    input: | ||
        fasta="genome.fasta",  # Path to the genome sequence (FASTA) | ||
        gtf="annotation.gtf",  # Path to the genome annotation (GTF) | ||
        spliced="extra_spliced.fasta",  # Optional path to additional spliced sequences (FASTA) | ||
        unspliced="extra_unspliced.fasta",  # Optional path to additional unspliced sequences (FASTA) | ||
    output: | ||
        gene_id_to_name="gene_id_to_name.tsv",  # gene_id <-> gene_name (TSV) | ||
        fasta="spliceu.fa",  # Spliced+unspliced sequences (FASTA) | ||
        g2g="spliceu_g2g.tsv",  # gene_id <-> gene_name (TSV) | ||
        t2g_3col="spliceu_t2g_3col.tsv",  # transcript_id <-> gene_name <-> splicing status (TSV) | ||
        t2g="spliceu_t2g.tsv",  # transcript_id <-> gene_name (TSV) | ||
    threads: 1 | ||
    log: | ||
        "logs/pyroe.log", | ||
    params: | ||
        extra="",  # Optional parameters passed to pyroe | ||
    wrapper: | ||
        "master/bio/pyroe/makesplicedunspliced/" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
##gff-version 2 | ||
##source-version rtracklayer 1.52.1 | ||
##date 2021-09-14 | ||
chr1 rtracklayer exon 1 2 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E1" | ||
chr1 rtracklayer exon 36 45 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E2" | ||
chr1 rtracklayer exon 71 80 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E3" | ||
chr1 rtracklayer exon 46 55 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.2"; exon_id "E4" | ||
chr1 rtracklayer exon 91 100 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.2"; exon_id "E5" | ||
chr1 rtracklayer exon 121 130 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E6" | ||
chr1 rtracklayer exon 156 160 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E7" | ||
chr1 rtracklayer exon 191 200 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E8" | ||
chr1 rtracklayer transcript 1 80 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; | ||
chr1 rtracklayer transcript 46 100 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.2"; | ||
chr1 rtracklayer transcript 121 200 . + . gene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; | ||
chr1 rtracklayer gene 1 200 . + . gene_id "g1"; gene_name "g1"; | ||
chr2 rtracklayer exon 1 2 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.1"; exon_id "E9" | ||
chr2 rtracklayer exon 36 45 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.1"; exon_id "E10" | ||
chr2 rtracklayer exon 71 80 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.1"; exon_id "E11" | ||
chr2 rtracklayer exon 46 55 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.2"; exon_id "E12" | ||
chr2 rtracklayer exon 91 100 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.2"; exon_id "E13" | ||
chr2 rtracklayer exon 121 130 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.3"; exon_id "E14" | ||
chr2 rtracklayer exon 156 160 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.3"; exon_id "E15" | ||
chr2 rtracklayer exon 191 200 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.3"; exon_id "E16" | ||
chr2 rtracklayer transcript 1 80 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.1"; | ||
chr2 rtracklayer transcript 46 100 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.2"; | ||
chr2 rtracklayer transcript 121 200 . - . gene_id "g2"; gene_name "g2"; transcript_id "tx2.3"; | ||
chr2 rtracklayer gene 1 200 . - . gene_id "g2"; gene_name "g2"; |
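
To make the annotation above concrete, here is a minimal sketch that derives, for gene `g1`, the genomic interval an unspliced transcript would cover and the summed exon length of two of its spliced transcripts. It is not how pyroe computes these values; it only restates the arithmetic on six exon records copied from the test file. The tab separators and the helper names `parse_exons` and `summarize` are assumptions.

```python
import re
from collections import defaultdict

# Six exon records for gene g1 (transcripts tx1.1 and tx1.3) from the test
# annotation. Assumption: the original file is tab-delimited, as GTF requires.
GTF_TEXT = "\n".join(
    [
        'chr1\trtracklayer\texon\t1\t2\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E1"',
        'chr1\trtracklayer\texon\t36\t45\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E2"',
        'chr1\trtracklayer\texon\t71\t80\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.1"; exon_id "E3"',
        'chr1\trtracklayer\texon\t121\t130\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E6"',
        'chr1\trtracklayer\texon\t156\t160\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E7"',
        'chr1\trtracklayer\texon\t191\t200\t.\t+\t.\tgene_id "g1"; gene_name "g1"; transcript_id "tx1.3"; exon_id "E8"',
    ]
)

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exons(gtf_text):
    """Yield (gene_id, transcript_id, start, end) for every exon record."""
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield attrs["gene_id"], attrs["transcript_id"], int(fields[3]), int(fields[4])


def summarize(gtf_text):
    """Return (gene -> (min start, max end), transcript -> summed exon length)."""
    gene_span = {}
    spliced_len = defaultdict(int)
    for gene, transcript, start, end in parse_exons(gtf_text):
        lo, hi = gene_span.get(gene, (start, end))
        gene_span[gene] = (min(lo, start), max(hi, end))
        # GTF coordinates are 1-based and inclusive.
        spliced_len[transcript] += end - start + 1
    return gene_span, dict(spliced_len)


gene_span, spliced_len = summarize(GTF_TEXT)
print(gene_span)    # {'g1': (1, 200)}
print(spliced_len)  # {'tx1.1': 22, 'tx1.3': 25}
```

The span `(1, 200)` agrees with the `gene` record for `g1` in the annotation.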
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
>ExtraSpliced | ||
ATATATATATATATATATATATATATATATATATATATAT |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
>ExtraUnspliced | ||
CGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCG |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
>chr1 | ||
TTAACATTCGCTGGGGGAGATGACGAGACTAGCCGCCGCGTGGTCCTGCCGCATTATACGTGTTCAAGCGCCTACGTGGG | ||
TTGGGCAACCCGTGCCTATGGAGGCATGGACAAATTAGGTTCAACTTCAGCTACGTACGAGACCTAGAGGTAATAAGGGT | ||
ATTTTACTCGGAGCATGTTTCAGTACGAACGTTAGATATC | ||
>chr2 | ||
CTATCGAAGTGGAATCTTGAAGAGCCCATCGGTTAAGGTCTCTCCAATGTCCAGCCTATTCTATGGCACGGCAGACCCGT | ||
TGTGCATCCACAGTGATAACTTACTTGGGCTCTTAATAGAGGAGTGTTGCCATTTTATCGGCTTGCACTCCAATTAGCAC | ||
CAAGTGCCGTTATTGGGGTATTGCACTCATCAATAGCGTG |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
chr1 | ||
203 7 81 82 | ||
chr2 | ||
203 220 81 82 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
__author__ = "Thibault Dayris" | ||
__copyright__ = "Copyright 2023, Thibault Dayris" | ||
__email__ = "thibault.dayris@gustaveroussy.fr" | ||
__license__ = "MIT" | ||
|
||
|
||
from tempfile import TemporaryDirectory | ||
from snakemake.shell import shell | ||
|
||
|
||
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True) | ||
extra = snakemake.params.get("extra", "") | ||
|
||
# Optional inputs become command-line flags only when they are provided. | ||
spliced = snakemake.input.get("spliced", "") | ||
if spliced: | ||
    spliced = "--extra-spliced " + spliced | ||
|
||
|
||
unspliced = snakemake.input.get("unspliced", "") | ||
if unspliced: | ||
    unspliced = "--extra-unspliced " + unspliced | ||
|
||
|
||
# pyroe writes fixed file names into an output directory, so run it in a | ||
# temporary directory and move each requested file to its declared path. | ||
# The moves must happen inside the `with` block, before the directory is removed. | ||
with TemporaryDirectory() as tempdir: | ||
    shell( | ||
        "pyroe make-spliced+unspliced " | ||
        "{extra} {spliced} " | ||
        "{unspliced} " | ||
        "{snakemake.input.fasta} " | ||
        "{snakemake.input.gtf} " | ||
        "{tempdir} " | ||
        "{log}" | ||
    ) | ||
|
||
    if snakemake.output.get("fasta", False): | ||
        shell("mv --verbose {tempdir}/spliceu.fa {snakemake.output.fasta} {log}") | ||
|
||
    if snakemake.output.get("gene_id_to_name", False): | ||
        shell( | ||
            "mv --verbose " | ||
            "{tempdir}/gene_id_to_name.tsv " | ||
            "{snakemake.output.gene_id_to_name} {log}" | ||
        ) | ||
|
||
    if snakemake.output.get("t2g_3col", False): | ||
        shell( | ||
            "mv --verbose " | ||
            "{tempdir}/spliceu_t2g_3col.tsv " | ||
            "{snakemake.output.t2g_3col} {log}" | ||
        ) | ||
|
||
    if snakemake.output.get("t2g", False): | ||
        shell("mv --verbose {tempdir}/spliceu_t2g.tsv {snakemake.output.t2g} {log}") | ||
|
||
    if snakemake.output.get("g2g", False): | ||
        shell("mv --verbose {tempdir}/spliceu_g2g.tsv {snakemake.output.g2g} {log}") |
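
The way the wrapper above assembles its command line can be sketched outside Snakemake as a plain function. This is a hypothetical helper (`build_pyroe_command` is not part of the commit or of pyroe); it only mirrors the argument order and the two optional flags, `--extra-spliced` and `--extra-unspliced`, that appear in the wrapper.

```python
import shlex


def build_pyroe_command(fasta, gtf, outdir, spliced="", unspliced="", extra=""):
    """Return the argument list the wrapper would hand to the shell.

    Hypothetical helper for illustration: optional inputs are only
    turned into flags when a non-empty path is given.
    """
    parts = ["pyroe", "make-spliced+unspliced"]
    if extra:
        # Free-form extra parameters come first, as in the wrapper.
        parts.extend(shlex.split(extra))
    if spliced:
        parts.extend(["--extra-spliced", spliced])
    if unspliced:
        parts.extend(["--extra-unspliced", unspliced])
    # Positional arguments: genome FASTA, GTF annotation, output directory.
    parts.extend([fasta, gtf, outdir])
    return parts


cmd = build_pyroe_command(
    "genome.fasta",
    "annotation.gtf",
    "tmp_out",
    spliced="extra_spliced.fasta",
)
print(shlex.join(cmd))
```

Building a list rather than a single string keeps each path as one argument, which avoids quoting problems if a path contains spaces.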