Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
<!-- Ensure that the PR title follows conventional commit style (<type>: <description>)--> <!-- Possible types are here: https://github.com/commitizen/conventional-commit-types/blob/master/index.json --> ### Description This PR adds DESeq2 wald test function and its very common outputs. ### QC <!-- Make sure that you can tick the boxes below. --> * [X] I confirm that: For all wrappers added by this PR, * there is a test case which covers any introduced changes, * `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, * either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, * rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), * all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), * wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), * all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), * `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, * temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), * the `meta.yaml` contains a link to the documentation of the respective tool or command, * `Snakefile`s pass the linting (`snakemake --lint`), * `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), * Python wrapper scripts are formatted with [black](https://black.readthedocs.io). * Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). --------- Co-authored-by: tdayris <tdayris@gustaveroussy.fr> Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr> Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com> Co-authored-by: Felix Mölder <felix.moelder@uni-due.de> Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de> Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
- Loading branch information
1 parent
86a08de
commit e0d2831
Showing
6 changed files
with
311 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- nodefaults | ||
dependencies: | ||
- bioconductor-deseq2=1.38.0 | ||
- bioconductor-biocparallel=1.32.5 | ||
- r-ashr=2.2_54 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
name: deseq2 | ||
description: | | ||
Call differentially expressed genes with DESeq2 | ||
url: https://bioconductor.org/packages/3.16/bioc/html/DESeq2.html | ||
authors: | ||
- Thibault Dayris | ||
input: | ||
- dds: Path to RDS-formatted DESeq2-object | ||
output: | ||
- wald_rds: Optional path to wald test results (RDS formatted) | ||
- wald_tsv: Optional path to wald test results (TSV formatted). Required optional parameter `contrast` (see below) | ||
- deseq2_result_dir: Optional path to a directory that shall contain all DESeq2 results for each comparison (each file is TSV formatted) | ||
- normalized_counts_table: Optional path to normalized counts (TSV formatted) | ||
- normalized_counts_rds: Optional path to normalized counts (RDS formatted) | ||
params: | ||
- deseq_extra: Optional parameters provided to the function `DESeq()` | ||
- schrink_extra: Optional parameters provided to the function `lfSchrink()` | ||
- results_extra: Optional parameters provided to the function `result()` | ||
- contrast: List of characters. See notes below. | ||
note: | | ||
* About `contrast` parameter. DESeq2 defines the following: this argument specifies what comparison to extract from the object to build a results table. one of either: | ||
* a character list with exactly 3 elements: (1) the name of a factor in the design formula, (2) the name of the numerator level for the fold change (aka, tested condition), and (3) the name of the denominator level for the fold change (aka, reference condition). | ||
* a character list with exactly 2 elements: (1) the list of level names of the fold changes for the numerator (aka, tested condition), and (2) the list of level names of the fold changes for the denominator (aka, reference condition). these names should be elements of resultsNames(object). More general case, can be to combine interaction terms and main effects | ||
* a numeric contrast vector with one element for each element in `resultsNames(object)`. Most general case | ||
In case `contrast` optional parameter is not a list of strings of length 3 as described above, then it is *not* possible to save a single `wald_tsv` result. Please use `deseq2_result_dir` output to access you result in the given directory in such case. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
rule test_deseq2_wald: | ||
input: | ||
dds="dds.RDS", | ||
output: | ||
wald_rds="wald.RDS", | ||
wald_tsv="dge.tsv", | ||
deseq2_result_dir=directory("deseq_results"), | ||
normalized_counts_table="counts.tsv", | ||
normalized_counts_rds="counts.RDS", | ||
params: | ||
deseq_extra="", | ||
shrink_extra="", | ||
results_extra="", | ||
contrast=["condition", "A", "B"], | ||
threads: 1 | ||
log: | ||
"logs/deseq2.log", | ||
wrapper: | ||
"master/bio/deseq2/wald" |
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,251 @@ | ||
# This script takes a deseq2 dataset object, performs | ||
# a DESeq2 wald test, and saves results as requested by user | ||
|
||
# __author__ = "Thibault Dayris" | ||
# __copyright__ = "Copyright 2023, Thibault Dayris" | ||
# __email__ = "thibault.dayris@gustaveroussy.fr" | ||
# __license__ = "MIT" | ||
|
||
# Sink the stderr and stdout to the snakemake log file | ||
# https://stackoverflow.com/a/48173272 | ||
log_file <- base::file(description = snakemake@log[[1]], open = "wt") | ||
base::sink(file = log_file) | ||
base::sink(file = log_file, type = "message") | ||
|
||
# Loading libraries (order matters) | ||
base::library(package = "BiocParallel", character.only = TRUE) | ||
base::library(package = "SummarizedExperiment", character.only = TRUE) | ||
base::library(package = "DESeq2", character.only = TRUE) | ||
base::library(package = "ashr", character.only = TRUE) | ||
|
||
# Function to handle optional user-defined parameters | ||
# and still follow R syntax | ||
add_extra <- function(wrapper_extra, snakemake_param_name) { | ||
if (snakemake_param_name %in% base::names(snakemake@params)) { | ||
# Case user provides snakemake_param_name in snakemake rule | ||
user_param <- snakemake@params[[snakemake_param_name]] | ||
|
||
param_is_empty <- user_param == "" | ||
param_is_character <- inherits(x = user_param, what = "charcter") | ||
if ((! param_is_empty) && (param_is_character)) { | ||
# Case user do not provide an empty string | ||
# (R does not like trailing commas at the end | ||
# of a function call) | ||
wrapper_extra <- base::paste( | ||
wrapper_extra, | ||
user_param, | ||
sep = ", " | ||
) | ||
} # Nothing to do if user provides an empty / NULL parameter value | ||
} # Nothing to do if user did not provide snakemake_param_name | ||
|
||
# In any case, required parameters must be returned | ||
base::return(wrapper_extra) | ||
} | ||
|
||
# Setting up multithreading if required | ||
parallel <- FALSE | ||
if (snakemake@threads > 1) { | ||
BiocParallel::register( | ||
BPPARAM = BiocParallel::MulticoreParam(snakemake@threads) | ||
) | ||
parallel <- TRUE | ||
} | ||
|
||
# Load DESeq2 dataset | ||
dds_path <- base::as.character(x = snakemake@input[["dds"]]) | ||
dds <- base::readRDS(file = dds_path) | ||
base::message("Libraries and dataset loaded") | ||
|
||
# Build extra parameters for DESeq2 | ||
extra_deseq2 <- add_extra( | ||
wrapper_extra = "object = dds, test = 'Wald', parallel = parallel", | ||
snakemake_param_name = "deseq_extra" | ||
) | ||
|
||
deseq2_cmd <- base::paste0( | ||
"DESeq2::DESeq(", extra_deseq2, ")" | ||
) | ||
base::message("DESeq2 command line:") | ||
base::message(deseq2_cmd) | ||
|
||
# Running DESeq2::DESeq for wald test result | ||
wald <- base::eval(base::parse(text = deseq2_cmd)) | ||
|
||
|
||
# The rest of the script is here to save part or complete | ||
# list of results in RDS or plain text (TSV) formats. | ||
|
||
|
||
# Save main result on user request (RDS) | ||
# This includes counts, wald tests for all levels | ||
# assays, design, etc. | ||
if ("wald_rds" %in% base::names(x = snakemake@output)) { | ||
output_rds <- base::as.character(x = snakemake@output[["wald_rds"]]) | ||
base::saveRDS(obj = wald, file = output_rds) | ||
base::message("Wald test saved as RDS file") | ||
} | ||
|
||
# Saving normalized counts on demand | ||
table <- counts(wald) | ||
|
||
# TSV-formatted count table | ||
if ("normalized_counts_table" %in% base::names(snakemake@output)) { | ||
output_table <- base::as.character( | ||
x = snakemake@output[["normalized_counts_table"]] | ||
) | ||
utils::write.table(x = table, file = output_table, sep = "\t", quote = FALSE) | ||
base::message("Normalized counts saved as TSV") | ||
} | ||
|
||
# RDS-formated count object with many information, | ||
# including counts, assays, etc. | ||
if ("normalized_counts_rds" %in% base::names(snakemake@output)) { | ||
output_rds <- base::as.character( | ||
x = snakemake@output[["normalized_counts_rds"]] | ||
) | ||
base::saveRDS(obj = table, file = output_rds) | ||
base::message("Normalized counts saved as RDS") | ||
} | ||
|
||
# On user request: save all results as TSV in a directory. | ||
# User can later access the directory content, e.g. with | ||
# a snakemake checkpoint-rule. | ||
if ("deseq2_result_dir" %in% base::names(snakemake@output)) { | ||
# Acquire list of available results in DESeqDataSet | ||
wald_results_names <- DESeq2::resultsNames(object = wald) | ||
|
||
# Recovering extra parameters for TSV tables | ||
# The variable `result_name` is built below in `for` loop. | ||
results_extra <- add_extra( | ||
wrapper_extra = "object = wald, name = result_name, parallel = parallel", | ||
snakemake_param_name = "results_extra" | ||
) | ||
|
||
# DESeq2 result dir will contain all results available in the Wald object | ||
output_prefix <- snakemake@output[["deseq2_result_dir"]] | ||
if (! base::file.exists(output_prefix)) { | ||
base::dir.create(path = output_prefix, recursive = TRUE) | ||
} | ||
|
||
# Building command lines for both wald results and fc schinkage | ||
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")") | ||
base::message("Command line used for TSV results creation:") | ||
base::message(results_cmd) | ||
|
||
shrink_extra <- add_extra( | ||
"dds = wald, res = results_frame, contrast = contrast, parallel = parallel, type = 'ashr'", | ||
"shrink_extra" | ||
) | ||
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")") | ||
base::message("Command line used for log(FC) shrinkage:") | ||
base::message(shrink_cmd) | ||
|
||
# For each available comparison in the wald-dds object | ||
for (result_name in wald_results_names) { | ||
# Building table | ||
base::message(base::paste("Saving results for", result_name)) | ||
results_frame <- base::eval(base::parse(text = results_cmd)) | ||
shrink_frame <- base::eval(base::parse(text = shrink_cmd)) | ||
results_frame$log2FoldChange <- shrink_frame$log2FoldChange | ||
|
||
results_path <- base::file.path( | ||
output_prefix, | ||
base::paste0(result_name, ".tsv") | ||
) | ||
|
||
# Saving table | ||
utils::write.table( | ||
x = results_frame, | ||
file = results_path, | ||
quote = FALSE, | ||
sep = "\t", | ||
row.names = TRUE | ||
) | ||
} | ||
} | ||
|
||
|
||
# If user provides contrasts, then a precise result | ||
# can be extracted from DESeq2 object. | ||
if ("wald_tsv" %in% base::names(x = snakemake@output)) { | ||
if ("contrast" %in% base::names(x = snakemake@params)) { | ||
contrast_length <- base::length(x = snakemake@params[["contrast"]]) | ||
|
||
results_extra <- "object=wald, parallel = parallel" | ||
contrast <- NULL | ||
|
||
if (contrast_length == 1) { | ||
# Case user provided a result name in the `contrast` parameter | ||
contrast <- base::as.character(x = snakemake@params[["contrast"]]) | ||
contrast <- base::paste0("name='", contrast[1], "'") | ||
|
||
} else if (contrast_length == 2) { | ||
# Case user provided both tested and reference level | ||
# In that order! Order matters. | ||
contrast <- sapply( | ||
snakemake@params[["contrast"]], | ||
function(extra) base::as.character(x = extra) | ||
) | ||
contrast <- base::paste0( | ||
"contrast=list('", contrast[1], "', '", contrast[2], "')" | ||
) | ||
|
||
} else if (contrast_length == 3) { | ||
# Case user provided both tested and reference level, | ||
# and studied factor. | ||
contrast <- sapply( | ||
snakemake@params[["contrast"]], | ||
function(extra) base::as.character(x = extra) | ||
) | ||
contrast <- base::paste0( | ||
"contrast=c('", | ||
contrast[1], | ||
"', '", | ||
contrast[2], | ||
"', '", | ||
contrast[3], | ||
"')" | ||
) | ||
|
||
# Finally saving results as contrast has been | ||
# built from user input. | ||
results_extra <- base::paste(results_extra, contrast, sep = ", ") | ||
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")") | ||
base::message("Result extraction command: ", results_cmd) | ||
|
||
shrink_extra <- add_extra( | ||
"dds = wald, res = results_frame, contrast = contrast[1], parallel = parallel, type = 'ashr'", | ||
"shrink_extra" | ||
) | ||
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")") | ||
base::message("Command line used for log(FC) shrinkage:") | ||
base::message(shrink_cmd) | ||
|
||
|
||
results_frame <- base::eval(base::parse(text = results_cmd)) | ||
shrink_frame <- base::eval(base::parse(text = shrink_cmd)) | ||
results_frame$log2FoldChange <- shrink_frame$log2FoldChange | ||
|
||
# Saving table | ||
utils::write.table( | ||
x = results_frame, | ||
file = base::as.character(x = snakemake@output[["wald_tsv"]]), | ||
quote = FALSE, | ||
sep = "\t", | ||
row.names = TRUE | ||
) | ||
} | ||
} else { | ||
base::stop( | ||
"No contrast provided. ", | ||
"In absence of contrast, it is not possible ", | ||
"to guess the expected result name.", | ||
) | ||
} | ||
} | ||
|
||
# Proper syntax to close the connection for the log file | ||
# but could be optional for Snakemake wrapper | ||
base::sink(type = "message") | ||
base::sink() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters