Skip to content

Commit

Permalink
feat: Deseq2 wald (#1428)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

This PR adds DESeq2 wald test function and its very common outputs.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [X] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
  • Loading branch information
9 people committed Jul 6, 2023
1 parent 86a08de commit e0d2831
Show file tree
Hide file tree
Showing 6 changed files with 311 additions and 0 deletions.
8 changes: 8 additions & 0 deletions bio/deseq2/wald/environment.yaml
@@ -0,0 +1,8 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- bioconductor-deseq2=1.38.0
- bioconductor-biocparallel=1.32.5
- r-ashr=2.2_54
25 changes: 25 additions & 0 deletions bio/deseq2/wald/meta.yaml
@@ -0,0 +1,25 @@
name: deseq2
description: |
Call differentially expressed genes with DESeq2
url: https://bioconductor.org/packages/3.16/bioc/html/DESeq2.html
authors:
- Thibault Dayris
input:
- dds: Path to RDS-formatted DESeq2-object
output:
- wald_rds: Optional path to wald test results (RDS formatted)
- wald_tsv: Optional path to wald test results (TSV formatted). Required optional parameter `contrast` (see below)
- deseq2_result_dir: Optional path to a directory that shall contain all DESeq2 results for each comparison (each file is TSV formatted)
- normalized_counts_table: Optional path to normalized counts (TSV formatted)
- normalized_counts_rds: Optional path to normalized counts (RDS formatted)
params:
- deseq_extra: Optional parameters provided to the function `DESeq()`
- schrink_extra: Optional parameters provided to the function `lfSchrink()`
- results_extra: Optional parameters provided to the function `result()`
- contrast: List of characters. See notes below.
note: |
* About `contrast` parameter. DESeq2 defines the following: this argument specifies what comparison to extract from the object to build a results table. one of either:
* a character list with exactly 3 elements: (1) the name of a factor in the design formula, (2) the name of the numerator level for the fold change (aka, tested condition), and (3) the name of the denominator level for the fold change (aka, reference condition).
* a character list with exactly 2 elements: (1) the list of level names of the fold changes for the numerator (aka, tested condition), and (2) the list of level names of the fold changes for the denominator (aka, reference condition). these names should be elements of resultsNames(object). More general case, can be to combine interaction terms and main effects
* a numeric contrast vector with one element for each element in `resultsNames(object)`. Most general case
In case `contrast` optional parameter is not a list of strings of length 3 as described above, then it is *not* possible to save a single `wald_tsv` result. Please use `deseq2_result_dir` output to access you result in the given directory in such case.
19 changes: 19 additions & 0 deletions bio/deseq2/wald/test/Snakefile
@@ -0,0 +1,19 @@
rule test_deseq2_wald:
input:
dds="dds.RDS",
output:
wald_rds="wald.RDS",
wald_tsv="dge.tsv",
deseq2_result_dir=directory("deseq_results"),
normalized_counts_table="counts.tsv",
normalized_counts_rds="counts.RDS",
params:
deseq_extra="",
shrink_extra="",
results_extra="",
contrast=["condition", "A", "B"],
threads: 1
log:
"logs/deseq2.log",
wrapper:
"master/bio/deseq2/wald"
Binary file added bio/deseq2/wald/test/dds.RDS
Binary file not shown.
251 changes: 251 additions & 0 deletions bio/deseq2/wald/wrapper.R
@@ -0,0 +1,251 @@
# This script takes a deseq2 dataset object, performs
# a DESeq2 wald test, and saves results as requested by user

# __author__ = "Thibault Dayris"
# __copyright__ = "Copyright 2023, Thibault Dayris"
# __email__ = "thibault.dayris@gustaveroussy.fr"
# __license__ = "MIT"

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log_file <- base::file(description = snakemake@log[[1]], open = "wt")
base::sink(file = log_file)
base::sink(file = log_file, type = "message")

# Loading libraries (order matters)
base::library(package = "BiocParallel", character.only = TRUE)
base::library(package = "SummarizedExperiment", character.only = TRUE)
base::library(package = "DESeq2", character.only = TRUE)
base::library(package = "ashr", character.only = TRUE)

# Function to handle optional user-defined parameters
# and still follow R syntax
add_extra <- function(wrapper_extra, snakemake_param_name) {
if (snakemake_param_name %in% base::names(snakemake@params)) {
# Case user provides snakemake_param_name in snakemake rule
user_param <- snakemake@params[[snakemake_param_name]]

param_is_empty <- user_param == ""
param_is_character <- inherits(x = user_param, what = "charcter")
if ((! param_is_empty) && (param_is_character)) {
# Case user do not provide an empty string
# (R does not like trailing commas at the end
# of a function call)
wrapper_extra <- base::paste(
wrapper_extra,
user_param,
sep = ", "
)
} # Nothing to do if user provides an empty / NULL parameter value
} # Nothing to do if user did not provide snakemake_param_name

# In any case, required parameters must be returned
base::return(wrapper_extra)
}

# Setting up multithreading if required
parallel <- FALSE
if (snakemake@threads > 1) {
BiocParallel::register(
BPPARAM = BiocParallel::MulticoreParam(snakemake@threads)
)
parallel <- TRUE
}

# Load DESeq2 dataset
dds_path <- base::as.character(x = snakemake@input[["dds"]])
dds <- base::readRDS(file = dds_path)
base::message("Libraries and dataset loaded")

# Build extra parameters for DESeq2
extra_deseq2 <- add_extra(
wrapper_extra = "object = dds, test = 'Wald', parallel = parallel",
snakemake_param_name = "deseq_extra"
)

deseq2_cmd <- base::paste0(
"DESeq2::DESeq(", extra_deseq2, ")"
)
base::message("DESeq2 command line:")
base::message(deseq2_cmd)

# Running DESeq2::DESeq for wald test result
wald <- base::eval(base::parse(text = deseq2_cmd))


# The rest of the script is here to save part or complete
# list of results in RDS or plain text (TSV) formats.


# Save main result on user request (RDS)
# This includes counts, wald tests for all levels
# assays, design, etc.
if ("wald_rds" %in% base::names(x = snakemake@output)) {
output_rds <- base::as.character(x = snakemake@output[["wald_rds"]])
base::saveRDS(obj = wald, file = output_rds)
base::message("Wald test saved as RDS file")
}

# Saving normalized counts on demand
table <- counts(wald)

# TSV-formatted count table
if ("normalized_counts_table" %in% base::names(snakemake@output)) {
output_table <- base::as.character(
x = snakemake@output[["normalized_counts_table"]]
)
utils::write.table(x = table, file = output_table, sep = "\t", quote = FALSE)
base::message("Normalized counts saved as TSV")
}

# RDS-formated count object with many information,
# including counts, assays, etc.
if ("normalized_counts_rds" %in% base::names(snakemake@output)) {
output_rds <- base::as.character(
x = snakemake@output[["normalized_counts_rds"]]
)
base::saveRDS(obj = table, file = output_rds)
base::message("Normalized counts saved as RDS")
}

# On user request: save all results as TSV in a directory.
# User can later access the directory content, e.g. with
# a snakemake checkpoint-rule.
if ("deseq2_result_dir" %in% base::names(snakemake@output)) {
# Acquire list of available results in DESeqDataSet
wald_results_names <- DESeq2::resultsNames(object = wald)

# Recovering extra parameters for TSV tables
# The variable `result_name` is built below in `for` loop.
results_extra <- add_extra(
wrapper_extra = "object = wald, name = result_name, parallel = parallel",
snakemake_param_name = "results_extra"
)

# DESeq2 result dir will contain all results available in the Wald object
output_prefix <- snakemake@output[["deseq2_result_dir"]]
if (! base::file.exists(output_prefix)) {
base::dir.create(path = output_prefix, recursive = TRUE)
}

# Building command lines for both wald results and fc schinkage
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")")
base::message("Command line used for TSV results creation:")
base::message(results_cmd)

shrink_extra <- add_extra(
"dds = wald, res = results_frame, contrast = contrast, parallel = parallel, type = 'ashr'",
"shrink_extra"
)
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")")
base::message("Command line used for log(FC) shrinkage:")
base::message(shrink_cmd)

# For each available comparison in the wald-dds object
for (result_name in wald_results_names) {
# Building table
base::message(base::paste("Saving results for", result_name))
results_frame <- base::eval(base::parse(text = results_cmd))
shrink_frame <- base::eval(base::parse(text = shrink_cmd))
results_frame$log2FoldChange <- shrink_frame$log2FoldChange

results_path <- base::file.path(
output_prefix,
base::paste0(result_name, ".tsv")
)

# Saving table
utils::write.table(
x = results_frame,
file = results_path,
quote = FALSE,
sep = "\t",
row.names = TRUE
)
}
}


# If user provides contrasts, then a precise result
# can be extracted from DESeq2 object.
if ("wald_tsv" %in% base::names(x = snakemake@output)) {
if ("contrast" %in% base::names(x = snakemake@params)) {
contrast_length <- base::length(x = snakemake@params[["contrast"]])

results_extra <- "object=wald, parallel = parallel"
contrast <- NULL

if (contrast_length == 1) {
# Case user provided a result name in the `contrast` parameter
contrast <- base::as.character(x = snakemake@params[["contrast"]])
contrast <- base::paste0("name='", contrast[1], "'")

} else if (contrast_length == 2) {
# Case user provided both tested and reference level
# In that order! Order matters.
contrast <- sapply(
snakemake@params[["contrast"]],
function(extra) base::as.character(x = extra)
)
contrast <- base::paste0(
"contrast=list('", contrast[1], "', '", contrast[2], "')"
)

} else if (contrast_length == 3) {
# Case user provided both tested and reference level,
# and studied factor.
contrast <- sapply(
snakemake@params[["contrast"]],
function(extra) base::as.character(x = extra)
)
contrast <- base::paste0(
"contrast=c('",
contrast[1],
"', '",
contrast[2],
"', '",
contrast[3],
"')"
)

# Finally saving results as contrast has been
# built from user input.
results_extra <- base::paste(results_extra, contrast, sep = ", ")
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")")
base::message("Result extraction command: ", results_cmd)

shrink_extra <- add_extra(
"dds = wald, res = results_frame, contrast = contrast[1], parallel = parallel, type = 'ashr'",
"shrink_extra"
)
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")")
base::message("Command line used for log(FC) shrinkage:")
base::message(shrink_cmd)


results_frame <- base::eval(base::parse(text = results_cmd))
shrink_frame <- base::eval(base::parse(text = shrink_cmd))
results_frame$log2FoldChange <- shrink_frame$log2FoldChange

# Saving table
utils::write.table(
x = results_frame,
file = base::as.character(x = snakemake@output[["wald_tsv"]]),
quote = FALSE,
sep = "\t",
row.names = TRUE
)
}
} else {
base::stop(
"No contrast provided. ",
"In absence of contrast, it is not possible ",
"to guess the expected result name.",
)
}
}

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
base::sink(type = "message")
base::sink()
8 changes: 8 additions & 0 deletions test.py
Expand Up @@ -1079,6 +1079,14 @@ def test_deseq2_deseqdataset():
)


@skip_if_not_modified
def test_deseq2_wald():
run(
"bio/deseq2/wald",
["snakemake", "--cores", "1", "--use-conda", "dge.tsv"]
)


@skip_if_not_modified
def test_arriba_star_meta():
run(
Expand Down

0 comments on commit e0d2831

Please sign in to comment.