Skip to content

Commit

Permalink
feat: add wrapper to plot nonpareil results (#1398)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

<!-- Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).
  • Loading branch information
fgvieira committed Jun 7, 2023
1 parent f44c6a0 commit 1b31bd5
Show file tree
Hide file tree
Showing 17 changed files with 133 additions and 3 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Expand Up @@ -3,4 +3,5 @@ __pycache__
.snakemake
.DS_Store
.cache
.idea
.idea
*~
File renamed without changes.
File renamed without changes.
Expand Up @@ -12,4 +12,4 @@ rule nonpareil:
alg="kmer",
extra="-X 1 -k 3 -F",
wrapper:
"master/bio/nonpareil"
"master/bio/nonpareil/infer"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions bio/nonpareil/plot/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- nonpareil =3.4.1
17 changes: 17 additions & 0 deletions bio/nonpareil/plot/meta.yaml
@@ -0,0 +1,17 @@
name: nonpareil
description: Plot Nonpareil results.
url: https://nonpareil.readthedocs.io/en/latest/
authors:
- Filipe G. Vieira
input:
- NPO file
output:
- PDF file with plot
params:
- label: Plot title.
- col: Color of the curve.
- enforce_consistency: Fails verbosely on insufficient data, otherwise it warns about the inconsistencies and attempts the estimations.
- star: Objective coverage in percentage (i.e., coverage value considered near-complete).
- correction_factor: Apply overlap-dependent correction factor, otherwise redundancy is assumed to equal coverage.
- weights_exp: Vector of values to be tested as exponent of the weights distribution.
- skip_model: Skip model estimation.
18 changes: 18 additions & 0 deletions bio/nonpareil/plot/test/Snakefile
@@ -0,0 +1,18 @@
rule test_nonpareil_plot:
input:
npo="{sample}.npo",
output:
pdf="results/{sample}.pdf",
threads: 1
log:
"logs/{sample}.log",
params:
label="Plot",
col="blue",
enforce_consistency=True,
star=95,
correction_factor=True,
weights_exp="-1.1,-1.2,-0.9,-1.3,-1",
skip_model=False,
wrapper:
"master/bio/nonpareil/plot"
16 changes: 16 additions & 0 deletions bio/nonpareil/plot/test/a.npo
@@ -0,0 +1,16 @@
# @impl: Nonpareil
# @ksize: 3
# @version: 3.40
# @L: 7.000
# @AL: 5.000
# @R: 15
# @overlap: 50.00
# @divide: 0.70
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.117649 0.00000 0.00000 0.00000 0.00000 0.00000
0.168070 0.16699 0.37297 0.00000 0.00000 0.00000
0.240100 0.25879 0.43797 0.00000 0.00000 1.00000
0.343000 0.34961 0.47685 0.00000 0.00000 1.00000
0.490000 0.47852 0.49954 0.00000 0.00000 1.00000
0.700000 0.70898 0.45423 0.00000 1.00000 1.00000
1.000000 1.00000 0.00000 1.00000 1.00000 1.00000
58 changes: 58 additions & 0 deletions bio/nonpareil/plot/wrapper.R
@@ -0,0 +1,58 @@
# __author__ = "Filipe G. Vieira"
# __copyright__ = "Copyright 2023, Filipe G. Vieira"
# __license__ = "MIT"

# This script plots results (NPO file) from NonPareil


# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file <- file(snakemake@log[[1]], open = "wt")
base::sink(log.file)
base::sink(log.file, type = "message")


# Loading libraries (order matters)
base::library(package = "Nonpareil", character.only = TRUE)
base::message("Libraries loaded")


params <- list("label" = ifelse("label" %in% base::names(snakemake@params), snakemake@params[["label"]], NA),
"col" = ifelse("col" %in% base::names(snakemake@params), snakemake@params[["col"]], NA),
"enforce_consistency" = ifelse("enforce_consistency" %in% base::names(snakemake@params), as.logical(snakemake@params[["enforce_consistency"]]), FALSE),
"star" = ifelse("star" %in% base::names(snakemake@params), snakemake@params[["star"]], 95),
"correction_factor" = ifelse("correction_factor" %in% base::names(snakemake@params), as.logical(snakemake@params[["correction_factor"]]), FALSE),
"weights_exp" = NA,
"skip_model" = ifelse("skip_model" %in% base::names(snakemake@params), as.logical(snakemake@params[["skip_model"]]), FALSE)
)

# Not sure why, by using "ifelse" only keeps the first element of the vector
if ("weights_exp" %in% base::names(snakemake@params)) {
params[["weights_exp"]] = as.numeric(unlist(strsplit(snakemake@params[["weights_exp"]], ",")))
}


base::message("Options provided:")
base::print(params)


pdf(snakemake@output[[1]])
Nonpareil.curve(file = snakemake@input[[1]],
label = params[["label"]],
col = params[["col"]],
enforce.consistency = params[["enforce_consistency"]],
star = params[["star"]],
correction.factor = params[["correction_factor"]],
weights.exp = params[["weights_exp"]],
skip.model = params[["skip_model"]]
)


base::message("Nonpareil plot saved")
dev.off()


# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
base::sink(type = "message")
base::sink()
16 changes: 15 additions & 1 deletion test.py
Expand Up @@ -143,7 +143,7 @@ def run(wrapper, cmd, check_log=None):
@skip_if_not_modified
def test_nonpareil():
run(
"bio/nonpareil",
"bio/nonpareil/infer",
[
"snakemake",
"--cores",
Expand All @@ -159,6 +159,20 @@ def test_nonpareil():
]
)

@skip_if_not_modified
def test_nonpareil_plot():
run(
"bio/nonpareil/plot",
[
"snakemake",
"--cores",
"1",
"--use-conda",
"-F",
"results/a.pdf",
]
)

@skip_if_not_modified
def test_indelqual():
run(
Expand Down

0 comments on commit 1b31bd5

Please sign in to comment.