Skip to content

Commit

Permalink
fix: Nonpareil tweaks (#1449)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

<!-- Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).
  • Loading branch information
fgvieira committed Jun 29, 2023
1 parent da81a5a commit b32b341
Show file tree
Hide file tree
Showing 10 changed files with 233 additions and 36 deletions.
1 change: 1 addition & 0 deletions bio/nonpareil/infer/environment.yaml
Expand Up @@ -4,3 +4,4 @@ channels:
- nodefaults
dependencies:
- nonpareil =3.4.1
- snakemake-wrapper-utils =0.5.3
4 changes: 3 additions & 1 deletion bio/nonpareil/infer/meta.yaml
@@ -1,4 +1,4 @@
name: nonpareil
name: nonpareil infer
description: Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequences that will be required to achieve “nearly complete coverage”.
url: https://nonpareil.readthedocs.io/en/latest/
authors:
Expand All @@ -13,3 +13,5 @@ output:
params:
- alg: nonpareil algorithm, either `kmer` or `alignment` (mandatory).
- extra: additional program arguments
notes: |
* For a PDF version of the manual, see https://nonpareil.readthedocs.io/_/downloads/en/latest/pdf/
3 changes: 3 additions & 0 deletions bio/nonpareil/infer/test/Snakefile
Expand Up @@ -11,5 +11,8 @@ rule nonpareil:
params:
alg="kmer",
extra="-X 1 -k 3 -F",
threads: 2
resources:
mem_mb=50,
wrapper:
"master/bio/nonpareil/infer"
5 changes: 5 additions & 0 deletions bio/nonpareil/infer/wrapper.py
Expand Up @@ -5,9 +5,12 @@
from os import path
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mem_mb = get_mem(snakemake, out_unit="MiB")

uncomp = ""
in_name, in_ext = path.splitext(snakemake.input[0])
Expand Down Expand Up @@ -52,6 +55,8 @@

shell(
"nonpareil"
" -t {snakemake.threads}"
" -R {mem_mb}"
" -T {snakemake.params.alg}"
" -s {in_uncomp}"
" -f {in_format}"
Expand Down
5 changes: 3 additions & 2 deletions bio/nonpareil/plot/meta.yaml
@@ -1,4 +1,4 @@
name: nonpareil
name: nonpareil plot
description: Plot Nonpareil results.
url: https://nonpareil.readthedocs.io/en/latest/
authors:
Expand All @@ -9,7 +9,8 @@ output:
- PDF file with plot
params:
- label: Plot title.
- col: Color of the curve.
- labels: Curve labels.
- col: Curve colors.
- enforce_consistency: Fails verbosely on insufficient data, otherwise it warns about the inconsistencies and attempts the estimations.
- star: Objective coverage in percentage (i.e., coverage value considered near-complete).
- correction_factor: Apply overlap-dependent correction factor, otherwise redundancy is assumed to equal coverage.
Expand Down
34 changes: 34 additions & 0 deletions bio/nonpareil/plot/test/Snakefile
Expand Up @@ -3,6 +3,7 @@ rule test_nonpareil_plot:
npo="{sample}.npo",
output:
pdf="results/{sample}.pdf",
model="results/{sample}.RData",
threads: 1
log:
"logs/{sample}.log",
Expand All @@ -16,3 +17,36 @@ rule test_nonpareil_plot:
skip_model=False,
wrapper:
"master/bio/nonpareil/plot"


use rule test_nonpareil_plot as test_nonpareil_plot_multiple with:
input:
npo=["a.npo", "b.npo"],
output:
pdf="results/samples.pdf",
model="results/samples.RData",
log:
"logs/samples.log",
params:
label="Plot of 2 samples",
labels="Model A,Model B",
col="blue,red",
enforce_consistency=True,
star=95,
correction_factor=True,


use rule test_nonpareil_plot as test_nonpareil_plot_nomodel with:
output:
pdf="results/{sample}.nomodel.pdf",
model="results/{sample}.RData",
log:
"logs/{sample}.nomodel.log",
params:
label="Plot",
col="blue",
enforce_consistency=True,
star=95,
correction_factor=True,
weights_exp="-1.1,-1.2,-0.9,-1.3,-1",
skip_model=True,
70 changes: 59 additions & 11 deletions bio/nonpareil/plot/test/a.npo
@@ -1,16 +1,64 @@
# @impl: Nonpareil
# @ksize: 3
# @ksize: 24
# @version: 3.40
# @L: 7.000
# @AL: 5.000
# @R: 15
# @L: 53.084
# @AL: 30.084
# @R: 384631862
# @overlap: 50.00
# @divide: 0.70
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.117649 0.00000 0.00000 0.00000 0.00000 0.00000
0.168070 0.16699 0.37297 0.00000 0.00000 0.00000
0.240100 0.25879 0.43797 0.00000 0.00000 1.00000
0.343000 0.34961 0.47685 0.00000 0.00000 1.00000
0.490000 0.47852 0.49954 0.00000 0.00000 1.00000
0.700000 0.70898 0.45423 0.00000 1.00000 1.00000
1.000000 1.00000 0.00000 1.00000 1.00000 1.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000001 0.00000 0.00000 0.00000 0.00000 0.00000
0.000001 0.00000 0.00000 0.00000 0.00000 0.00000
0.000001 0.00000 0.00000 0.00000 0.00000 0.00000
0.000002 0.00000 0.00000 0.00000 0.00000 0.00000
0.000003 0.00000 0.00000 0.00000 0.00000 0.00000
0.000004 0.00098 0.03123 0.00000 0.00000 0.00000
0.000005 0.00000 0.00000 0.00000 0.00000 0.00000
0.000008 0.00098 0.03123 0.00000 0.00000 0.00000
0.000011 0.00000 0.00000 0.00000 0.00000 0.00000
0.000016 0.00000 0.00000 0.00000 0.00000 0.00000
0.000023 0.00000 0.00000 0.00000 0.00000 0.00000
0.000032 0.00098 0.03123 0.00000 0.00000 0.00000
0.000046 0.00049 0.01562 0.00000 0.00000 0.00000
0.000066 0.00464 0.06473 0.00000 0.00000 0.00000
0.000094 0.00602 0.07285 0.00000 0.00000 0.00000
0.000134 0.00985 0.08393 0.00000 0.00000 0.00000
0.000192 0.00996 0.08451 0.00000 0.00000 0.00000
0.000274 0.01136 0.07021 0.00000 0.00000 0.00000
0.000391 0.01637 0.07483 0.00000 0.00000 0.00000
0.000559 0.02437 0.07024 0.00000 0.00000 0.00000
0.000798 0.02880 0.06635 0.00000 0.00000 0.00000
0.001140 0.03996 0.06186 0.00000 0.00000 0.07692
0.001628 0.05527 0.05706 0.00000 0.05263 0.08333
0.002326 0.07618 0.05457 0.04000 0.07143 0.11111
0.003323 0.09893 0.05188 0.06250 0.09091 0.13158
0.004748 0.13256 0.05040 0.09375 0.09434 0.16667
0.006782 0.17234 0.04685 0.14085 0.14545 0.20408
0.009689 0.23200 0.04199 0.20430 0.23077 0.25833
0.013841 0.30215 0.03872 0.27586 0.29577 0.32857
0.019773 0.38727 0.03490 0.36493 0.38674 0.41089
0.028248 0.48492 0.02827 0.46691 0.49237 0.50181
0.040354 0.58935 0.02397 0.57320 0.57339 0.60723
0.057648 0.69105 0.01980 0.67780 0.70447 0.70483
0.082354 0.78459 0.01365 0.77549 0.77895 0.79416
0.117649 0.85720 0.00982 0.85060 0.85487 0.86336
0.168070 0.90993 0.00672 0.90558 0.90636 0.91425
0.240100 0.94493 0.00425 0.94198 0.94693 0.94781
0.343000 0.96615 0.00295 0.96412 0.96537 0.96817
0.490000 0.97872 0.00187 0.97754 0.97856 0.98000
0.700000 0.98602 0.00111 0.98530 0.98574 0.98675
1.000000 0.99024 0.00061 0.98985 0.99016 0.99066
42 changes: 42 additions & 0 deletions bio/nonpareil/plot/test/b.npo
@@ -0,0 +1,42 @@
# @impl: Nonpareil
# @ksize: 24
# @version: 3.40
# @L: 57.330
# @AL: 34.330
# @R: 164338
# @overlap: 50.00
# @divide: 0.70
0.000000 0.00000 0.00000 0.00000 0.00000 0.00000
0.000011 0.00000 0.00000 0.00000 0.00000 0.00000
0.000016 0.00000 0.00000 0.00000 0.00000 0.00000
0.000023 0.00000 0.00000 0.00000 0.00000 0.00000
0.000032 0.00000 0.00000 0.00000 0.00000 0.00000
0.000046 0.00000 0.00000 0.00000 0.00000 0.00000
0.000066 0.00000 0.00000 0.00000 0.00000 0.00000
0.000094 0.00000 0.00000 0.00000 0.00000 0.00000
0.000134 0.00000 0.00000 0.00000 0.00000 0.00000
0.000192 0.00000 0.00000 0.00000 0.00000 0.00000
0.000274 0.00000 0.00000 0.00000 0.00000 0.00000
0.000391 0.00049 0.01562 0.00000 0.00000 0.00000
0.000559 0.00022 0.00491 0.00000 0.00000 0.00000
0.000798 0.00038 0.00617 0.00000 0.00000 0.00000
0.001140 0.00005 0.00164 0.00000 0.00000 0.00000
0.001628 0.00067 0.00688 0.00000 0.00000 0.00000
0.002326 0.00060 0.00499 0.00000 0.00000 0.00000
0.003323 0.00106 0.00572 0.00000 0.00000 0.00000
0.004748 0.00115 0.00506 0.00000 0.00000 0.00000
0.006782 0.00170 0.00506 0.00000 0.00000 0.00000
0.009689 0.00214 0.00466 0.00000 0.00000 0.00000
0.013841 0.00290 0.00486 0.00000 0.00000 0.00690
0.019773 0.00341 0.00402 0.00000 0.00000 0.00529
0.028248 0.00449 0.00400 0.00000 0.00326 0.00714
0.040354 0.00530 0.00366 0.00249 0.00493 0.00754
0.057648 0.00654 0.00330 0.00362 0.00372 0.00879
0.082354 0.00783 0.00309 0.00591 0.00597 0.00986
0.117649 0.00938 0.00270 0.00759 0.00880 0.01112
0.168070 0.01108 0.00241 0.00931 0.00958 0.01273
0.240100 0.01309 0.00215 0.01164 0.01259 0.01455
0.343000 0.01552 0.00186 0.01424 0.01647 0.01681
0.490000 0.01852 0.00159 0.01738 0.01796 0.01967
0.700000 0.02159 0.00122 0.02079 0.02140 0.02243
1.000000 0.02479 0.00072 0.02438 0.02458 0.02529
100 changes: 79 additions & 21 deletions bio/nonpareil/plot/wrapper.R
Expand Up @@ -17,35 +17,93 @@ base::library(package = "Nonpareil", character.only = TRUE)
base::message("Libraries loaded")


# Set input and output files
in_files = snakemake@input[["npo"]]
out_pdf = snakemake@output[[1]]
base::message("Input files: ")
base::print(in_files)
base::message("Saving plot to file: ")
base::print(out_pdf)


# Set parameters
params <- list("label" = ifelse("label" %in% base::names(snakemake@params), snakemake@params[["label"]], NA),
"col" = ifelse("col" %in% base::names(snakemake@params), snakemake@params[["col"]], NA),
"enforce_consistency" = ifelse("enforce_consistency" %in% base::names(snakemake@params), as.logical(snakemake@params[["enforce_consistency"]]), FALSE),
"star" = ifelse("star" %in% base::names(snakemake@params), snakemake@params[["star"]], 95),
"correction_factor" = ifelse("correction_factor" %in% base::names(snakemake@params), as.logical(snakemake@params[["correction_factor"]]), FALSE),
"weights_exp" = NA,
"skip_model" = ifelse("skip_model" %in% base::names(snakemake@params), as.logical(snakemake@params[["skip_model"]]), FALSE)
)
"labels" = NA,
"col" = NA,
"enforce_consistency" = ifelse("enforce_consistency" %in% base::names(snakemake@params), as.logical(snakemake@params[["enforce_consistency"]]), FALSE),
"star" = ifelse("star" %in% base::names(snakemake@params), snakemake@params[["star"]], 95),
"correction_factor" = ifelse("correction_factor" %in% base::names(snakemake@params), as.logical(snakemake@params[["correction_factor"]]), FALSE),
"weights_exp" = NA,
"skip_model" = ifelse("skip_model" %in% base::names(snakemake@params), as.logical(snakemake@params[["skip_model"]]), FALSE)
)

# Not sure why, by using "ifelse" only keeps the first element of the vector
if ("labels" %in% base::names(snakemake@params)) {
params[["labels"]] = unlist(strsplit(snakemake@params[["labels"]], ","))
}

if ("col" %in% base::names(snakemake@params)) {
params[["col"]] = unlist(strsplit(snakemake@params[["col"]], ","))
}

if ("weights_exp" %in% base::names(snakemake@params)) {
params[["weights_exp"]] = as.numeric(unlist(strsplit(snakemake@params[["weights_exp"]], ",")))
}


base::message("Options provided:")
base::print(params)


pdf(snakemake@output[[1]])
Nonpareil.curve(file = snakemake@input[[1]],
label = params[["label"]],
col = params[["col"]],
enforce.consistency = params[["enforce_consistency"]],
star = params[["star"]],
correction.factor = params[["correction_factor"]],
weights.exp = params[["weights_exp"]],
skip.model = params[["skip_model"]]
)
utils::str(params)


# Infer model
pdf(out_pdf)
curves <- Nonpareil.curve.batch(in_files,
label = params[["label"]],
labels = params[["labels"]],
col = params[["col"]],
enforce.consistency = params[["enforce_consistency"]],
star = params[["star"]],
correction.factor = params[["correction_factor"]],
weights.exp = params[["weights_exp"]],
skip.model = params[["skip_model"]],
plot = FALSE
)


# Get stats
stats <- summary(curves)
# Fix names
colnames(stats) <- c("Redundancy", "Avg. coverage", "Seq. effort", "Model correlation", "Required seq. effort", "Diversity")


# If model not infered, set its values to NA
stats[,4] <- sapply(stats[,4], function(x){if(length(x) == 0){NA} else {x}})
stats[,5] <- sapply(stats[,5], function(x){if(length(x) == 0){NA} else {x}})


# Convert to Gb
stats[,3] <- stats[,3] / 1e9
stats[,5] <- stats[,5] / 1e9
# Round
stats <- round(stats, digits = 2)
# Print stats to log
base::print(stats)


# Save plot
plot(curves, legend.opts = FALSE)


# Add legend
legend("bottomright", legend = paste0(paste(colnames(stats), t(stats), sep=": "), c("",""," Gb",""," Gb","")), cex = 0.5)
if (length(in_files) > 1) {
Nonpareil.legend(curves, "topleft", cex = 0.5)
}


# Save model
if ("model" %in% base::names(snakemake@output)) {
save(curves, file=snakemake@output[["model"]])
}


base::message("Nonpareil plot saved")
Expand Down
5 changes: 4 additions & 1 deletion test.py
Expand Up @@ -170,7 +170,10 @@ def test_nonpareil_plot():
"--use-conda",
"-F",
"results/a.pdf",
],
"results/samples.pdf",
# Test disabled due to bug in nonpareil (#62)
# "results/a.nomodel.pdf",
]
)


Expand Down

0 comments on commit b32b341

Please sign in to comment.