Skip to content

Commit

Permalink
feat: Merqury (#540)
Browse files Browse the repository at this point in the history
<!-- Ensure that the PR title follows conventional commit style (<type>:
<description>)-->
<!-- Possible types are here:
https://github.com/commitizen/conventional-commit-types/blob/master/index.json
-->

### Description

<!-- Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).
  • Loading branch information
fgvieira committed Mar 22, 2023
1 parent 9c4e667 commit f7914c4
Show file tree
Hide file tree
Showing 136 changed files with 238 additions and 0 deletions.
6 changes: 6 additions & 0 deletions bio/merqury/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- merqury =1.3
14 changes: 14 additions & 0 deletions bio/merqury/meta.yaml
@@ -0,0 +1,14 @@
name: Merqury
description: |
Evaluate genome assemblies with k-mers and more.
url: https://github.com/marbl/merqury
authors:
- Filipe G. Vieira
input:
- one on two assembly fasta file(s)
- meryl database
output:
- annotation quality files
notes: |
* `Merqury` does not return non-zero exit code when it fails, so always include (at least) one PNG file in your rule's output.
* `Merqury` does not allow for extra params.
67 changes: 67 additions & 0 deletions bio/merqury/test/Snakefile
@@ -0,0 +1,67 @@
rule run_merqury_haploid:
input:
fasta="hap1.fasta",
meryldb="meryldb",
output:
# meryldb output
filt="results/haploid/meryldb.filt",
hist="results/haploid/meryldb.hist",
hist_ploidy="results/haploid/meryldb.hist.ploidy",
# general output
completeness_stats="results/haploid/out.completeness.stats",
dist_only_hist="results/haploid/out.dist.only.hist",
qv="results/haploid/out.qv",
spectra_asm_hist="results/haploid/out.spectra_asm.hist",
spectra_asm_ln_png="results/haploid/out.spectra_asm.png",
# haplotype-specific output
fas1_only_bed="results/haploid/hap1.bed",
fas1_only_wig="results/haploid/hap1.wig",
fas1_only_hist="results/haploid/hap1.hist",
fas1_qv="results/haploid/hap1.qv",
fas1_spectra_cn_hist="results/haploid/hap1.spectra.hist",
fas1_spectra_cn_ln_png="results/haploid/hap1.spectra.png",
log:
std="logs/haploid.log",
spectra_cn="logs/haploid.spectra-cn.log",
threads: 1
wrapper:
"master/bio/merqury"


rule run_merqury_diploid:
input:
fasta=["hap1.fasta", "hap2.fasta"],
meryldb="meryldb",
output:
# meryldb output
filt="results/diploid/meryldb.filt",
hist="results/diploid/meryldb.hist",
hist_ploidy="results/diploid/meryldb.hist.ploidy",
# general output
completeness_stats="results/diploid/out.completeness.stats",
dist_only_hist="results/diploid/out.dist.only.hist",
only_hist="results/diploid/out.only.hist",
qv="results/diploid/out.qv",
spectra_asm_hist="results/diploid/out.spectra_asm.hist",
spectra_asm_ln_png="results/diploid/out.spectra_asm.png",
spectra_cn_hist="results/diploid/out.spectra_cn.hist",
spectra_cn_ln_png="results/diploid/out.spectra_cn.png",
# haplotype-specific output
fas1_only_bed="results/diploid/hap1.bed",
fas1_only_wig="results/diploid/hap1.wig",
fas1_only_hist="results/diploid/hap1.hist",
fas1_qv="results/diploid/hap1.qv",
fas1_spectra_cn_hist="results/diploid/hap1.spectra.hist",
fas1_spectra_cn_ln_png="results/diploid/hap1.spectra.png",
fas2_only_bed="results/diploid/hap2.bed",
fas2_only_wig="results/diploid/hap2.wig",
fas2_only_hist="results/diploid/hap2.hist",
fas2_qv="results/diploid/hap2.qv",
fas2_spectra_cn_hist="results/diploid/hap2.spectra.hist",
fas2_spectra_cn_ln_png="results/diploid/hap2.spectra.png",
log:
std="logs/diploid.log",
spectra_cn="logs/diploid.spectra-cn.log",
threads: 1
wrapper:
"master/bio/merqury"
2 changes: 2 additions & 0 deletions bio/merqury/test/hap1.fasta
@@ -0,0 +1,2 @@
>Sheila
GCTAGCTCAGAAAAAAAAAAGATGCGAGGCGTAGGCGATGCGATCGATCGATCTATAGGCTCGAGGCTAGGGCTAGCTGA
2 changes: 2 additions & 0 deletions bio/merqury/test/hap2.fasta
@@ -0,0 +1,2 @@
>Sheila
GCTAGCTCAGAAAAAAAAAAGATGCGAGGCGTAGGCGATGCGATCGATCGATCTATAGGCTCGAGGCTAGGGCTAGCTGA
Binary file added bio/merqury/test/meryldb/0x000000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x000111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x001111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x010111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x011111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x100111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x101111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x110111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111000.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111000.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111001.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111001.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111010.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111010.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111011.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111011.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111100.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111100.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111101.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111101.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111110.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111110.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111111.merylData
Binary file not shown.
Binary file added bio/merqury/test/meryldb/0x111111.merylIndex
Binary file not shown.
Binary file added bio/merqury/test/meryldb/merylIndex
Binary file not shown.
132 changes: 132 additions & 0 deletions bio/merqury/wrapper.py
@@ -0,0 +1,132 @@
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"


import os
import tempfile
from pathlib import Path
from snakemake.shell import shell

meryldb_parents = snakemake.input.get("meryldb_parents", "")
out_prefix = "out"
log_tmp = "__LOG__.tmp"


def save_output(out_prefix, ext, cwd, file):
if file is None:
return 0
src = f"{out_prefix}{ext}"
dest = cwd / file
shell("cat {src} > {dest}")


with tempfile.TemporaryDirectory() as tmpdir:
cwd = Path.cwd()
# Create symlinks for input files
for input in snakemake.input:
src = Path(input)
dst = Path(tmpdir) / input
src = Path(os.path.relpath(src.resolve(), dst.resolve().parent))
dst.symlink_to(src)
os.chdir(tmpdir)

shell(
"merqury.sh"
" {snakemake.input.meryldb}"
" {meryldb_parents}"
" {snakemake.input.fasta}"
" {out_prefix}"
" > {log_tmp} 2>&1"
)

### Saving LOG files
save_output(log_tmp, "", cwd, snakemake.log.get("std"))
for type in ["spectra_cn"]:
save_output(
f"logs/{out_prefix}",
"." + type.replace("_", "-") + ".log",
cwd,
snakemake.log.get(type),
)

### Saving OUTPUT files
# EXT: replace all "_" with "."
meryldb = Path(snakemake.input.meryldb.rstrip("/")).stem
for type in ["filt", "hist", "hist_ploidy"]:
save_output(
meryldb, "." + type.replace("_", "."), cwd, snakemake.output.get(type)
)

# EXT: replace last "_" with "."
for type in [
"completeness_stats",
"dist_only_hist",
"only_hist",
"qv",
"hapmers_count",
"hapmers_blob_png",
]:
save_output(
out_prefix,
"." + type[::-1].replace("_", ".", 1)[::-1],
cwd,
snakemake.output.get(type),
)

# EXT: replace first "_" with "-", and remaining with "."
for type in [
"spectra_asm_hist",
"spectra_asm_ln_png",
"spectra_asm_fl_png",
"spectra_asm_st_png",
"spectra_cn_hist",
"spectra_cn_ln_png",
"spectra_cn_fl_png",
"spectra_cn_st_png",
]:
save_output(
out_prefix,
"." + type.replace("_", ".").replace(".", "-", 1),
cwd,
snakemake.output.get(type),
)

input_fas = snakemake.input.fasta
if isinstance(input_fas, str):
input_fas = [input_fas]

for fas in range(1, len(input_fas) + 1):
prefix = Path(input_fas[fas - 1]).name.removesuffix(".fasta")

# EXT: remove everything until first "_" and replace last "_" with "."
for type in [f"fas{fas}_only_bed", f"fas{fas}_only_wig"]:
save_output(
prefix,
type[type.find("_") :][::-1].replace("_", ".", 1)[::-1],
cwd,
snakemake.output.get(type),
)

# EXT: remove everything until first "_" and replace all "_" with "."
for type in [f"fas{fas}_only_hist", f"fas{fas}_qv"]:
save_output(
f"{out_prefix}.{prefix}",
type[type.find("_") :].replace("_", "."),
cwd,
snakemake.output.get(type),
)

# EXT: remove everything until first "_", replace first "_" with "-", and remaining with "."
for type in [
f"fas{fas}_spectra_cn_hist",
f"fas{fas}_spectra_cn_ln_png",
f"fas{fas}_spectra_cn_fl_png",
f"fas{fas}_spectra_cn_st_png",
]:
save_output(
f"{out_prefix}.{prefix}",
"." + type[type.find("_") + 1 :].replace("_", ".").replace(".", "-", 1),
cwd,
snakemake.output.get(type),
)
15 changes: 15 additions & 0 deletions test.py
Expand Up @@ -495,6 +495,21 @@ def test_salsa2():
)


@skip_if_not_modified
def test_merqury_haploid():
run(
"bio/merqury",
["snakemake", "--cores", "1", "results/haploid/out.qv", "--use-conda", "-F"],
)

@skip_if_not_modified
def test_merqury_diploid():
run(
"bio/merqury",
["snakemake", "--cores", "1", "results/diploid/out.qv", "--use-conda", "-F"],
)


@skip_if_not_modified
def test_mashmap():
run(
Expand Down

0 comments on commit f7914c4

Please sign in to comment.