feat: add ragtag wrapper #1397

Merged
merged 31 commits into from Jun 7, 2023
Changes from 26 commits
Commits
0d671d0
feat: add ragtag wrapper
currocam May 26, 2023
7668479
feat: add threads option
currocam May 26, 2023
77b8354
fixed: remove threads flag from merge
currocam May 27, 2023
4f1da3a
Update bio/ragtag/correction/wrapper.py
currocam May 27, 2023
818e29d
Update bio/ragtag/correction/wrapper.py
currocam May 27, 2023
e6716c1
Update bio/ragtag/correction/test/Snakefile
currocam May 27, 2023
33e1c58
Update bio/ragtag/correction/meta.yaml
currocam May 27, 2023
2d054bb
fixed: add url field to meta yaml
currocam May 27, 2023
16625cb
linter (CI)
currocam May 27, 2023
dc27297
fixed: move files from tmp to output
currocam May 27, 2023
fec442a
fixed: add mandatory tags
currocam May 29, 2023
7d7e774
fixed: incorrect output warning message
currocam May 29, 2023
005f468
feat: improving handling in in/out files for merge
currocam May 29, 2023
b78a852
Update bio/ragtag/patch/meta.yaml
currocam May 31, 2023
fdfe25d
Update bio/ragtag/patch/test/Snakefile
currocam May 31, 2023
51911b3
Update bio/ragtag/patch/wrapper.py
currocam May 31, 2023
28a5a5c
Update bio/ragtag/patch/wrapper.py
currocam May 31, 2023
62ccbc7
Update bio/ragtag/scaffold/wrapper.py
currocam May 31, 2023
1d7ffa8
Update bio/ragtag/scaffold/wrapper.py
currocam May 31, 2023
3be6d6d
Update bio/ragtag/scaffold/test/Snakefile
currocam May 31, 2023
90c66c8
Update bio/ragtag/scaffold/meta.yaml
currocam May 31, 2023
dd9579b
named agps
currocam May 31, 2023
9d35d3b
better syntax with expand for arbitrary agp files
currocam May 31, 2023
d055f32
linter
currocam May 31, 2023
8003c63
Cosmetic fix
fgvieira May 31, 2023
bed4e45
Cosmetic fix
fgvieira May 31, 2023
49fe9e2
Cosmetic fix
fgvieira May 31, 2023
a0e0681
Cosmetic fix
fgvieira May 31, 2023
95c0470
handle extra parameter not found
currocam May 31, 2023
6a6a2e1
fixed moving only assembly alignment files patch
currocam Jun 1, 2023
bcd85bf
handle params.extra not defined
currocam Jun 1, 2023
6 changes: 6 additions & 0 deletions bio/ragtag/correction/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- ragtag =2.1.0
14 changes: 14 additions & 0 deletions bio/ragtag/correction/meta.yaml
@@ -0,0 +1,14 @@
name: ragtag-correction
description: Homology-based misassembly correction.
url: https://github.com/malonge/RagTag/wiki/correct
authors:
- Curro Campuzano Jiménez
input:
- ref: reference fasta file (uncompressed or bgzipped)
- query: query fasta file (uncompressed or bgzipped)
output:
- fasta: The corrected query assembly in FASTA format.
- agp: The AGP file defining the exact coordinates of query sequence breaks.
params:
- extra: additional parameters
notes: Multiple threads can be used during minimap2/unimap alignment.
14 changes: 14 additions & 0 deletions bio/ragtag/correction/test/Snakefile
@@ -0,0 +1,14 @@
rule correction:
    input:
        query="fasta/{query}.fasta",
        ref="fasta/{reference}.fasta",
    output:
        fasta="{query}_corrected_{reference}/ragtag.correct.fasta",
        agp="{query}_corrected_{reference}/ragtag.correct.agp",
    params:
        extra="",
    threads: 16
    log:
        "logs/ragtag/{query}_{reference}.log",
    wrapper:
        "master/bio/ragtag/correction"
4 changes: 4 additions & 0 deletions bio/ragtag/correction/test/fasta/query.fasta
@@ -0,0 +1,4 @@
>sequence A
ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat
>sequence B
ggtaagtgctctagtacaaacacccccaaaaaaaatattgtgatataattaaaattatattcatat
7 changes: 7 additions & 0 deletions bio/ragtag/correction/test/fasta/reference.fasta
@@ -0,0 +1,7 @@
>sequence A
ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat
tctgttgccagaaaaaacacttttaggctatattagagccatcttctttgaagcgttgtc
>sequence B
ggtaagtgctctagtacaaacacccccaatattgtgatataattaaaattatattcatat
tctgttgccagattttacacttttaggctatattagagccatcttctttgaagcgttgtc
tatgcatcgatcgacgactg
39 changes: 39 additions & 0 deletions bio/ragtag/correction/wrapper.py
@@ -0,0 +1,39 @@
"""Snakemake wrapper for ragtag-correction."""

__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import tempfile

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n

assert snakemake.output.keys(), "Output must contain at least one named file."

valid_keys = ["agp", "fasta"]
for key in snakemake.output.keys():
    assert (
        key in valid_keys
    ), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "ragtag.py correct"
        " {snakemake.input.ref}"
        " {snakemake.input.query}"
        " {snakemake.params.extra}"
        " -o {tmpdir} -t {snakemake.threads}"
        " {log}"
    )

    for key in valid_keys:
        outfile = snakemake.output.get(key)
        if outfile:
            shell("mv {tmpdir}/ragtag.correct.{key} {outfile}")
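The validate-then-move step at the end of the wrapper can be tried outside Snakemake with a small stand-alone sketch. The function name `plan_moves` and its parameters are hypothetical and not part of the PR; a plain dict stands in for `snakemake.output`, and `ValueError` replaces the wrapper's `assert` statements purely for illustration.

```python
VALID_KEYS = ["agp", "fasta"]


def plan_moves(output, tmpdir, subcommand="correct", valid_keys=VALID_KEYS):
    """Return (source, destination) pairs for the named outputs that were requested.

    `output` is a plain dict standing in for `snakemake.output` (an assumption
    made so the sketch runs without Snakemake).
    """
    if not output:
        raise ValueError("Output must contain at least one named file.")
    for key in output:
        if key not in valid_keys:
            raise ValueError(
                f"Invalid key in output. Valid keys are: {valid_keys!r}. Given: {key!r}."
            )
    # Only keys that were actually requested produce a move, mirroring the
    # `if outfile:` guard in the wrapper.
    return [
        (f"{tmpdir}/ragtag.{subcommand}.{key}", output[key])
        for key in valid_keys
        if output.get(key)
    ]


moves = plan_moves({"fasta": "results/q.fasta"}, "/tmp/work")
print(moves)
```

Returning the planned moves instead of running `mv` directly keeps the logic testable; the wrapper itself performs the moves inside the `with` block so that the temporary directory still exists.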
6 changes: 6 additions & 0 deletions bio/ragtag/merge/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- ragtag =2.1.0
17 changes: 17 additions & 0 deletions bio/ragtag/merge/meta.yaml
@@ -0,0 +1,17 @@
name: ragtag-merge
description: |
  Scaffold merging.
url: https://github.com/malonge/RagTag/wiki/merge
authors:
- Curro Campuzano Jiménez
input:
- fasta: assembly fasta file (uncompressed or bgzipped).
- agps: scaffolding AGP files.
- bam: Optional. Hi-C alignments in BAM format.
output:
- fasta: The merged scaffolds in FASTA format.
- agp: The merged scaffold results in AGP format.
- links: Optional. If Hi-C alignments in BAM format were given.
params:
- extra: additional parameters. Do not pass '-b' here; add the BAM file to the input instead.
notes: |
15 changes: 15 additions & 0 deletions bio/ragtag/merge/test/Snakefile
@@ -0,0 +1,15 @@
rule merge:
    input:
        fasta="input/{assembly}.fasta",
        agps=expand("input/{scaffold}.agp", scaffold=["scf1", "scf2"]),
        #bam = "input/Hi-C.bam",
    output:
        fasta="{assembly}_merged.fasta",
        agp="{assembly}_merged.agp",
        #links = "{assembly}_merged.links",
    params:
        extra="",
    log:
        "logs/ragtag/{assembly}_merged.log",
    wrapper:
        "master/bio/ragtag/merge"
155 changes: 155 additions & 0 deletions bio/ragtag/merge/test/input/asm.fasta
@@ -0,0 +1,155 @@
>NZ_AAJV02000003.1 Escherichia coli E22 gcontig_1112495653270, whole genome shotgun sequence
TTGGTTGTAACACGGCGTATGGCACATGCGTCGTTAGCGGTCTGGTGACGTTAAAGGGGACAATCCACTC
CTTGCTCGAGCAAACAAACCAGGTAGCCGGAATGTGCAAGTCAATGATGACGCTGATAAGACGCCTAACC
AGCGTGGCGATTCGGTTTGACGCCTGGGAAGAGACCAGGGTGCAACGATGAGGGCATTTATGGAACCGCG
ACAAAGTGTGGTGCCGTAACTGGCTAAGTGCTCTCAGCGTTGTGGTGAATGCGCAGGCTGATGCGCGAAA
GACATTGCAGCTATTGCGGAAAAGAGCTGTTCGGCGGGGCAATTAAATGCCCGTGAGAGTCTGAAATAAC
CGCAAGCCGGAGATCAGCACCGGTCACCACAACAGCCACTGCTTTGGCGGTACCAGTTTGTACACTTGCT
TCCGGCTGGTACCGCTCTTTTTACAAAACAGAGAAGAGCATCACCGGACGACGGGCTCATAACCCAATCC
ATCCGGGCGGCTGTCACCGCAGGTGTTCTTCTCTGTTTTGTGGAGAAACCAACCGACCTTGCAGGGTCGA
TATGATGAGGAGCAGCAAAATGGCTAGCGAACGCAGTACTGATGTGCAGGCATTTATCGGGGAGCTGGAC
GGCGGCGTATTTGAAACCAAAATCGGCGCTGTTCTCAGTGAAGTCGCTTCCGGTGTGATGAACACGAAAA
CCAAAGGTAAGGTCTCGCTCAACCTGGAAATCGAACCGTTTGATGAGAACCGTGTGAAAATCAAACACAA
ACTCTCATATGTTCGCCCGACTAACCGCGGGAAAATTTCCGAAGAAGACACCACCGAAACGCCGATGTAT
GTCAATCGCGGTGGTCGCCTGACTATTCTGCAGGAAGACCAGGGACAATTACTGACTCTTGCCGGTGAAC
CTGACGGAAAACTCCGCGCAGCAGGTCATTAATATCGTTTTTAATTAACTGATTATTTATCTCATTACTG
AATATTTTATATAGTGAGGACTTATTATGTCTCAGAACTTAGACGCAACCGCAATTAATCAAATCCATAC
CCTTATTTCTGCTCAGGGTGTTAATGAAATTATCAGTAAGATTGGTGCCGATGCTGTGGCATTGCCTGAG
AATTTCCGCATTCATGATCTGGAAAATTTAATTTAAATCGTTTCCGTTTCCGTGGTGCGCTTTCCACTGC
CAGCATCGATGACTTTACCCGTTATTCTAAAGATCTTGCAGATGAGCACCGCTGCTTTATCGATGCCGAT
AATATGCGTGCCGTCAGTGTGCTTAACCTAGGTACTATTGATGAACCAGGTCACGCAGATAACACCGCCA
CTCTCAAACTGAAAAAGACAGCACCGTTCTCTGCTCTGTTGTCTGTTAATGGCGAGCGTAACTCCCAGAA
GTCACTGGCAGAATGGATTGAAGACTGGGCCGACTACCTTGTGGGCTTTGATGCTAATGGTGACGCCATT
CAGGCAACAAAAGCAGCTGCGGCAGTCCGTAAAATCACAATTGAAGCAAACCAGACCGCTGATTTTGAAG
ACAATGACTTCAGCGGCAAACGCTCTCTGATGGAGTCTGTCGAAGCGAAGACCAAAGACATTATGCCAGT
GGCATTTGAATTTAAATGCGTTCCGTTTGAAGGTCTGAAAGAACGTCCGTTTAAATTACGCCTCAGCATT
ATCACTGGCGATCGTCCTGTACTGGTTCTGCGCATTATTCAGCTGGAAGCGGTGCAGGAAGAAATGGCTA
ACGAATTTCGTGATCTGCTTGTTGAGAAATTCAAAGACAGCAAAGTAGAAACCTTTATTGGTACTTTCAC
CGCCTGATTTCATTACTGCAAATGCCCCTGCGGGGGCATTTATGGAAACGTAATTAACTCAATAATCACC
GGATGGTGAGGTCTTCCTTTTACCAGAATTCAGCGTGGTGCAGCACATATACGTGGAGAACAAAATGTCA
TTTATTAAAACTTTTTCCGGGAAGCATTTTTATTATGACAGGATAAATAAAGACGACATCGTGATTAGCG
ATATCGCGGTTTCCCTTTCAAATATCTGTCGCTTTGCAGGACATCTTTCACACTTCTACAGCGTCGCCCA
GCATGCGGTGCTTTGCGGCCAGCTGGTGCCGCAGGAATTTGCTTTTGAAGCGTTAATGCATGATGCAACA
GAAGCGTATTGCCAGGACATCCCAGCTCCACTGAAACGCCTTCTTCCTGACTATAAACGGATGGAAGAAA
AAATAGACGCCGTAATCCGTGAGAAATACGGGTTACCTCCTGTTATGAGCACGCCAGTGAAATATGCCGA
TCTCATTATGCTGGCAACCGAACGCCGCGATCTCGGGCTTGATGATGGCTCTTTCTGGCCTGTACTGGAA
GGTATCCCGGCAACAGAGATGTTCAAAGTTATTCCACTGTCGCCAGGCCATGCCTATGGGATGTTTATGG
AACGTTTTAACGAGTTATCGGAGTTACGCAAATGCGCATGAATGTTTTCGAAATGGAAGGGTTTCTTCGC
GGGAAATGTGTACCCCGAGATCTGAAAGTGAATGAAACAAATGCTGAGTACCTGGTACGTAAATTCGATG
CGCTTGAAGCTAAATGTGCGGCACTGGAAAACAAAATAATACCAGTGTCAGCTGAACTGCCGCCAGCAAA
TGAAAGTGTTCTGTTATTTGATGCTAACGGAGAAGGCTGGCTAATTGGCTGGCGTTCTCTCTGGTACACC
TGGGGACAAAAAGAAACCGGAGAATGGCAGTGGACATTTCAGGTCGGGGACCTTGAAAACGTCAATATCA
CTCACTGGGCAGTAATGCCGAAAGCACCGAAGAATAAAAAATGAGCGTGATAAAAACTCATACAGGAATT
GTTATCACCCGAGACGGTCCGCAGGTAAAAAAACTGCACCAGACAAAGCGGATGTGGGTCGTCGGAAAAA
ACGAGTTTTACCACAAAGAAACCGGACGCCGCCACTTTGCAGAAAATACTCGCCGCCGACTGCTGATCGA
TACCATCAAGCCTATCGAGGTGAAGCATGTTTAAACAGAACGAAAAATCTATCGCTCAAATTGCTGAGTA
TATCCCGCGTGCGTGCCGGGGTATGCAGTTGCAGGAAGCCAAAGCACGCCTGGAGAAAAAAATTGCGCTC
TATATCGATGACGGCTGTGATGCCGCCGTTCTTAACGCGGCGTTCGCGCCAGCTCTTAACAGTCATACGC
GAAAGTCTTTTTTTTCGTGCATCGCAGCGCAGATCCGTAAAGGAGGCAACCAGTGAGCAACATTAACTAT
CAGGTACTGCGTGAAAAGGCAGAGAAAGCAACTAAAGGAAGCTACATCGTAGGGCATACATCTGTTAACC
AACACGGCAATTTAACAGGAGTTTTTGTTTGTCAAAAATGGAAAGGAGAACCCGGTGGCGTGATTGCGGA
ATGTCATGTTAACTGCCTGATTGAATCAGATGCTCAGGCTTATGCAAACGCTGAATTCATAGCAGAGGCT
AACCCGGCTACCGTGCTGGCACTGCTGGATGAACAGGAAAGAAACCAGCAATACATCAAACGCCGCGACC
AGGAGAACGAGGATATTGCGCTAACGGTAGGGAAGCTGCGCGTTGAGCTTGAGGAGACAAAATCAAAACT
CAACGAGCAGCGCGAGTATTACGAGGGAGTTATCTCTGATGGGTGCAAGCGTATTGCTGAACTGGAAGCG
CGGGAAGTTCAATTACCGACTCGCTACGACCTTCGATATGGACACCCGATAAATGCAGATGAGCGACAAG
TCATGATACCTAAAGAAAATGGCAGTTGGCTTTACCTGATTGACCTAGAACACGCATTACGCGTCGCTGA
CATTCGCATCAAAGGAGAGTGATATGGCGTTAACACACCACGAACTCTGTCAGATTGCGTACAAGTTCCT
TAAGCGCAACGGGTTCAAGGTTTGCTTTCATGACCGCTTTGTTGCTGTAACCAGTACCGGAGAACAGCCA
GATGCTATGGGATTCAGAAATTCAGCATCATGCCTGATAGAGGCGAAGTGTTCTCGTGCTGACTTGTTGG
CAGATAGAAAAAAGCGTTTCCGTAAAAATCCCTCACTTGGCATGGGCGACTGGCGATTCTTTATTAGTGA
GCCGGGAATTATTTCAATTGAGGATTTACCACCTGGCTGGGGATTACTTCACGTTGTTAACGGAAGAGTA
CGGAAAGTACATGGGTGGCCCAGGGGTAATTGCTGTTGGGGTAATCCTGACGATAAGCCATTTACTGGAA
ATAAGCAGGTTGAATGCGATTACATGTTATCTGCATTAAGGCGCATGGAGTTGAGAGGGCACCTTAATGA
AATATATGACGGTGTAATTGTTAATAAGAAAGAAGGAAACGCGGCATGATCACTATTACCAAAGAGCGAC
TGCTGACAATCAAGCAGTGGCGCGAAACATACGGACCTGATAGCAACGTTGTACTGCCAGCAGAAGAAGC
GGAAGAACTGGCGCGAATTGCACTGGTATCGCTGGAAGCAGAGCTGGTGGCAAAGATTATAGCTCATTAC
CCATTAGGAGTTGACGTAGGCAAACAAAAGTTCGTACAGGCCATTGGAGAGCTTCCTGACTTTGGCGGAT
ATCTATTTGCCGCCCCTCCAGCGCCGATAGTGCCGGAAGAAATGTATTGGCAGGATGCGCCAGTTGAAGG
CAGCAGCAAAGCGGCTGCATACGCTACAGGCTGGAACGATTGCCGCGAAGCCATGCTTCAGTCCGGAAAC
TTTCGGGAAAATAAAGATTCGTCAACCAATAATTTTCGGAAAATCCCGGAAGCGTCAACCAGCTCTCCGG
TAACTCCGGCTCTTCTGCCTGGTGGTTTCACCATTGAGGATGCGAAGGAATTACATGAAGACCTGGCACG
CAGCCACATAAGCAAGGCCTTAAGTGGCGAAAAGATGAAAAAGAAAGATCGCGATGCTGATTTGCGCTGG
ATTCATGGCGTTATAGTTCAGGCAGCGTGGTTTGTAAAAGCATCACTGGAGCAGAATGCACTATCGGGCA
ACTATCCGGTAACTCCGGATAGTTGGATAAGCTGTAGTGAGCGAATGCCGGATAAGTTAATTCCGGTAAT
GGTCATGTATGAAGACGGTGAGATGTGGTCTGCAATGTGGAATGGCAATCGCTGGGATGATGGCACCGAA
TATCCGGATCCGCACTCAGTTACGCACTGGCGTGAAATGCCAGCAGCACCGCAGCAGGAGGTGAATCAAT
GAGCTGGCCTGATGCAATCGTAACTCTGGGGGTGGTATTCGCAGCAGCGTTTGTTGTGTTCTCGATTTGT
CGATGGGGATAACCACATGTTCGCTTTGATTCAACGCGGGCAGATATACACCGATAGTGCTGGCTACCCG
ATAAAAATTGTTCGCTGCATAAACAACACAGTGTTGTACAGAAGAATGGATGGGCGAACACAGTCAGTAA
AAATAAACGATTTCAATGAACGTTTGAACGGATTGATCACCAGGAATACCGACAAATTCTGGCAGAAACA
GAGCAGGAAGCTCATCTGAAAAAATTACGGGCCATGAAAAGGAAGTAAACAATGAATAAAGCATTTGAAC
GATGGATACGCCAGCGTTACGGCAATCGCTATGATCTGACGCGAGATGTTGACGGTTTCTACTGTCGTGA
AGTTGTGAAGCGAATGTTTGAAATGTGGTGCCACTGCCGTGGATGAAAGTTTTATGAGGTTGGCATGCAG
ACAATCATCTATCAGATAACCCCCAGCAAATGGTGTACGGAGAGAGTCCTTATTGCATCAACAGGGCTAA
AGCCCGGCACCATCGAGCGGGCCAGAAGAAAGTCATGGATACAGGGAAAAGAATACCGCCATTACTCTGT
AGAAGGTGATCCGGGGCACTACAGTGAATGCCTGTACAACATCGAAGAGATTATGCGATGGATCGAAAAC
CAGAAACAACCAGGTGCCAAAAATGCAAGTTCCGGTTAACCTGTTAATGCTCCTGGACGTCTGGGAGGTT
TAATGAGTAACGCATCATACCCGACAGGCGTTGAAAACCATGGCGGATCACTCCGTATATGGTTTCACTA
TAATGGCAAACGTGTCAGAGAAAACCTCGGTGTTCCTGACACCGCCAAAAACCGGAAGATCGCAGGTGAA
CTTCGCACTTCCGTTTGTTTTGCAATCAGAATGGGGAGTTTCGACTACGCCGCGCAGTTCCCTAATTCCC
CTAACCTGAAACACTTTGGTCTGGGAAAAAGAGAGATAACCGTTAAGGCACTTTCGGAAAAATGGCTTGA
ACTGAAGAAAATAGAGATTTGTGCGAATGCACTTAATCGTTACCAGTCAGTAATTAAAAACATGTTGCCT
ATGTTGGGAGAAAAAAAACTAGTTTCATCCATAACAAAAGAGGATTTACTTTTCGCAAGGAGAGATTTGT
TGACCGGTTACCAAAAGCTTTCTAATGGAAAGACTTCTTCCATAAAAGGGCGCTCAGTGGTCACGGTAAA
CTACTATATGACAACCATAGCTGGAATGTTTCAATTTGCAACAGATAATGGTTATACCTCAGGAAACCCA
TTTAACGGTCTGGCACCCTTAAAAAAGTCCAAGGTAAAACCAGATCCTCTCACCCGTGACGAATTTATTC
GTTTTATTGAGGCTTGCCGTCATCAACAAACAAAAAACCTGTGGATTCTCGCTGTATACACGGGTATTCG
TCACGGGGAGCTGGTATCGCTGGCATGGGAAGATATAGACCTTAAAGCAAGGACTATAACCATCCGTAGG
AATTATACAAAACTTGGCGAATTCACTCCACCAAAAACCGATGCAGGCACCGGAAGGACAATTCATCTGG
TTCAACCAGCTATTGATGCTCTTAAAAGCCAGGCGGAAATGACCATGCTTGGAAAGCAACATTCTGTAGA
GGTGAAGCAGAGGGAATATGGGAGAACTGCTGTGCATAAATGCACTTTTGTTTTTAGTCCTCAGGTAACA
AAACAGCAGCAGTTGTCCGGACCTCACTACAAGGTTGACTCCATCAGGGAGTCATGGACAAGTATCTTAA
AACGCGCAGGTCTGAGACACAGAAAATCGTACCAATCCAGGCATACTTATGCATGCTGGTCACTTGCCGC
AGGAGCTAATCCTAGTTTTATCGCAAGCCAGATGGGCCACACAAACGCACAAATGGTATTCAATGTTTAC
GGAGCATGGATGAAAGACAACAATCACGAACAGATAGAACTCCTTAACAAAAGACTATCTGAAAGTGTCC
CATGTATGCCCCATAAGAAAGCTGGGTAAAATAAAAACTTGCAAAATCAATTAGTTTACCCTTAATCCCT
GTCACGTTACGCGCGTGGCAGAGGCGTTACGGGTTGCTGAAACCGCAACGGACAGACGGCGGTCATCGAC
TGTTCAACGATGCCGATATTGACCGTATCCGCGAGATCAAACGCTGGATCGACAACGGCGTGCAGGTCAG
CAAAGTTAAAATGCTGCTCAGTAATGAAAATGTTGATGTGCAGAACGGCTGGCGCGATCAGCAAGAAACA
TTACTGACCTACCTGCAAAGCGGCAATCTGCATAGCCTGCGAACGTGGATCAAAGAGCGCGGTCAGGATT
ACCCCGCCCAGACACTCACCACACATCTGTTTATTCCTCTGCGCCGACGACTTCAATGCCAACAACCGAC
TCTCCAGGCGCTGCTGGCGATCCTCGACGGCGTACTGATCAACTACATCGCCATTTGTCTGGCTTCGGCA
CGTAAAAAACAGGGTAAAGATGCGCTGGTGGTTGGCTGGAATATTCAGGATACCACCCGTCTGTGGCTGG
AGGGCTGGATTGCCAGTCAACAAGGATGGCGCATTGATGTCCTCGCCCACTCGCTCAATCAACTACGCCC
TGAACTATTCGAAGGCCGTACATTGCTGGTGTGGTGCGGTGAAAATCGAACCTCCGCCCAACAGCAGCAA
CTCACCAGTTGGCAAGAACAAGGCCATGATATTTTACCACTCGGCATTTAATGATTCGTTAACAAATGCG
CTTTACTGTACAATCCTTTCGTTAACATAAGGAGTGCATTATGCGCATAGCTAAAATTGGGGTCATCGCC
CTGTTCCTGTTTATGGCGTTCGGCGGAATTGGTGGCGTCATGCTCGCAGGTTATACCTTTATTTTGCGTG
CTGGCTAAGCGCCTGCACCAGCCTTTCAAACAGGCGGTCTGCGATGATCGCCGCCAGTGCCACCAGTAAC
GCCCCCTGGATCACATACGCGGTATTAAATCCGCTAAGCCCGATGATGATGGGCGTACCCAGCGTGCTGG
CCCCTACCGTTGAGGCGATCGTCGCCGTACCAATGTTGATAATCACCGAAGTTCGCACGCCCGCCAGAAT
CACCGGAGCCGCCAGCGGTAGCTCGACCTTACGCAGTCGCTGACCACGACTCATTCCCATACCTTTCGCA
ACTTCTGTCACGCTGGCATCGATCGCTCCCAGCCCGGCAAGTGTCGCCTGCAGGACGGGCAGCACACCGT
AAAGGATCAAGGCGATAATCGCTGGTTGCAGACCAAAGCCGATCACCGGAACGGCAATCGCCAGCACTGC
GACAGGCGGAAAAGTCTGTCCAACGGCGGCAATAGTTTCCACCAGTGGGCGAAATTCCGCGCCCCACGGG
CGAGTGACAGCAATTCCGGCACCAGTGCCAATGATCACCGCAAACAAACTCGAAATTCCCACCAGCCAGA
AATGAGCCAGTGCCAGAGCTGCAAAACTTTCTTGCTGATAAACGGGTCGTGGCAGTTGTGGGAACAAGGC
AGCAAACAGCGGCTGGCTGTAAGGCAGCCAGAAAATCAGTGCCACAAACAAAGCAATGAGCCAGAACAGC
GGATCGCGCAACATCTTCATACGCTTACGCCTCCACCAGCAGATCCTGAAAATGCAGCGTACCGCAAGGC
TGGCCCTGCGTGTTCACCACCGGCAGCACCTCGCAACCCCGCGCGACAAACAGGGAGAGCGCATCGCGTA
GCGTCATCTCTTCTACCAGTGCCTCACCTTCTGCCCGTTCTTCGCGACGCACGTAATCCGCCACACTACG
TAACGAAAGCAGACGCACACCCAGTTCGCTACGTCCAAAAAACTGGCGGACAAAATCATTCGCCGGACGA
GTCAGCATCGTCAGCGGATTGCCCTGCTGCACCACTTCACCGTGATCCATCAATACCAGATGTTCAGCCA
GCCGTAGCGCCTCATCAATATCATGAGTGACCAGCACAATAGTGCGCCCCAGCAAACGGTGAATGCGCGT
CATCTCTTGTTGCAACGCGCCGCGCGTTACCGGGTCCAGTGCGCCAAAAGGTTCATCCATCAGTAAGACT
TGCGGATCGGCAGCCAGTGCGCGCGCCACTCCCACACGTTGCTGCTGACCACCGGAAAGCTGATGCGGAT
AACGCTCACGCAAATTTGACTCCAGCCCCAGTAGCGTCATTAATTCGTCGATACGATCGTCAATCCGTGC
CCGCGACCATTTTTGTAATTGCGGCACGGTAGCGATGTTTTGCGCCACGCTCCAGTGGGGGAACAGGCCA
ATAGATTGAATGGCATAGCCCATCCGGCGGCGCAACTCCAGTACTGGCAGCGAGCGAATTTCTTCTCCGG
CAAAGCGGATCTCTCCGCTGTCATGTTCCACCAGGCGGTTAATCATTTTCAAGGTGGTGGATTTGCCGGA
GCCAGATGTGCCAATCAGCACCGAAAAACTCCCTTCCTGAAAATTGAGATTAAGATCGTTAACGGCTTTT
TGTGCGCCAAACAGTTTGCTGACATGGCTAAATTCAATCATTACGTTTCACCTTCAGCAGTGCGATAAGC
AAATCGAACAGGGCATCGATCAGCACCGCCAGAACAATTACCGGGATCACCCCCAGCAACACTAAATCAA
TGGCGCTGCTTAGCAGCCCCTGGAAAACCAGCGCACCAAAACCGCCTGCGCCGATTAACGCCGCAATCAC
CGCCATACCTACAGTTTGCACCATCACCACCCGCAGGCTGCGCAGAAATACCGGTAACGCCAGTGGTAAC
TGAACATGCAGGAATCGCTGCGCCCCGCTCATGCCCATCGCTCTGGCGCTCTCCAGCACATCGCGCGGGA
TCTGGTTCAAACCGACTACCACGCCGCGCACCAGCGGCAGCAAGGCATAGAGCACCAGCGCAATCAGTGC
GGGTGTCATTCCGGTTCCTGCTATGCCGAGCTTCCCCAGCCACGGAAAGGCCGTCACCAGCGCGGCAAGC
GGCGCAATCAACAGGCCAAAGAGCGCCACCGAAGGCACGGTCTGAATGACATTGAGCAGAGAAAAAATTG
CCCCCTGCCGCGCAGTGGAAAAGTAGCACCAGATGCCCAACGGCACACCAATCACTAACGCAGGCAGCAC
CGCACCAAACAGCAACGTCAGATGTTGTGCCAGCGCGTCGTCAAACACATCCTGACGGTTGGCGTATTCT
TTCATTAGTGAGAGATCGTTAAGCGTGCCGGAGTACAGCAACCACAGCGGAATAATGGCAATCTGCATAT
GCAACAACCAGCGCCACAGCGGATGCGTGGAGATTCGGCGGATGGCATCGCTACAGGCCAGCAATGCCAG
CGCCGCAGCCAGCCAGAAACCACTGCCGAGGCTGGTACGCGCCAGCGCACTGCCATTTTGCGCCAGTTGG
GTCGCCGCCTTTCCAGCTCCCCACACCAACAATACGAAGACGAATTGCGCCAGAATGAGTGCACAAATGC
1 change: 1 addition & 0 deletions bio/ragtag/merge/test/input/asm.fasta.fai
@@ -0,0 +1 @@
NZ_AAJV02000003.1 10780 93 70 71
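For readers unfamiliar with the index file above, here is a minimal sketch of how its single record can be read. It assumes the standard samtools `faidx` column order (name, length, offset, bases per line, bytes per line); the names `FaiRecord`, `parse_fai_line` and `base_offset` are hypothetical helpers, not part of the PR.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str       # sequence name (first word of the FASTA header)
    length: int     # total number of bases
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, including the newline


def parse_fai_line(line):
    """Parse one whitespace-separated .fai record."""
    name, length, offset, linebases, linewidth = line.split()[:5]
    return FaiRecord(name, int(length), int(offset), int(linebases), int(linewidth))


def base_offset(rec, pos):
    """Byte offset in the FASTA file of the 0-based base `pos`."""
    if not 0 <= pos < rec.length:
        raise IndexError(f"position {pos} outside sequence of length {rec.length}")
    return rec.offset + (pos // rec.linebases) * rec.linewidth + pos % rec.linebases


rec = parse_fai_line("NZ_AAJV02000003.1 10780 93 70 71")
print(rec, base_offset(rec, 70))
```

With 70 bases per line, the 10780 bases of the record correspond to the 154 sequence lines of `asm.fasta`, which together with the header line gives the 155 added lines reported for that file.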
3 changes: 3 additions & 0 deletions bio/ragtag/merge/test/input/scf1.agp
@@ -0,0 +1,3 @@
## agp-version 2.1
# AGP created by RagTag v2.1.0
seq1_RagTag 1 10780 1 W NZ_AAJV02000003.1 1 10780 +
3 changes: 3 additions & 0 deletions bio/ragtag/merge/test/input/scf2.agp
@@ -0,0 +1,3 @@
## agp-version 2.1
# AGP created by RagTag v2.1.0
seq1_RagTag 1 10780 1 W NZ_AAJV02000003.1 1 10780 +
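The two AGP test inputs each contain a single component line. A minimal sketch of reading such lines is given below, assuming the nine-column component layout of the AGP 2.1 specification; `AgpComponent` and `parse_agp` are hypothetical names, and gap lines (component type `N` or `U`) are skipped rather than parsed.

```python
from typing import NamedTuple


class AgpComponent(NamedTuple):
    object: str
    object_beg: int
    object_end: int
    part_number: int
    component_type: str
    component_id: str
    component_beg: int
    component_end: int
    orientation: str


def parse_agp(text):
    """Return the component (non-gap) records of an AGP document."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment/header line
        f = line.split()
        if f[4] in ("N", "U"):
            continue  # gap line: columns 6-9 have a different meaning
        records.append(
            AgpComponent(
                f[0], int(f[1]), int(f[2]), int(f[3]),
                f[4], f[5], int(f[6]), int(f[7]), f[8],
            )
        )
    return records


example = """## agp-version 2.1
# AGP created by RagTag v2.1.0
seq1_RagTag 1 10780 1 W NZ_AAJV02000003.1 1 10780 +
"""
print(parse_agp(example))
```

Real AGP files are tab-separated; splitting on any whitespace is used here only because the listing above shows spaces.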
55 changes: 55 additions & 0 deletions bio/ragtag/merge/wrapper.py
@@ -0,0 +1,55 @@
"""Snakemake wrapper for ragtag-merge."""

__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import tempfile


log = snakemake.log_fmt_shell(stdout=True, stderr=True)

fasta_file = snakemake.input.get("fasta")
# Check that a fasta file was supplied
assert fasta_file, "Input must contain only one fasta file."

agp_files = snakemake.input.get("agps")

assert len(agp_files) >= 2, "Input must contain at least 2 agp files. Given: %r." % len(
    agp_files
)

bam_file = snakemake.input.get("bam")

# Add Hi-C BAM file to params if present
if bam_file:
    snakemake.params.extra += f" -b {bam_file}"

# Raise an error if a links file is expected but no Hi-C BAM file is given
if snakemake.output.get("links") and not bam_file:
    raise ValueError("Links file is expected but no Hi-C BAM file is given.")

# Check that all output keys are valid: agp, fasta or links
assert snakemake.output.keys(), "Output must contain at least one named file."
valid_keys = ["agp", "fasta", "links"]
for key in snakemake.output.keys():
    assert (
        key in valid_keys
    ), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "ragtag.py merge"
        " {fasta_file}"
        " {agp_files}"
        " {snakemake.params.extra}"
        " -o {tmpdir}"
        " {log}"
    )
    for key in valid_keys:
        outfile = snakemake.output.get(key)
        if outfile:
            shell("mv {tmpdir}/ragtag.merge.{key} {outfile}")
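The way the merge wrapper assembles its command line, including the optional Hi-C BAM, can be sketched as a pure function. `build_merge_command` and its parameters are hypothetical and only illustrate the logic; the sketch builds the command as a list and appends `-b` there instead of mutating `params.extra`, which is a design choice of this example, not what the PR does.

```python
import shlex


def build_merge_command(fasta, agps, outdir, bam=None, links=None, extra=""):
    """Return the `ragtag.py merge` command line as a single shell-safe string."""
    if not fasta:
        raise ValueError("Input must contain one fasta file.")
    if len(agps) < 2:
        raise ValueError(
            f"Input must contain at least 2 agp files. Given: {len(agps)}."
        )
    if links and not bam:
        raise ValueError("Links file is expected but no Hi-C BAM file is given.")

    cmd = ["ragtag.py", "merge", fasta, *agps]
    cmd += shlex.split(extra)  # user-supplied extra parameters, if any
    if bam:
        cmd += ["-b", bam]     # Hi-C alignments are passed with -b
    cmd += ["-o", outdir]
    return shlex.join(cmd)


print(build_merge_command("asm.fasta", ["scf1.agp", "scf2.agp"], "out"))
```

Keeping the BAM handling out of `extra` avoids relying on `snakemake.params.extra` being defined and mutable, which later commits in this PR ("handle params.extra not defined") address in the wrapper itself.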
6 changes: 6 additions & 0 deletions bio/ragtag/patch/environment.yaml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- ragtag =2.1.0
20 changes: 20 additions & 0 deletions bio/ragtag/patch/meta.yaml
@@ -0,0 +1,20 @@
name: ragtag-patch
description: Homology-based assembly patching.
url: https://github.com/malonge/RagTag/wiki/patch
authors:
- Curro Campuzano Jiménez
input:
- ref: reference fasta file (uncompressed or bgzipped)
- query: query fasta file (uncompressed or bgzipped)
output:
- fasta: The final FASTA file containing the patched assembly
- agp: The final AGP file defining how ragtag.patch.fasta is built.
- rename_agp: Optional. An AGP file defining the new names for query sequences
- rename_fasta: Optional. A FASTA file with the original query sequence, but with new names.
- comps_fasta: Optional. The split target assembly and the renamed query assembly combined into one FASTA file.
- ctg_agp: Optional. An AGP file defining how the target assembly was split at gaps
- ctg_fasta: Optional. The target assembly split at gaps
- asm_dir: Optional. A directory containing Assembly alignment files.
params:
- extra: additional parameters
notes: Multiple threads can be used during minimap2/unimap alignment.