telomeres getting lost #182

Open
JWDebler opened this issue Jun 9, 2023 · 6 comments

JWDebler commented Jun 9, 2023

Describe the bug
I was running a comparison of current long-read assemblers and realised that NextDenovo produces some of the best contiguity. However, it seems to trim a lot of the telomeres.

Genome characteristics
40 Mb fungal genome, haploid, about 20% repeats

Input data
Nanopore Q20+, 90x raw read coverage, read N50 ~15 kb

Config file

job_type = local
job_prefix = 15CUR005.nextdenovo
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 4
input_type = raw
read_type = ont # clr, ont, hifi
input_fofn = 15CUR005.fofn
workdir = 15CUR005.nextdenovo

[correct_option]
read_cutoff = 1k
genome_size = 42m # estimated genome size
sort_options = -m 50g -t 30
minimap2_options_raw = -t 4
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1

NextDenovo
2.5.2

Flye vs NextDenovo
This is what the 5' ends of the contigs look like when assembled by Flye 2.9.2 vs NextDenovo 2.5.2:
[image: 5' contig ends, Flye 2.9.2 vs NextDenovo 2.5.2]

Are there any parameters that can be tuned to avoid this loss and the manual curation of chromosome ends that follows?
Cheers,
Johannes

moold (Member) commented Jun 12, 2023

It's hard to say. In general, the default parameter set is best in most cases. I think you can try merging the two assemblies first. Otherwise, you need to first confirm whether the corrected reads contain telomeric reads, and then adjust the nextgraph parameters.

JWDebler (Author) commented

Thanks.
Where can I find the NextDenovo corrected reads?
I couldn't spot them in the output directory.

moold (Member) commented Jun 12, 2023

They should be in 01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta

JWDebler (Author) commented

Thanks, found them. And yes, telomeres are missing from those corrected reads.
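
For anyone wanting to repeat that check, here is a minimal sketch (not part of NextDenovo) that counts how many reads in one or more FASTA files carry tandem telomeric repeats. The motif TTAGGG, the threshold of three tandem copies, and all function names are assumptions for illustration; other taxa may use a different repeat unit, so adjust MOTIF accordingly.

```python
# Assumption: canonical TTAGGG telomere repeat; replace with your organism's motif.
MOTIF = "TTAGGG"


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def read_fasta(path):
    """Yield (name, sequence) pairs from an uncompressed FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def has_telomere(seq, motif=MOTIF, min_repeats=3):
    """True if seq holds min_repeats exact tandem copies of motif on either strand."""
    seq = seq.upper()
    return any(unit * min_repeats in seq for unit in (motif, revcomp(motif)))


def count_telomeric_reads(paths, motif=MOTIF, min_repeats=3):
    """Return (reads_with_telomere, total_reads) over all FASTA files in paths."""
    hits = total = 0
    for path in paths:
        for _, seq in read_fasta(path):
            total += 1
            hits += has_telomere(seq, motif, min_repeats)
    return hits, total
```

Calling `count_telomeric_reads` on the `cns.fasta` files under the path given above, and again on the raw reads converted to FASTA, would show whether telomeric reads are already lost at the correction step or only later during graph construction. Exact matching is used for simplicity; it can miss telomeric tracts that are short or error-rich.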

webbchen commented Feb 5, 2024

I've observed the same issue in two oomycetes: most telomeres went AWOL. @JWDebler: Did you solve it eventually, or did you just use another assembler?

JWDebler (Author) commented Feb 6, 2024

@webbchen I did not. My workaround is to assemble with both Flye and NextDenovo. ND seems better at assembling large, repeat-rich regions, which Flye breaks into separate contigs, while Flye tends to be better around chromosome ends. I then use the ND assembly to manually scaffold the Flye contigs.
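
To see which contig ends are capped in each assembly before scaffolding, something like the following sketch could help. It reports, per contig, whether the first and last `window` bases contain tandem telomeric repeats. The TTAGGG motif, the 200 bp window, the three-copy threshold, and the function names are all assumptions chosen for illustration, and both strands are checked at both ends so contig orientation does not matter.

```python
import re

# Assumption: canonical TTAGGG telomere repeat; replace with your organism's motif.
MOTIF = "TTAGGG"


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def telomeric_ends(seq, motif=MOTIF, window=200, min_repeats=3):
    """Return (start_capped, end_capped) for a single contig sequence."""
    pattern = re.compile(
        "(?:%s){%d,}|(?:%s){%d,}"
        % (motif, min_repeats, revcomp(motif), min_repeats)
    )
    seq = seq.upper()
    # Only the outermost `window` bases at each end are searched.
    return bool(pattern.search(seq[:window])), bool(pattern.search(seq[-window:]))


def summarize(path, **kwargs):
    """Map contig name -> (start_capped, end_capped) for a FASTA file."""
    with open(path) as handle:
        return {name: telomeric_ends(seq, **kwargs)
                for name, seq in parse_fasta(handle)}
```

Running `summarize` on the Flye and the NextDenovo assemblies (file names are up to you) gives two tables to compare, which makes it easier to pick Flye contigs with intact ends when scaffolding against the ND assembly.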
