segmentation fault building ctg_graph using NEXTDENOVO/2.4.0 #133

Open
gitcruz opened this issue Dec 3, 2021 · 1 comment

Comments

@gitcruz

gitcruz commented Dec 3, 2021

Describe the bug
I am assembling a 1.7 Gb heterozygous genome (1.2% heterozygosity rate) on a machine with 2 TB of RAM. The ONT data are the highest-quality 50x of reads (filtered with Filtlong: ≥5 kb and 150 Gb).

1st config file (24 CPUs, 1 TB total RAM):
[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 4
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn
[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 20
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1

2nd config file (48 CPUs, 2 TB total RAM):
[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 8
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn

[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 20
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1
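
As a rough sanity check on the two configs, the sketch below estimates how many threads the minimap2 steps request at once. It assumes (this is my reading, not something confirmed in this thread) that each of the `parallel_jobs` runs one minimap2 process with the `-t` value given in the options, and that the config can be read by Python's `configparser`; NextDenovo's own parser may behave differently. The helper names `threads_in` and `peak_threads` are made up for illustration.

```python
import configparser
import re

# Trimmed copy of the 2nd config from this issue (only the keys used below).
CONFIG_TEXT = """
[General]
parallel_jobs = 8

[correct_option]
minimap2_options_raw = -t 6 -x ava-ont

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
"""


def threads_in(option_string, flag="-t"):
    """Return the integer that follows `flag` in an option string, or 1 if absent."""
    match = re.search(rf"(?:^|\s){re.escape(flag)}\s*(\d+)", option_string)
    return int(match.group(1)) if match else 1


def peak_threads(config_text):
    """Estimate requested threads per minimap2 step as parallel_jobs x -t.

    Assumption: every parallel job runs one minimap2 process at a time.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    jobs = parser.getint("General", "parallel_jobs")
    estimate = {}
    for section in ("correct_option", "assemble_option"):
        for key, value in parser[section].items():
            if key.startswith("minimap2_options"):
                estimate[key] = jobs * threads_in(value)
    return estimate


if __name__ == "__main__":
    for step, total in peak_threads(CONFIG_TEXT).items():
        print(f"{step}: {total} threads")
```

Under that assumption, 8 jobs x 6 threads gives 48, which matches the 48 CPUs of the 2nd config, and 4 x 6 gives the 24 CPUs of the 1st.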

Error message
After 10 days the assembly failed with an I/O error at the 02.cns_align step (see first config). I removed this folder and resubmitted the assembly with more memory (2nd config). It went smoothly, but it now constantly fails at the ctg_graph step. The error is this:
hostname
+ hostname
cd /WORKDIR/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
+ cd /WORKDIR/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
time /apps/NEXTDENOVO/2.4.0/bin/nextgraph -a 1 -f /WORKDIR/03.ctg_graph/01.ctg_graph.input.seqs /WORKDIR/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ /apps/NEXTDENOVO/2.4.0/bin/nextgraph -a 1 -f /WORKDIR/03.ctg_graph/01.ctg_graph.input.seqs /WORKDIR/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2021-12-03 19:11:48 Initialize graph and reading...
/WORKDIR/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh: line 5: 19296 Segmentation fault /apps/NEXTDENOVO/2.4.0/bin/nextgraph -a 1 -f /WORKDIR/03.ctg_graph/01.ctg_graph.input.seqs /WORKDIR/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
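
To rerun only the failing step outside the pipeline and keep its full output, something like the following sketch could be used. The command is copied from the log above; the log file name `nextgraph.rerun.log` and the helper names are just placeholders. It relies on the POSIX behaviour of Python's `subprocess`, where a process killed by a signal gets a negative return code.

```python
import os
import signal
import subprocess

# Command taken from the log in this issue; adjust the paths to your setup.
NEXTGRAPH_CMD = [
    "/apps/NEXTDENOVO/2.4.0/bin/nextgraph", "-a", "1",
    "-f", "/WORKDIR/03.ctg_graph/01.ctg_graph.input.seqs",
    "/WORKDIR/03.ctg_graph/01.ctg_graph.input.ovls",
    "-o", "nd.asm.p.fasta",
]


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, death by signal N is reported as return code -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_report(command, log_path):
    """Run `command`, write stdout and stderr to `log_path`, describe the exit."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    if os.path.exists(NEXTGRAPH_CMD[0]):
        print(run_and_report(NEXTGRAPH_CMD, "nextgraph.rerun.log"))
    else:
        print("nextgraph not found at", NEXTGRAPH_CMD[0])
```

A segmentation fault would show up here as "killed by SIGSEGV", and the saved log can be attached to the issue.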

Genome characteristics
C-value = 1.7 Gb
Paste here the genomescope results:
GenomeScope version 2.0
input file = jf_21mer.hist
output directory = out/21mer/
p = 2
k = 21

property                 min                 max
Homozygous (aa)          98.7068%            98.7307%
Heterozygous (ab)        1.26928%            1.29316%
Genome Haploid Length    1,208,134,973 bp    1,210,345,670 bp
Genome Repeat Length     399,334,371 bp      400,065,090 bp
Genome Unique Length     808,800,602 bp      810,280,580 bp
Model Fit                73.122%             95.132%
Read Error Rate          0.214032%           0.214032%

Input data
[Read length stat]
Types       Count (#)    Length (bp)
N10         266461       29793
N20         648378       23529
N30         1113845      19774
N40         1660889      16968
N50         2295837      14643
N60         3032994      12575
N70         3896295      10664
N80         4925021      8844
N90         6190301      7021

Types       Count (#)    Bases (bp)      Depth (X)
Raw         7860332      100000021650    55.56
Filtered    0            0               0.00
Clean       7860332      100000021650    55.56
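
For reference, the Depth (X) column above appears to be total bases divided by the `genome_size` from the config (1.8g); that is an inference from the numbers, not something stated by the tool. A small sketch of the arithmetic, with hypothetical helper names `parse_size` and `depth`:

```python
def parse_size(text):
    """Convert a size such as '1.8g' or '1k' into a number of bases."""
    units = {"k": 1e3, "m": 1e6, "g": 1e9}
    text = text.strip().lower()
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


def depth(total_bases, genome_size):
    """Sequencing depth as total bases divided by the assumed genome size."""
    return total_bases / parse_size(genome_size)


clean_bases = 100_000_021_650  # "Clean" row of the table above

# Depth against the genome_size used in the config.
print(f"{depth(clean_bases, '1.8g'):.2f}")  # 55.56

# The same data against the C-value estimate reported in this issue.
print(f"{depth(clean_bases, '1.7g'):.2f}")
```

The same reads give a different depth depending on which genome size estimate is used, so the value of `genome_size` matters for anything derived from depth.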

Config file
Last config used was:
[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 8
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn

[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 40
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1

Operating system

LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.7 (Santiago)
Release: 6.7
Codename: Santiago

GCC
gcc version 6.3.0 (GCC)

Python
Python 3.8.6

NextDenovo
nextDenovo v2.4.0

To Reproduce (Optional)
Steps to reproduce the behavior. Providing a minimal test dataset on which we can reproduce the behavior will generally lead to quicker turnaround time!

Additional context (Optional)

I made three attempts and the error is always the same: line 5: 19296 Segmentation fault /apps/NEXTDENOVO/2.4.0/bin/nextgraph
Any idea what the problem could be?
I'll be happy to check some intermediate files.

The files listed in 01.ctg_graph.input.ovls are not empty; their sizes range from 43M to 195M (02.cns_align/*.cns.filt.dovt.ovl).

The input seqs are also there:

for i in $(cat 03.ctg_graph/01.ctg_graph.input.seqs); do ls -sh $i; done
4.3G 02.cns_align/01.seed_cns.sh.work/seed_cns0/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns2/cns.fasta
2.7G 02.cns_align/01.seed_cns.sh.work/seed_cns3/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns4/cns.fasta
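
The same check can be done for both input lists at once with a short script. This is only a sketch: it assumes that `01.ctg_graph.input.seqs` and `01.ctg_graph.input.ovls` are plain text files with one path per line (as the loop above suggests for the seqs file), and that it is run from the work directory if the listed paths are relative. The function names are made up for illustration.

```python
import os


def check_listed_files(list_path):
    """Return (path, size in bytes or None if missing) for each listed path."""
    report = []
    with open(list_path) as handle:
        for line in handle:
            path = line.strip()
            if not path:
                continue  # skip blank lines
            size = os.path.getsize(path) if os.path.isfile(path) else None
            report.append((path, size))
    return report


def problems(report):
    """Paths that are missing (None) or empty (0 bytes)."""
    return [path for path, size in report if not size]


if __name__ == "__main__":
    for list_file in ("03.ctg_graph/01.ctg_graph.input.seqs",
                      "03.ctg_graph/01.ctg_graph.input.ovls"):
        if not os.path.isfile(list_file):
            print(f"{list_file}: list file not found")
            continue
        bad = problems(check_listed_files(list_file))
        print(f"{list_file}: {len(bad)} missing or empty file(s)")
        for path in bad:
            print("  ", path)
```

This only rules out missing or empty inputs; it says nothing about whether the file contents are well formed.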

Any ideas or suggestions on how to fix this problem are welcome!

Thanks

@moold
Member

moold commented Dec 6, 2021

Hi, see #113 to fix this error.
