Segmentation fault (core dumped) at 03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0 #177

Open
agesantos opened this issue May 31, 2023 · 1 comment

@agesantos

Describe the bug

I have been running NextDenovo (v2.4.0) for the last couple of months on a de novo genome assembly (total genome size of 2.5 Gb). The jobs in 02.cns_align.sh.work took two months to finish, but everything ran without errors. However, once the pipeline moved on to 03.ctg_graph, it stopped with a segmentation fault at 01.ctg_graph.sh.work/ctg_graph0.

Error message

#$ tail -f 10 pid7227.log.info

"[INFO] 2023-03-25 08:09:41,174 Submit jobID:[61788] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align23/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-26 22:35:08,091 Submit jobID:[48347] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align24/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-28 17:27:35,526 Submit jobID:[52069] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align25/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-29 15:05:12,344 Submit jobID:[54523] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align26/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-06 12:48:15,650 Submit jobID:[60690] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align27/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-15 11:51:34,786 cns_align done
[INFO] 2023-04-15 11:51:39,917 Total jobs: 1
[INFO] 2023-04-15 11:51:39,929 Submit jobID:[15206] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh] in the local_cycle.
[ERROR] 2023-04-15 11:51:59,560 ctg_graph failed: please check the following logs:
[ERROR] 2023-04-15 11:51:59,561 /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e"

#$ cat /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e

"hostname
+ hostname
cd /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
+ cd /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
/NextDenovo/bin/nextgraph -a 1 -f /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.seqs /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ /NextDenovo/bin/nextgraph -a 1 -f /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.seqs /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2023-04-15 11:51:40 Initialize graph and reading...
Segmentation fault (core dumped)"

Genome characteristics
genome size: ~2.5 Gb
heterozygosity rate: not estimated, since I only have PacBio reads. However, from experience with this organism, I would say it has moderate to low heterozygosity.
repeat content: based on a closely related species, it should be ~50% of the genome assembly

Input data
Types  Count (#)  Length (bp)
N10       320113        32018
N20       795312        25252
N30      1371150        21441
N40      2038774        18706
N50      2799212        16482
N60      3664213        14381
N70      4677464        11970
N80      5939270         9241
N90      7694302         6074

Types     Count (#)    Bases (bp)  Depth (X)
Raw        13506201  134176569075      53.67
Filtered    1746043     777160708       0.31
Clean      11760158  133399408367      53.36
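
As a sanity check on the figures above, the Depth (X) column can be reproduced by dividing the base counts by the estimated genome size. The short Python sketch below assumes the 2.5 Gb value from the config file is the divisor; the dictionary and function names are illustrative only and are not part of NextDenovo.

```python
# Figures copied from the "Input data" table in this issue.
GENOME_SIZE_BP = 2.5e9  # assumption: genome_size = 2.5g from the config file

stats = {
    "Raw": (13506201, 134176569075),
    "Filtered": (1746043, 777160708),
    "Clean": (11760158, 133399408367),
}


def depth(bases, genome_size=GENOME_SIZE_BP):
    """Sequencing depth (X) = total bases / estimated genome size."""
    return bases / genome_size


for name, (count, bases) in stats.items():
    print(f"{name:<9}{count:>10}{bases:>14}{depth(bases):>8.2f}")

# Clean should equal Raw minus Filtered, for both read counts and bases.
raw, filt, clean = stats["Raw"], stats["Filtered"], stats["Clean"]
reads_consistent = raw[0] - filt[0] == clean[0]
bases_consistent = raw[1] - filt[1] == clean[1]
print("reads consistent:", reads_consistent)
print("bases consistent:", bases_consistent)
```

The reported numbers are internally consistent under that assumption, so the read statistics themselves do not point to truncated or corrupted input.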

Config file

“[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = no # yes/no
deltmp = yes
parallel_jobs = 7 # number of tasks used to run in parallel - M/64 here, 64 can optimize to 32~64
input_type = raw # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = nexdenovo_assembly

[correct_option]
read_cutoff = 1k
genome_size = 2.5g # estimated genome size
sort_options = -m 40g -t 5 # -m TOTAL_INPUT_BASES * 1.2/4g -t P/pa_correction
minimap2_options_raw = -t 4 # -t P/parallel_jobs
pa_correction = 5 # M/(TOTAL_INPUT_BASES * 1.2/4)
correction_options = -p 3 # -p P/pa_correction

[assemble_option]
minimap2_options_cns = -t 4 # -t P/parallel_jobs
nextgraph_options = -a 1"
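
The comments in the config give rules of thumb for `sort_options -m` and `pa_correction`. The Python sketch below works through that arithmetic for this dataset, reading `TOTAL_INPUT_BASES * 1.2/4g` as a value in gigabytes (an interpretation of the comment, not something confirmed in this thread). The machine's total memory is not stated in the issue, so `HYPOTHETICAL_RAM_GB` is a placeholder, and the function names are illustrative.

```python
TOTAL_INPUT_BASES = 134176569075  # "Raw" bases from the Input data table


def sort_memory_gb(total_bases):
    """Value for `sort_options -m`, following the config comment
    `TOTAL_INPUT_BASES * 1.2/4g` (interpreted here as gigabytes)."""
    return total_bases * 1.2 / 4 / 1e9


def pa_correction(total_memory_gb, total_bases):
    """Value for `pa_correction`, following the config comment
    `M/(TOTAL_INPUT_BASES * 1.2/4)`, rounded down and never below 1."""
    return max(1, int(total_memory_gb // sort_memory_gb(total_bases)))


mem = sort_memory_gb(TOTAL_INPUT_BASES)
print(f"sort_options = -m {int(mem)}g")

HYPOTHETICAL_RAM_GB = 256  # assumption: the real value is not given in the issue
print("pa_correction =", pa_correction(HYPOTHETICAL_RAM_GB, TOTAL_INPUT_BASES))
```

Under this reading, the formula gives roughly 40 GB per sort task, which agrees with the `-m 40g` used in the config above.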

Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core

GCC
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.4.0/configure --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 5.4.0 (GCC)

Python
Python 3.8.5

NextDenovo
nextDenovo v2.4.0

To Reproduce (Optional)
In my past attempts I was able to run the software successfully with both PacBio CLR and PacBio HiFi reads, so I don't think I can recreate the problem with a smaller dataset. Sorry :/

Additional context (Optional)

Note that it did not run out of memory, so I am a little bit puzzled about what the error is about.

In the past, I successfully ran NextDenovo (same version) on a similar genome without this problem. The only difference I can see between the two projects is that previously I provided the reads in FASTA format (one file), while this time I used FASTQ (two files). In both cases the reads were PacBio CLR.

I found a similar issue on GitHub (#86), but unfortunately no solution was posted there.

Would you kindly let me know if you managed to solve this issue?

Also, I have never been able to restart a stopped job. How can I do this? The FAQ says to “simply run the same command”, but when I tried that, it created a backup of all the previous folders and started the assembly all over again from the beginning.

I hope you can help me,
Cheers,
André

@moold
Member

moold commented Jun 1, 2023

see #113

Setting rewrite = no means that NextDenovo cannot overwrite the existing work directory, so it backs up all the previous folders and starts the assembly again from the beginning.
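
Following that explanation, one way to avoid an accidental restart is to check the `rewrite` setting before relaunching the same command. The Python sketch below parses a NextDenovo-style config with the standard `configparser` module. The helper name `rewrite_enabled` and the trimmed example config are illustrative, and whether `rewrite = yes` skips subtasks that already finished is not stated in this thread, so confirm that against the NextDenovo documentation.

```python
import configparser


def rewrite_enabled(cfg_text):
    """Return True if [General] rewrite is set to yes in a NextDenovo-style config."""
    # Trailing "# ..." comments, as used in the config above, are stripped.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(cfg_text)
    value = parser.get("General", "rewrite", fallback="no")
    return value.strip().lower() == "yes"


# Trimmed copy of the [General] section from this issue.
example = """\
[General]
job_type = local # local, slurm, sge, pbs, lsf
rewrite = no # yes/no
workdir = nexdenovo_assembly
"""

if not rewrite_enabled(example):
    print("rewrite = no: an existing workdir will be backed up and the run restarted")
```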
