
segfault in deSPI #1

Open
sjaenick opened this issue Aug 30, 2017 · 7 comments

Comments

@sjaenick

deSPI-download -d bacteria,archaea refseq

During indexing, deSPI aborts and dumps a core:

(gdb) bt
#0  0x00007f2b7a01e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f2b7a02002a in __GI_abort () at abort.c:89
#2  0x00007f2b7a65884d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f2b7a6566b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f2b7a656701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f2b7a656919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f2b7a656ebc in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f2b7a656f19 in operator new[](unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x0000000000408ef0 in preprocess (refPath=refPath@entry=0x25e1c98 "deSPI-wgs.fa", kmerPath=kmerPath@entry=0x25e1c38 "database.srt", 
    taxonomyNodesPath=taxonomyNodesPath@entry=0x25e1d30 "taxonomy/nodes.dmp", giTaxidPath=giTaxidPath@entry=0x25e1c58 "", bwt_s="", 
    nkmerTID=std::vector of length 0, capacity 0, hash_index=0x7f2b39cdf010) at preprocess.cpp:1100
#9  0x000000000040de30 in build_index (p_opt=p_opt@entry=0x25e1c20) at main.cpp:443
#10 0x0000000000401f4c in main (argc=6, argv=0x7ffec6164a58) at main.cpp:635
(gdb) up 8
#8  0x0000000000408ef0 in preprocess (refPath=refPath@entry=0x25e1c98 "deSPI-wgs.fa", kmerPath=kmerPath@entry=0x25e1c38 "database.srt", 
    taxonomyNodesPath=taxonomyNodesPath@entry=0x25e1d30 "taxonomy/nodes.dmp", giTaxidPath=giTaxidPath@entry=0x25e1c58 "", bwt_s="", 
    nkmerTID=std::vector of length 0, capacity 0, hash_index=0x7f2b39cdf010) at preprocess.cpp:1100
1100            kmersSpchar* _2kmers = new kmersSpchar[endNodeNum*(_kmer - 1)];
(gdb) l
1095            heads.clear();
1096            tails.clear();
1097            //getchar();
1098            fprintf(stderr, "end node is %lu\n",endNodeNum);
1099
1100            kmersSpchar* _2kmers = new kmersSpchar[endNodeNum*(_kmer - 1)];
1101
1102            uint8_t* _2kmers_0p = new uint8_t[endNodeNum];
1103
1104            fprintf(stderr,"outputing... \n");
(gdb) p endNodeNum
$1 = 26732088398
(gdb) p _kmer-1
$2 = 30
(gdb) q
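Frames #5 to #7 show that operator new[] threw an exception (std::bad_alloc) that nothing caught, so std::terminate() called abort(); that is a SIGABRT rather than a segfault. As a minimal sketch of how such a request could be checked before allocating, the code below computes endNodeNum * (_kmer - 1) * sizeof(element) with 64-bit overflow checks and refuses requests above a caller-supplied budget. KmerRecord, bytes_needed, alloc_records and budget_bytes are hypothetical names; the real layout of kmersSpchar is not shown in this thread (the next comment describes it as one uint64_t plus one uint8_t).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <new>
#include <optional>

// Hypothetical stand-in for kmersSpchar (its real definition is not shown in this thread).
struct KmerRecord {
    uint64_t kmer;
    uint8_t spchar;
};

// Bytes needed for nodes * (k - 1) elements of elem_size bytes each,
// or std::nullopt if k == 0 or any product overflows 64 bits.
std::optional<uint64_t> bytes_needed(uint64_t nodes, uint64_t k, uint64_t elem_size) {
    if (k == 0) return std::nullopt;
    const uint64_t max = std::numeric_limits<uint64_t>::max();
    const uint64_t per_node = k - 1;
    if (per_node != 0 && nodes > max / per_node) return std::nullopt;
    const uint64_t count = nodes * per_node;
    if (elem_size != 0 && count > max / elem_size) return std::nullopt;
    return count * elem_size;
}

// Allocates nodes * (k - 1) records only if they fit in budget_bytes;
// reports and returns nullptr instead of letting std::bad_alloc terminate the program.
KmerRecord* alloc_records(uint64_t nodes, uint64_t k, uint64_t budget_bytes) {
    const std::optional<uint64_t> bytes = bytes_needed(nodes, k, sizeof(KmerRecord));
    if (!bytes || *bytes > budget_bytes ||
        *bytes > std::numeric_limits<std::size_t>::max()) {
        std::fprintf(stderr, "refusing allocation for %llu end nodes with k = %llu\n",
                     static_cast<unsigned long long>(nodes),
                     static_cast<unsigned long long>(k));
        return nullptr;
    }
    try {
        return new KmerRecord[static_cast<std::size_t>(nodes * (k - 1))];
    } catch (const std::bad_alloc&) {
        std::fprintf(stderr, "allocation of %llu bytes failed\n",
                     static_cast<unsigned long long>(*bytes));
        return nullptr;
    }
}
```

With the values from the gdb session (endNodeNum = 26732088398, _kmer = 31) and any realistic budget, alloc_records returns nullptr with a message instead of aborting.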

@dfguan
Owner

dfguan commented Sep 1, 2017

Hi Sebastian, thanks for your comments. It seems deSPI crashed because of a memory allocation failure; you could try running it on a machine with more memory.

@sjaenick
Author

sjaenick commented Sep 4, 2017

Even when assuming a lower bound of 9 bytes for the struct (one uint64_t and one uint8_t, i.e. not considering memory alignment/padding), this would be a memory allocation request for ~6.6TB...
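The struct arithmetic can be checked directly. A minimal sketch, assuming the element layout described above (one uint64_t plus one uint8_t; the field names and helper functions are hypothetical): on x86-64, uint64_t has 8-byte alignment, so the compiler pads each array element from 9 to 16 bytes, which matches the 16-byte figure used later in this thread. With endNodeNum * (_kmer - 1) = 26,732,088,398 × 30 = 801,962,651,940 elements, the 9-byte lower bound is about 7.2 × 10^12 bytes (about 6.6 TiB), and the padded size about 12.8 × 10^12 bytes (about 11.7 TiB).

```cpp
#include <cstdint>

// Hypothetical element layout: one uint64_t and one uint8_t, as described above.
struct KmerRecord {
    uint64_t kmer;
    uint8_t spchar;
};

// Payload without padding (9 bytes) versus the per-element stride the compiler uses.
constexpr uint64_t kPayloadBytes = sizeof(uint64_t) + sizeof(uint8_t);
constexpr uint64_t kStrideBytes = sizeof(KmerRecord);  // 16 on x86-64, where alignof(uint64_t) == 8

static_assert(kStrideBytes >= kPayloadBytes, "padding never shrinks a struct");
static_assert(kStrideBytes % alignof(KmerRecord) == 0, "array elements must stay aligned");

// Number of elements requested by `new kmersSpchar[endNodeNum*(_kmer - 1)]` at preprocess.cpp:1100.
constexpr uint64_t element_count(uint64_t end_nodes, uint64_t kmer_len) {
    return end_nodes * (kmer_len - 1);
}

// Converts bytes to tebibytes (1 TiB = 2^40 bytes).
constexpr double to_tib(uint64_t bytes) { return bytes / 1099511627776.0; }
```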

@dfguan
Owner

dfguan commented Sep 5, 2017

Hi Sebastian, could you please tell me the size of your reference library, and count the number of 31-mers on both strands of it with Jellyfish? For reference, you may use the command "jellyfish count -m 31 -t 8 -C -s 3000000000". If 6.6TB is required, your reference should contain at least 25G 31-mers.
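Counting "on both strands" is what Jellyfish's -C (canonical) option is for: each k-mer is counted together with its reverse complement under a single canonical key. A minimal sketch of that idea for k ≤ 32, using a 2-bit encoding (A=0, C=1, G=2, T=3) and taking the smaller of the two encodings as the canonical key; the function names are hypothetical and this is not Jellyfish's own implementation.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Maps a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3); -1 for anything else.
int base_code(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Counts canonical k-mers (1 <= k <= 32): a k-mer and its reverse complement share one key,
// the smaller of the two 2-bit encodings. Windows containing non-ACGT characters are skipped.
std::unordered_map<uint64_t, uint64_t> count_canonical_kmers(const std::string& seq, unsigned k) {
    std::unordered_map<uint64_t, uint64_t> counts;
    if (k == 0 || k > 32) return counts;
    const uint64_t mask = (k == 32) ? ~0ULL : ((1ULL << (2 * k)) - 1);
    const unsigned top_shift = 2 * (k - 1);
    uint64_t fwd = 0, rev = 0;
    uint64_t valid = 0;  // length of the current run of ACGT characters
    for (char b : seq) {
        const int c = base_code(b);
        if (c < 0) { valid = 0; fwd = rev = 0; continue; }
        fwd = ((fwd << 2) | static_cast<uint64_t>(c)) & mask;           // append base on the right
        rev = (rev >> 2) | (static_cast<uint64_t>(3 - c) << top_shift);  // prepend its complement on the left
        if (++valid >= k) ++counts[fwd < rev ? fwd : rev];
    }
    return counts;
}
```

For example, in "ACGT" with k = 2, AC and GT are reverse complements and share the key for AC, while CG is its own reverse complement.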

@sjaenick
Author

$ jellyfish stats mer_counts_merged.jf
Unique: 9730723781
Distinct: 12589626593
Total: 31927886526
Max_count: 79722
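For reading these numbers: as I understand the jellyfish stats output, Total is the number of k-mer occurrences counted, Distinct the number of different (here canonical) 31-mers, Unique the number seen exactly once, and Max_count the highest count of any single k-mer. A minimal sketch computing the same four figures from a k-mer to count table; KmerStats and summarize are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>

// Hypothetical summary mirroring the four lines printed by `jellyfish stats`.
struct KmerStats {
    uint64_t unique = 0;     // k-mers seen exactly once
    uint64_t distinct = 0;   // different k-mers
    uint64_t total = 0;      // all k-mer occurrences
    uint64_t max_count = 0;  // highest count of a single k-mer
};

// Derives the four statistics from a table mapping each k-mer to its count.
KmerStats summarize(const std::unordered_map<uint64_t, uint64_t>& counts) {
    KmerStats s;
    s.distinct = counts.size();
    for (const auto& entry : counts) {
        const uint64_t n = entry.second;
        if (n == 1) ++s.unique;
        s.total += n;
        s.max_count = std::max(s.max_count, n);
    }
    return s;
}
```

By construction Unique ≤ Distinct ≤ Total, which the figures above satisfy.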

@dfguan
Owner

dfguan commented Oct 27, 2017

Hi Sebastian, thanks for your comments. It seems there are 12.5G branch nodes in the de Bruijn graph of your reference library (12.5G * 30 * 16 bytes, with 16 bytes being the structure size including memory padding, = 6000 GB), which is hard to believe. There may be a logic error in deSPI. In any case, I have updated deSPI to reduce its memory consumption during indexing; you could download it and run "make" to install it. Maybe the new version can generate the index you want. If you run into any problems, please feel free to contact me.
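Spelled out, the estimate above is the following (16 bytes per padded element, 30 = _kmer − 1 elements per node). For comparison, the endNodeNum reported in the first backtrace, 26,732,088,398, is about 2.1 times the Distinct count of 12,589,626,593 from jellyfish stats.

$$
12.5\times10^{9} \times 30 \times 16\ \text{bytes} = 6.0\times10^{12}\ \text{bytes} \approx 6000\ \text{GB},
\qquad
\frac{26\,732\,088\,398}{12\,589\,626\,593} \approx 2.12
$$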

@sjaenick
Author

No difference; it crashes at the same location.

(gdb) p endNodeNum
$1 = 28142765732
(gdb) p _kmer
$2 = 31 '\037'

@dfguan
Owner

dfguan commented Dec 5, 2017

Hi Sebastian, endNodeNum is no longer used in the new deSPI code; could you please check that you are using the latest deSPI? Thank you.
