Segmentation fault at stage of joining output blocks #786

charlesfoster · 2024-02-08T02:56:45Z

Hi,

Thanks for the great tool.

I've used diamond successfully to create a database using a version of the clustered NR database (https://osf.io/tejwd). I've outlined the steps I used to create the database with taxonomy information included here: Arcadia-Science/2023-nr-clustering#10.

The problem I'm now facing is that diamond crashes with a segmentation fault when searching against the database with some query files. An initial SPAdes assembly (16398 contigs) ran without issues. With a better assembly, produced by running SPAdes in metagenomics mode with the --meta flag (47710 contigs, with a higher proportion of longer, higher-coverage contigs), I get the following:

[truncated]

Processing query block 1, reference block 45/45, shape 1/2.
Building reference seed array...  [0.246s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Building query seed array...  [0.031s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Indexed query seeds = 8651208/16958170 (51.01%), reference seeds = 10577115/191034950 (5.54%)
Soft masked letters = 870/16958170 (0.01%), 0/191034950 (0.00%)
Computing hash join...  [0.017s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Searching alignments...  [0.034s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating memory...  [0s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Processing query block 1, reference block 45/45, shape 2/2.
Building reference seed array...  [0.23s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Building query seed array...  [0.025s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Indexed query seeds = 7851635/16958170 (46.30%), reference seeds = 9887090/191034950 (5.18%)
Soft masked letters = 870/16958170 (0.01%), 0/191034950 (0.00%)
Computing hash join...  [0.015s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Searching alignments...  [0.031s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating memory...  [0s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating buffers...  [0.01s]
Clearing query masking...  [0.002s]
Current RSS: 2.8 GB, Peak RSS: 6.1 GB
Opening temporary output file...  [0s]
Computing alignments... Async_buffer.load() 564000(0.00787899 GB, 0.00413649 GB on disk)
Loading trace points...  [0.008s]
Sorting trace points...  [0.003s]
Computing alignments...  [0.51s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [0.526s]
Deallocating reference...  [0.016s]
Loading reference sequences... Current RSS: 2.6 GB, Peak RSS: 6.1 GB
 [0.013s]
Deallocating buffers...  [0.001s]
Current RSS: 2.5 GB, Peak RSS: 6.1 GB
Joining output blocks... Loading dictionary...  [0.257s]
Joining output blocks... Segmentation fault (core dumped)

The command I used was:

./diamond blastx -d /data/clustered_nr/clustered_nr.dmnd -q meta_scaffolds.fasta -o meta_scaffolds.tax102.tsv --outfmt 102 --include-lineage --log --verbose

Database info:

$ du -sh /data/clustered_nr/clustered_nr.dmnd
111G	/data/clustered_nr/clustered_nr.dmnd

$ ./diamond dbinfo --db /data/clustered_nr/clustered_nr.dmnd
diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

          Database type  Diamond database
Database format version  3
          Diamond build  163
              Sequences  239387197
                Letters  88191052876

System info:

OS: Pop!_OS 22.04 LTS x86_64 
Host: Precision 5820 Tower X-Series 
Kernel: 6.0.2-76060002-generic 
Shell: bash 5.1.16 
CPU: Intel i9-10900X (20) @ 4.500GHz 
GPU: NVIDIA Quadro RTX 4000 
Memory: 10502MiB / 64021MiB

diamond version: v2.1.9.163, precompiled binary downloaded from the Github release.

Do you have any suggestions to circumvent this issue? If you need a copy of the query file, I can provide it pending permission from the higher-ups.

Thanks!

Edit: the crash seems to be related to --outfmt 102 in this case. I re-ran the analysis without that flag and it ran to completion with no errors.
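
For anyone comparing runs with and without a given flag, here is a minimal sketch of a wrapper that reports whether a command completed or was killed by a segmentation fault. It relies on the common shell convention that a process killed by a signal is reported with exit status 128 plus the signal number, and on SIGSEGV being signal 11 on Linux, which gives 139. The function name `run_and_report` is hypothetical and not part of diamond.

```shell
# Hypothetical helper: run a command and classify its exit status.
# Assumption: the shell reports death by signal as 128 + signal number,
# so a SIGSEGV (signal 11 on Linux) shows up as status 139.
# Note that a program can also exit with 139 on its own, so treat
# "segfault" here as a strong hint rather than proof.
run_and_report() {
  "$@"
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -eq 139 ]; then
    echo "segfault"
  else
    echo "failed with status $status"
  fi
  return "$status"
}
```

It could be used by prefixing the command under test, for example `run_and_report ./diamond blastx ...` with the same arguments as the original run.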

bbuchfink · 2024-02-08T09:33:06Z

Looks like the --include-lineage parameter is causing this, will fix.
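
Until a fix is released, one possible workaround is to rerun the reported command without --include-lineage. The sketch below only rebuilds and prints the command line; it does not run diamond. It assumes, without confirmation anywhere in this thread, that --outfmt 102 works once --include-lineage is dropped (the reporter only confirmed that omitting --outfmt 102 avoids the crash). The helper name `strip_flag` is hypothetical.

```shell
# Hypothetical helper: print a command line with one flag removed.
# $1 is the flag to drop; the remaining arguments are the command line.
# Arguments are joined with single spaces, so this is only meant for
# inspection and is not safe for arguments that contain whitespace.
strip_flag() {
  drop="$1"
  shift
  out=""
  for arg in "$@"; do
    if [ "$arg" != "$drop" ]; then
      out="${out:+$out }$arg"
    fi
  done
  printf '%s\n' "$out"
}

# The command from the report, minus the flag identified above.
strip_flag --include-lineage \
  ./diamond blastx -d /data/clustered_nr/clustered_nr.dmnd \
  -q meta_scaffolds.fasta -o meta_scaffolds.tax102.tsv \
  --outfmt 102 --include-lineage --log --verbose
```

The printed line can be checked by eye and then run manually. If the crash persists, the reporter's own finding (dropping --outfmt 102 as well) remains the only workaround confirmed in this thread.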
