Segmentation fault at stage of joining output blocks #786

Open
charlesfoster opened this issue Feb 8, 2024 · 1 comment


charlesfoster commented Feb 8, 2024

Hi,

Thanks for the great tool.

I've successfully used diamond to build a database from a version of the clustered NR database (https://osf.io/tejwd). The steps I used to build it, with taxonomy information included, are outlined here: Arcadia-Science/2023-nr-clustering#10.

The problem I'm now facing is that diamond segfaults when searching against this database with some query files. An initial assembly from SPAdes (16398 contigs) ran without issues. With a better assembly, produced by running SPAdes in metagenomics mode with the --meta flag (47710 contigs, with a higher proportion of longer, higher-coverage contigs), I get the following:

[truncated]

Processing query block 1, reference block 45/45, shape 1/2.
Building reference seed array...  [0.246s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Building query seed array...  [0.031s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Indexed query seeds = 8651208/16958170 (51.01%), reference seeds = 10577115/191034950 (5.54%)
Soft masked letters = 870/16958170 (0.01%), 0/191034950 (0.00%)
Computing hash join...  [0.017s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Searching alignments...  [0.034s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating memory...  [0s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Processing query block 1, reference block 45/45, shape 2/2.
Building reference seed array...  [0.23s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Building query seed array...  [0.025s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Indexed query seeds = 7851635/16958170 (46.30%), reference seeds = 9887090/191034950 (5.18%)
Soft masked letters = 870/16958170 (0.01%), 0/191034950 (0.00%)
Computing hash join...  [0.015s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Searching alignments...  [0.031s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating memory...  [0s]
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Current RSS: 2.9 GB, Peak RSS: 6.1 GB
Deallocating buffers...  [0.01s]
Clearing query masking...  [0.002s]
Current RSS: 2.8 GB, Peak RSS: 6.1 GB
Opening temporary output file...  [0s]
Computing alignments... Async_buffer.load() 564000(0.00787899 GB, 0.00413649 GB on disk)
Loading trace points...  [0.008s]
Sorting trace points...  [0.003s]
Computing alignments...  [0.51s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [0.526s]
Deallocating reference...  [0.016s]
Loading reference sequences... Current RSS: 2.6 GB, Peak RSS: 6.1 GB
 [0.013s]
Deallocating buffers...  [0.001s]
Current RSS: 2.5 GB, Peak RSS: 6.1 GB
Joining output blocks... Loading dictionary...  [0.257s]
Joining output blocks... Segmentation fault (core dumped)

The command I used was:

./diamond blastx -d /data/clustered_nr/clustered_nr.dmnd -q meta_scaffolds.fasta -o meta_scaffolds.tax102.tsv --outfmt 102 --include-lineage --log --verbose

Database info:

$ du -sh /data/clustered_nr/clustered_nr.dmnd
111G	/data/clustered_nr/clustered_nr.dmnd

$ ./diamond dbinfo --db /data/clustered_nr/clustered_nr.dmnd
diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

          Database type  Diamond database
Database format version  3
          Diamond build  163
              Sequences  239387197
                Letters  88191052876

System info:

OS: Pop!_OS 22.04 LTS x86_64 
Host: Precision 5820 Tower X-Series 
Kernel: 6.0.2-76060002-generic 
Shell: bash 5.1.16 
CPU: Intel i9-10900X (20) @ 4.500GHz 
GPU: NVIDIA Quadro RTX 4000 
Memory: 10502MiB / 64021MiB 

diamond version: v2.1.9.163, precompiled binary downloaded from the GitHub release.

Do you have any suggestions to circumvent this issue? If you need a copy of the query file, I can provide it pending permission from the higher-ups.

Thanks!

Edit: in this case it seems to be related to --outfmt 102. I re-ran the analysis omitting that flag and it ran to completion with no errors.
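
For reference, a sketch of what such a re-run might look like. It assumes that --include-lineage was dropped together with --outfmt 102 (the post does not say), so the search falls back to diamond's default tabular output. The output file name and the `DIAMOND` variable are made up for illustration.

```shell
# Sketch only: the original search with the taxonomy output options removed.
# Assumptions (not stated in the post): --include-lineage is dropped along
# with --outfmt 102, and the output file name below is hypothetical.
DIAMOND="${DIAMOND:-./diamond}"   # path to the diamond binary

rerun_without_taxonomy_output() {
  "$DIAMOND" blastx \
    -d /data/clustered_nr/clustered_nr.dmnd \
    -q meta_scaffolds.fasta \
    -o meta_scaffolds.default.tsv \
    --log --verbose
}

# Usage: rerun_without_taxonomy_output
```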

@bbuchfink
Owner

Looks like the --include-lineage parameter is causing this, will fix.
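
Until a fix lands, one possible stopgap would be to run format 102 without --include-lineage and attach lineages afterwards from the NCBI taxonomy dump. This is untested against this database, and it only helps if the crash really is specific to --include-lineage rather than to --outfmt 102 as a whole. The sketch below assumes the three-column format 102 output (query ID, taxon ID, E-value) and the standard `nodes.dmp` and `names.dmp` files. `add_lineage` and its output layout are hypothetical and do not reproduce diamond's own lineage format.

```shell
# Sketch: append a lineage column to --outfmt 102 output using NCBI taxdump files.
# add_lineage is a hypothetical helper, not part of diamond.
add_lineage() {
  local nodes="$1" names="$2" hits="$3"
  awk -F'\t' -v OFS='\t' '
    # nodes.dmp: tax_id <tab>|<tab> parent_tax_id <tab>|<tab> rank ...
    FILENAME == ARGV[1] { parent[$1] = $3; next }
    # names.dmp: keep only the scientific name of each taxon
    FILENAME == ARGV[2] { if ($7 == "scientific name") name[$1] = $3; next }
    # hits file: walk from the assigned taxon up to the root
    {
      lineage = ""
      t = $2
      for (depth = 0; depth < 100 && (t in parent); depth++) {
        label = (t in name) ? name[t] : t
        lineage = (lineage == "") ? label : (label "; " lineage)
        if (parent[t] == t) break   # the root node is its own parent
        t = parent[t]
      }
      # taxon ID 0 (unclassified) or an unknown ID gets NA
      print $1, $2, $3, (lineage == "" ? "NA" : lineage)
    }
  ' "$nodes" "$names" "$hits"
}
```

Usage would be along the lines of `add_lineage nodes.dmp names.dmp meta_scaffolds.tax102.tsv > meta_scaffolds.lineage.tsv`, where the last file name is again only an example.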
