(core dumped) "$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${REPORT}" ${REPORT_PAR} #270

Open
dabianzhixing opened this issue May 11, 2024 · 18 comments

Comments

@dabianzhixing

Hello,
I'm using easy-complexsearch to find oligomers with similar structures in a DB. Some searches succeed, but others fail. For the failed searches, the error is

"tmpFolder/12745254066866134041/easycomplexsearch.sh: line 52: 28200 Segmentation fault (core dumped) "$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${REPORT}" ${REPORT_PAR}"

My command looks like this:

foldseek easy-complexsearch --alignment-type 1 7yw0_1.pdb chainDB 7yw0_1.txt tmpFolder --format-output "query,target"

I don't know what caused the error. It seems like MMseqs2 failed to filter the results?

@martin-steinegger
Collaborator

@dabianzhixing which commit are you using? What's in the chainDB?

@milot-mirdita
Member

Please try the latest release, 9. There have been a lot of changes in preparation for the preprint, so your issue might have been fixed already.

@dabianzhixing
Author

@martin-steinegger The Foldseek version is 427df8a. I downloaded it from GitHub last Friday. chainDB is a precomputed database. It was used for monomer search with a previous version and worked well.

@milot-mirdita
Member

milot-mirdita commented May 13, 2024

Could you please check again that you are actually running the binary for 427df8a?

From your error message, it looks like you are using an older binary. easycomplexsearch.sh was renamed to easymultimersearch.sh. So the former string shouldn't appear in error messages anymore.

Also please upload the full terminal output of Foldseek.

Depending on when you created your chainDB, I would also recommend recreating it. If it was created before the Foldseek-MM work, its .lookup file wouldn't have the correct format for FS-MM to work.
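
One way to act on the version check suggested above is to scan a saved terminal log for the version line and for the old workflow script name. The sketch below is a hypothetical helper, not part of Foldseek; the function names and the regular expression are assumptions based only on the log lines quoted in this thread.

```python
import re

# Matches lines such as "foldseek Version: 427df8a" or
# "MMseqs Version: GITDIR-NOTFOUND" (formats taken from the logs in this thread).
VERSION_RE = re.compile(r"^(?:foldseek|MMseqs) Version:\s*(\S+)", re.MULTILINE)

# Script name used before the rename to easymultimersearch.sh.
OLD_SCRIPT = "easycomplexsearch.sh"


def reported_versions(log_text):
    """Return every version string that appears on a 'Version:' line."""
    return VERSION_RE.findall(log_text)


def mentions_old_workflow(log_text):
    """True if the log still refers to the pre-rename workflow script."""
    return OLD_SCRIPT in log_text
```

Running both functions on the full terminal output of a failing search shows whether the binary that actually ran reports the expected commit, and whether an error message still names the old script.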

@dabianzhixing
Author

@milot-mirdita could you tell me how to use release 9? I'm trying to rebuild my DB.

@milot-mirdita
Member

You can download it here:
https://github.com/steineggerlab/foldseek/releases/tag/9-427df8a

or from bioconda.

@dabianzhixing
Author

@milot-mirdita @martin-steinegger I think I'm using the right version.

foldseek Version: 427df8a

But now all the output files are empty. There are no results.

The following is the command-line output:

foldseek easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt tmpFolder --format-output "query,target"
/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt exists and will be overwritten
easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt tmpFolder --format-output query,target

MMseqs Version: GITDIR-NOTFOUND
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 192
Verbosity 3
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace true
TMscore threshold 0
TMalign hit order 0
TMalign fast 1
Preload mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 0
Alignment mode 0
E-value threshold 10
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Exhaustive search mode false
Prefilter mode 0
Search iterations 1
Remove temporary files false
MPI runner
Force restart with latest tmp false
Cluster search 0
Minimum assigned chains percentage Threshold 0
Multimer E-value 10000
Complex report mode 1
Alignment format 0
Format alignment output query,target
Database output false

/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt exists and will be overwritten
convertalis tmpFolder/13859234540439683774/query /home/lvqy/foldseek/chainDB tmpFolder/13859234540439683774/multimer_result /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt --sub-mat 'aa:3di.out,nucl:3di.out' --format-mode 0 --format-output query,target --translation-table 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 192 --compressed 0 -v 3 --exact-tmscore 0

[=================================================================] 2 0s 0ms
Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 108ms
/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt_report exists and will be overwritten
createmultimerreport tmpFolder/13859234540439683774/query /home/lvqy/foldseek/chainDB tmpFolder/13859234540439683774/multimer_result /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt_report --db-output 0 --threads 192 -v 3

[=================================================================] 1 0s 0ms
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 60ms

@milot-mirdita
Member

Please completely delete the tmpFolder and run again. The output you posted is incomplete since it's reusing results from the previous run.

@dabianzhixing
Author

@milot-mirdita I have tried several times, deleting all the files including the tmpFolder, but it doesn't work. I ran 500 queries and none of them returned any results. I have also rebuilt the DB. I don't know what I should do.

@milot-mirdita
Member

Sorry, I meant for you to please rerun it with an empty temp folder so we would have an easier time to diagnose the issue. This was not meant to fix the issue.

Please rerun and post the terminal output here.
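
A minimal sketch of rerunning with a guaranteed-empty temporary folder and capturing the complete terminal output in a file. The command line mirrors the one posted in this thread; the helper names, the `binary` parameter, and the log path are hypothetical.

```python
import subprocess
import tempfile


def build_search_command(query, target_db, result, tmp_dir, binary="foldseek"):
    """Assemble the easy-multimersearch call as an argument list."""
    return [binary, "easy-multimersearch", query, target_db, result, tmp_dir,
            "--format-output", "query,target"]


def run_with_fresh_tmp(query, target_db, result, log_path, binary="foldseek"):
    """Run the search in a new, empty temp dir and write all output to log_path."""
    # TemporaryDirectory creates a fresh directory, so no earlier results are reused;
    # the directory is deleted again when the block exits.
    with tempfile.TemporaryDirectory(prefix="foldseek_tmp_") as tmp_dir:
        cmd = build_search_command(query, target_db, result, tmp_dir, binary)
        with open(log_path, "w") as log:
            completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

For example, `run_with_fresh_tmp("7yao_1.pdb", "chainDB", "7yao_1.txt", "foldseek.log")` would leave the full output in `foldseek.log`, ready to paste into the issue. Because the temporary directory is removed afterwards, intermediate results are not kept for inspection.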

@dabianzhixing
Author

Create directory tmpFolder
easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB 7yao_1.txt tmpFolder --format-output query,target

MMseqs Version: 427df8a
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 192
Verbosity 3
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace true
TMscore threshold 0
TMalign hit order 0
TMalign fast 1
Preload mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 0
Alignment mode 0
E-value threshold 10
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Exhaustive search mode false
Prefilter mode 0
Search iterations 1
Remove temporary files false
MPI runner
Force restart with latest tmp false
Cluster search 0
Minimum assigned chains percentage Threshold 0
Multimer E-value 10000
Complex report mode 1
Alignment format 0
Format alignment output query,target
Database output false

createdb /hdd_data/lvqy/rec/7yao_1.pdb tmpFolder/7613150203902551404/query --chain-name-mode 0 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 192 -v 3

Output file: tmpFolder/7613150203902551404/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 6ms
Time for merging to query_h: 0h 0m 0s 6ms
Time for merging to query_ca: 0h 0m 0s 5ms
Time for merging to query: 0h 0m 0s 5ms
Ignore 2 out of 4.
Too short: 2, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 115ms
Create directory tmpFolder/7613150203902551404/multimersearch_tmp
multimersearch tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result tmpFolder/7613150203902551404/multimersearch_tmp -a 1

Create directory tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp
search tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp -a 0

prefilter tmpFolder/7613150203902551404/query_ss /home/lvqy/foldseek/chainDB_ss tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref --sub-mat 'aa:3di.out,nucl:3di.out' --seed-sub-mat 'aa:3di.out,nucl:3di.out' -s 9.5 -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 1000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-prob 0.99995 --mask-lower-case 1 --min-ungapped-score 30 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 192 --compressed 0 -v 3

Query database size: 2 type: Aminoacid
Estimated memory consumption: 8G
Target database size: 730735 type: Aminoacid
Index table k-mer threshold: 78 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 730.73K 0s 637ms
Index table: Masked residues: 1640
Index table: fill
[=================================================================] 100.00% 730.73K 0s 787ms
Index statistics
Entries: 177066430
DB size: 1501 MB
Avg k-mer size: 2.766663
Top 10 k-mers
LVLVVV 190029
VVLVVV 178847
SVSVVV 162380
VVSVVV 155087
SVVVVV 131915
VVNVVV 73457
DPVVVV 69750
CVVVVV 62709
LVSVVV 57314
VLVVVV 53947
Time for index table init: 0h 0m 3s 41ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 730735
[=================================================================] 100.00% 2 0s 3ms

15866.797600 k-mers per position
50433469 DB matches per sequence
2 overflows
1000 sequences passed prefiltering per query sequence
1000 median result list length
0 sequences with 0 size result lists
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 0m 5s 443ms
structurealign tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/strualn --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 10 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 0.5 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 192 --compressed 0 -v 3

[=================================================================] 100.00% 2 8s 653ms
Time for merging to strualn: 0h 0m 0s 9ms
Time for processing: 0h 0m 14s 327ms
mvdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/strualn tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/aln

Time for processing: 0h 0m 0s 7ms
mvdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/aln tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result -v 3

Time for processing: 0h 0m 0s 5ms
Removing temporary files
rmdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref -v 3

Time for processing: 0h 0m 0s 0ms
expandmultimer tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_pref --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to result_expand_pref: 0h 0m 0s 81ms
Time for processing: 0h 0m 1s 309ms
structurealign tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_pref tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_aligned --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 1 --alignment-mode 0 --alignment-output-mode 0 --wrapped-scoring 0 -e 10000 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 192 --compressed 0 -v 3

[=================================================================] 100.00% 2 12s 909ms
Time for merging to result_expand_aligned: 0h 0m 0s 5ms
Time for processing: 0h 0m 17s 431ms
scoremultimer tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_aligned tmpFolder/7613150203902551404/multimer_result --min-assigned-chains-ratio 0 --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to multimer_result: 0h 0m 0s 55ms
Time for processing: 0h 0m 2s 270ms
convertalis tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result 7yao_1.txt --sub-mat 'aa:3di.out,nucl:3di.out' --format-mode 0 --format-output query,target --translation-table 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 192 --compressed 0 -v 3 --exact-tmscore 0

[=================================================================] 100.00% 2 0s 0ms
Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 116ms
createmultimerreport tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result 7yao_1.txt_report --db-output 0 --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 56ms

I can see some results in the tmpFolder, but again the output file is empty.

@dabianzhixing
Author

@milot-mirdita The new output is shown above. The merging times look strange:

Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms

I have also tried a monomer search with easy-search. Those results are correct.
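
With 500 queries, checking each output file by hand is tedious. The sketch below counts non-blank lines in a result file so that empty outputs can be listed in bulk. The function name is hypothetical, and the one-hit-per-line assumption follows from the `--format-output query,target` tabular output used in this thread.

```python
def count_hits(lines):
    """Count non-blank lines; each one is assumed to be a single reported hit."""
    return sum(1 for line in lines if line.strip())
```

A typical call would be `with open("7yao_1.txt") as fh: print(count_hits(fh))`, repeated over every result file of the batch.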

@milot-mirdita
Member

Yeah, something is weird.

We will have to take a look. Does the same also happen with our prebuilt databases?

Is this the same 7yao cif file as stored in the PDB?

@dabianzhixing
Author

dabianzhixing commented May 14, 2024 via email

@martin-steinegger
Collaborator

@Woosub-Kim could you have a look please?

@milot-mirdita
Member

Maybe one more thing for us to investigate:

Please post an excerpt of the chainDB.lookup:

head -n 50 chainDB.lookup

@dabianzhixing
Author

@milot-mirdita head -n 50 chainDB.lookup
0 101m_A 0
1 102l_A 1
2 102m_A 2
3 103l_A 3
4 103m_A 4
5 104l_A 5
6 104l_B 6
7 104m_A 7
8 105m_A 8
9 106m_A 9
10 107l_A 10
11 107m_A 11
12 108l_A 12
13 108m_A 13
14 109l_A 14
15 109m_A 15
16 10gs_A 16
17 10gs_B 17
18 10mh_C 18
19 110l_A 19
20 110m_A 20
21 111l_A 21
22 111m_A 22
23 112l_A 23
24 112m_A 24
25 113l_A 25
26 114l_A 26
27 115l_A 27
28 117e_A 28
29 117e_B 29
30 118l_A 30
31 119l_A 31
32 11as_A 32
33 11as_B 33
34 11ba_A 34
35 11ba_B 35
36 11bg_A 36
37 11bg_B 37
38 11gs_A 38
39 11gs_B 39
40 120l_A 40
41 121p_A 41
42 122l_A 42
43 123l_A 43
44 125l_A 44
45 126l_A 45
46 127l_A 46
47 128l_A 47
48 129l_A 48
49 12as_A 49

I use the prebuilt DB for monomer retrieval and it works well.

@dabianzhixing
Author

Problem solved.
My prebuilt DB, chainDB, was originally constructed from monomers, so it could not be used for oligomer retrieval.
I built a new oligomer DB, and now the results are correct. Thank you very much! @milot-mirdita @martin-steinegger
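
The lookup excerpt posted earlier hints at how a monomer-only DB might be spotted ahead of time: every chain, including pairs such as 104l_A and 104l_B, carries its own value in the third column. The sketch below assumes that the third column is a set ID that groups chains from the same input structure, which this thread does not spell out. Treat it as a heuristic with hypothetical function names.

```python
from collections import defaultdict


def chains_per_set(lookup_lines):
    """Group chain names by the third lookup column (assumed to be the set ID)."""
    groups = defaultdict(list)
    for line in lookup_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, set_id = fields[1], fields[2]
        groups[set_id].append(name)
    return dict(groups)


def looks_multimer_ready(lookup_lines):
    """Heuristic: True if at least one set ID is shared by two or more chains."""
    return any(len(names) > 1 for names in chains_per_set(lookup_lines).values())
```

Calling `looks_multimer_ready(open("chainDB.lookup"))` on the DB from this thread would return False for the 50 lines shown, since no two chains share a third-column value.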
