easy-search hangs on scop40 test #257

Open
rcedgar opened this issue Mar 24, 2024 · 9 comments

@rcedgar

rcedgar commented Mar 24, 2024

I'm trying to implement the SCOP40 test using the latest foldseek. The createdb command completes; the easy-search command runs for a while but then hangs indefinitely. Advice is welcome on the best way to implement this for measuring foldseek speed and accuracy. Thanks for any help!

# foldseek Version: 915ef7ddce1bd77080208eff8a434c0985ae7492

foldseek createdb \
  ../scop40pdb/pdb \
  scop40

/bin/time -v -o foldseek.time \
foldseek easy-search \
  ../scop40pdb/pdb \
  scop40 \
  --format-output "query,target,pident,evalue,alntmscore" \
  hits.txt
@rcedgar
Author

rcedgar commented Mar 25, 2024

Update: I was able to work around the problem by removing alntmscore from the format-output option. I'm guessing that computing the TM alignment is much slower than the S-W 3Di alignment and is not needed to calculate the E-value.
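
A minimal sketch of that workaround, for anyone hitting the same hang. The paths and the remaining columns are the ones from the original command; the `tmp` scratch directory, the `run_search` helper, and the `command -v` guard are assumptions added for illustration, not something taken from the thread.

```shell
# Output columns from the original command, with alntmscore dropped.
fields="query,target,pident,evalue"

# Hypothetical helper: "tmp" is an assumed scratch directory,
# the other paths are the ones used in the original post.
run_search() {
  foldseek easy-search ../scop40pdb/pdb scop40 hits.txt tmp \
    --format-output "$fields"
}

# Only attempt the search when foldseek is actually on PATH.
if command -v foldseek >/dev/null 2>&1; then
  run_search
else
  echo "foldseek not found; would request columns: $fields"
fi
```

If the TM score is needed later, it may be cheaper to compute it only for the hits that survive an E-value filter, though that is a guess rather than something confirmed here.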

@12047019

Where did you get the SCOP40 (or 35) files that you ran createdb on? I need to run a SCOP comparison against my own set of protein structures, but I couldn't find the files to pass to createdb.

@rcedgar
Author

rcedgar commented Mar 26, 2024

@martin-steinegger
Collaborator

I don't recommend using this; it's quite an old version. It makes sense to use the latest release for annotation or benchmarking: https://scop.berkeley.edu/

@rcedgar
Author

rcedgar commented Mar 26, 2024

Noted, thanks. I will do that for anything written up, but for preliminary work it's helpful that the expensive DALI and TMalign computations are included in the downloads for the foldseek paper.

@12047019

Thanks @rcedgar @martin-steinegger, got it. @martin-steinegger, it would be very kind if you could preassemble it and add it to foldseek like the other databases.

@rcedgar
Author

rcedgar commented Apr 2, 2024

Hi @martin-steinegger, with --format-output "query,target,evalue" foldseek completes SCOP40 quickly, but the sensitivity is lower than reported in the paper. Presumably I need to tweak some options such as --max-seqs and --exhaustive-search, but I don't see the command line in Methods or Supp Data. What are the recommended options for comparative validation? Thanks!
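
For a quick first look at a three-column run like this, one option is to count the hits per query under an E-value cutoff. This is only a rough sanity check, not the SCOP benchmark metric used in the paper; the `count_hits` name and the 0.001 default cutoff are illustrative assumptions, and the input is assumed to be tab-separated with the columns query, target, evalue.

```shell
# Hypothetical helper: count hits per query with E-value <= cutoff.
# $1 = hits file (tab-separated: query, target, evalue)
# $2 = E-value cutoff (assumed default: 0.001)
count_hits() {
  awk -F'\t' -v max_e="${2:-0.001}" '
    $3 + 0 <= max_e + 0 { n[$1]++ }              # keep hits at or below the cutoff
    END { for (q in n) print q "\t" n[q] }       # one line per query: name, count
  ' "$1" | sort
}
```

Usage would be something like `count_hits hits.txt 0.001`; queries with no hit under the cutoff are simply absent from the output.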

@martin-steinegger
Collaborator

We have all the scripts for benchmarking here: https://github.com/steineggerlab/foldseek-analysis

@rcedgar
Author

rcedgar commented Apr 2, 2024

Much better! It seems accuracy is getting close to DALI now. Is there any explanation of the improvements in the algorithm?


3 participants