
precomputed pdb lookup and sequence don't line up #258

Open
tn-7 opened this issue Mar 26, 2024 · 1 comment


@tn-7

tn-7 commented Mar 26, 2024

I executed: foldseek databases PDB pdb tmp

After that, the first line of the pdb file is: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMIQQKRWDEWAVNMAKSRWYNQTPNRAKRVITTFRTGTWDAYK

However, this doesn't correspond to the first line of pdb.lookup, which is 200l_A.

Instead, running that first line through BLAST shows it belongs to 145l_A, which is on this line of the lookup:
grep -i -a -n 145L_A pdb.lookup
159:158 145l_A 121
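The `grep -n` hit above mixes two numberings: `159:` is grep's own 1-based line number within pdb.lookup, and the rest is the lookup record itself. A small sketch that separates them, assuming the record's columns are whitespace (tab) separated as key, accession, and a third numeric column that is left uninterpreted here:

```python
def parse_grep_lookup_line(line: str) -> tuple[int, int, str, int]:
    """Split a `grep -n` hit on a .lookup file into (grep_line, key, accession, third_column)."""
    # grep -n prefixes each hit with "<line number>:"
    grep_line, record = line.split(":", 1)
    # Assumed .lookup layout: key, accession, then a numeric column (not interpreted here)
    key, accession, third = record.split()
    return int(grep_line), int(key), accession, int(third)


hit = parse_grep_lookup_line("159:158 145l_A 121")
print(hit)  # (159, 158, '145l_A', 121)
```

It is the key (158), not the grep line number, that would be carried over into the .index.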

How is the ordering done so that the IDs match?

@milot-mirdita
Member

The database entries are not stored in order. They are stored in our internal MMseqs2 database format:
https://github.com/soedinglab/MMseqs2/wiki#mmseqs2-database-format
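To see which entry physically comes first in a data file, one can sort the index by byte offset instead of by key. A minimal sketch, assuming the three-column .index layout (key, offset, length) from the wiki page linked above; the index rows are made up for illustration and are not from the real PDB database:

```python
def storage_order(index_text: str) -> list[int]:
    """Return database keys in the order their entries appear in the data file."""
    rows = []
    for line in index_text.strip().splitlines():
        # Assumed .index columns: key, byte offset, length
        key, offset, length = map(int, line.split("\t"))
        rows.append((offset, key))
    # Physical order follows the byte offset, not the key
    return [key for offset, key in sorted(rows)]


# Hypothetical index: key 158 sits at offset 0, key 0 is stored last
example_index = "0\t250\t130\n158\t0\t122\n7\t122\t128\n"
print(storage_order(example_index))  # [158, 7, 0]
```

In this toy index, the first record in the data file belongs to key 158 even though key 0 is listed first, which mirrors the mismatch described in the question.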

The lookup file maps each accession to a database key (first column of the .lookup file), and that key identifies a row in the .index (again the first column).
In the index you can look up the byte offset (second column) into the data file.
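The chain described above (.lookup, then .index, then data file) can be sketched as follows. This assumes a single data file, tab-separated .index rows of key, offset, and length, and entries terminated by a null byte (sequences additionally ending in a newline); `fetch_entry` is a hypothetical helper and the toy database is invented:

```python
def fetch_entry(accession: str, lookup_text: str, index_text: str, data: bytes) -> str:
    """Resolve an accession to its entry via .lookup -> .index -> data file."""
    # .lookup: key, accession, ... -> find the key for this accession
    key = None
    for line in lookup_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] == accession:
            key = int(fields[0])
            break
    if key is None:
        raise KeyError(accession)
    # .index: key, byte offset, length -> locate the entry in the data file
    for line in index_text.splitlines():
        k, offset, length = map(int, line.split("\t"))
        if k == key:
            raw = data[offset:offset + length]
            # Assumed terminators: trailing null byte, then a newline after the sequence
            return raw.rstrip(b"\x00").decode().rstrip("\n")
    raise KeyError(key)


# Hypothetical toy database whose two entries are stored out of key order
data = b"MNIFEML\n\x00ACDE\n\x00"
index_text = "1\t0\t9\n0\t9\t6"
lookup_text = "0\t200l_A\t0\n1\t145l_A\t1"
print(fetch_entry("145l_A", lookup_text, index_text, data))  # MNIFEML
```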

The data file is a special case for the PDB, since we ship it as a clustered database. The full PDB data is split across two separate files, pdb_seq.0 and pdb_seq.1: the former contains only the cluster representatives and the latter all other members.
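For a data file split into numbered parts such as pdb_seq.0 and pdb_seq.1, one way to read an entry is to treat the parts as concatenated in numeric order. This sketch assumes that the offsets in the .index are global over that concatenation and that no entry straddles two parts; that is our understanding of how MMseqs2 reads split databases, not something stated in this thread, so verify it (or use the Foldseek/MMseqs2 tools) before relying on it. `read_split_entry` and the byte strings are hypothetical:

```python
import bisect


def read_split_entry(parts: list[bytes], offset: int, length: int) -> bytes:
    """Read `length` bytes at a global `offset` from data parts treated as concatenated."""
    # Cumulative start offset of each part: [0, len(p0), len(p0) + len(p1), ...]
    starts = [0]
    for part in parts:
        starts.append(starts[-1] + len(part))
    # Pick the last part whose start is <= offset (this also skips empty parts)
    i = bisect.bisect_right(starts, offset) - 1
    if i >= len(parts):
        raise IndexError("offset beyond end of data")
    local = offset - starts[i]
    return parts[i][local:local + length]


# Hypothetical parts: ".0" holds one entry, ".1" holds two
parts = [b"AAA\n\x00", b"CC\n\x00GG\n\x00"]
print(read_split_entry(parts, 9, 4))  # b'GG\n\x00'
```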

I would recommend doing database manipulations with the various Foldseek/MMseqs2 commands.
