Both HHBlits and HHSearch give misaligned indels for homologous sequences #363

Open
hrp1000 opened this issue Dec 7, 2023 · 0 comments

hrp1000 commented Dec 7, 2023

I put one chain from a PDB entry into my library, then ran either HHblits or HHsearch with another, homologous chain (one containing indels) as the query, and the indels are not aligned between query and target.

Expected Behavior - the indels should be aligned.

Current Behavior - the indels are not aligned, so the sequence identity is lower than it "obviously" would be if they were. NCBI BLAST gives 97.37% sequence identity (with the indels in the right place); HHblits reports 88%.

Steps to Reproduce (for bugs)

Put the sequence of chain C from 5vol into the library, then run chain A from 5vol as the query against it. Chain C has an extra PW at the N-terminus and a 7-residue insertion, QGAVPAD, at residues 184-190. Chain A has an extra G at the C-terminus. Apart from these differences, the two chains are 100% identical.
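As a cross-check on the BLAST figure, the expected identity can be recomputed from the chain differences listed above. This is a sketch that assumes BLAST reports identical columns divided by alignment length (gap columns included), and that the local alignment excludes the N-terminal PW of chain C and the C-terminal G of chain A. The chain lengths 260 and 268 are the ones printed in the HHblits output; all variable names are illustrative only.

```python
# Residue bookkeeping for 5vol chain A (query) vs chain C (target),
# using only the differences described in this issue.
LEN_A = 260          # query length, printed as "(260)" in the HHblits output
LEN_C = 268          # target length, printed as "(268)"
EXTRA_N_TERM_C = 2   # leading "PW" present only in chain C
INSERTION_C = 7      # "QGAVPAD" present only in chain C
EXTRA_C_TERM_A = 1   # trailing "G" present only in chain A

# Residues shared by both chains (identical by construction).
shared = LEN_A - EXTRA_C_TERM_A
# Consistency check: chain C = shared residues + "PW" + insertion.
assert LEN_C == shared + EXTRA_N_TERM_C + INSERTION_C

# Assumed BLAST-style identity: identical columns / alignment length, where
# the alignment spans the shared region plus the 7 gap columns opposite the
# insertion (terminal overhangs are assumed not to be part of the alignment).
aln_len = shared + INSERTION_C
identity = 100 * shared / aln_len
print(f"{shared}/{aln_len} = {identity:.2f}%")  # 259/266 = 97.37%
```

Under those assumptions, a single correctly placed 7-residue gap reproduces the 97.37% that BLAST reports.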

Command used:

/bmm/soft/linux64/src/hh-suite-bin/bin/hhblits -n 1 -i /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhm -d /bmm/www/servers/phyre2/test/hmm/full -o /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhr -b 100 -norealign -z 500 -alt 1 -aliw 60

HH-suite Output (for bugs)

See the attached file; the relevant part is shown below. Note that the insertion in c5volC_ (target) is around residues 168-174, but HHblits places the corresponding gap in the query (c5volA_) around residues 196-202.

Q ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHTTTTCSEEEEESCCSSCCCCTTSHHHHHHHHHHHT
Q ss_pred ccchhheeecccchhHHHHHHHHhhcccccceeeeeccccCccCccccccccccccCCCC
Q c5volA_ 121 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN 180 (260)
Q Consensus 121 ~~~~~~~~~~gsg~~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 180 (260)
..+..++.+.|.|.|+..+...+...+..+..++..++......................
T Consensus 123 ~~~~~~~~~~GSGg~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 182 (268)
T c5volC_ 123 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL 182 (268)
T ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHCTTTCSEEEEESCCSSCCSSC---CCCTTSHHHHH
T ss_pred CCCCcccEEEEEccchHHHHHHHHhChHHhHHHhhccccccccccccccccccccCccch

Q ss_dssp CHHHHHHTCCHHHHH-------HHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
Q ss_pred chHHHHhhcchhhhh-------ccccccccccccccCccchHHHHHHHHHHHCCCcEEEE
Q c5volA_ 181 SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 233 (260)
Q Consensus 181 ~~~~~~~~~~~~~~~-------~~~~~~~~~~~~~~~~~~~~~~~~~~~L~~~g~~~~~~ 233 (260)
............... ....+++++.+++.|....++++++++|++.|+++++.
T Consensus 183 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~g~~D~~~~~~~~~~~~l~~~g~~~~~~ 242 (268)
T c5volC_ 183 TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 242 (268)
T ss_dssp HHHHHHTCHHHHHHTCCHHHHHHHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
T ss_pred hHHHHhcCHHHHHHhcChhhhhhccCceEEEEecCchHhHHHHHHHHHHHHHCCCCcEEE
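To make the misplacement concrete, the query and target residue rows of the two blocks above can be tallied column by column, then compared with a version where the 7-residue query gap sits opposite QGAVPAD instead. This is a minimal sketch: the residue strings are copied from the excerpt, and the names `tally`, `Q_COLS`, `T_COLS` and `q_fixed` are hypothetical, not part of HH-suite.

```python
# Query (c5volA_ 121-233) and target (c5volC_ 123-242) rows copied from the
# two alignment blocks in this issue; "-" marks a gap column.
Q_COLS = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN"
          "SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
T_COLS = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL"
          "TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")


def tally(q, t):
    """Count identical, mismatched and gap columns of a pairwise alignment."""
    if len(q) != len(t):
        raise ValueError("aligned rows must have equal length")
    ident = mism = gaps = 0
    for a, b in zip(q, t):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            ident += 1
        else:
            mism += 1
    return ident, mism, gaps


print("HHblits as printed:", tally(Q_COLS, T_COLS))  # (82, 31, 7)

# Hypothetical correction: strip the gaps, then reinsert the 7-residue gap
# in the query directly opposite the QGAVPAD insertion in the target.
q_seq = Q_COLS.replace("-", "")
t_seq = T_COLS.replace("-", "")
cut = t_seq.index("QGAVPAD")
q_fixed = q_seq[:cut] + "-" * 7 + q_seq[cut:]
print("gap opposite QGAVPAD:", tally(q_fixed, t_seq))  # (113, 0, 7)
```

In the alignment as printed, query residues 165-195 (31 residues) sit opposite the wrong target residues; with the gap opposite QGAVPAD (`cut` is 44, i.e. target residue 167 in the HHblits numbering), every aligned pair in this excerpt is identical. If the parts of the alignment outside residues 121-233 are gap-free and correct (an assumption, since only this excerpt is shown), the full alignment would have 259 - 31 = 228 identities over 259 aligned query residues, about 88.0%, which would be consistent with the 88% HHblits reports.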

Context

If a straightforward comparison between two nearly identical homologous chains gives an erroneous alignment, how can I trust the alignments for more difficult cases with lower sequence identity?

Your Environment

  • Version/Git commit used: last publicly released version

  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (happy to upload the output of 'more /proc/cpuinfo' if that would help), 264GB physical RAM

  • Operating system and version: Red Hat Enterprise Linux Workstation release 6.6 (Santiago)

c5volA_.hhblits.txt
