Both HHBlits and HHSearch give misaligned indels for homologous sequences #363

hrp1000 · 2023-12-07T10:02:05Z

I put one chain from a PDB entry into my library, then ran either HHBlits or HHSearch with a homologous chain as the query. The two chains differ by indels, and those indels do not line up between query and target in the resulting alignment.

Expected Behavior - indels should align

Current Behavior - indels do not align, and the sequence identity is lower than it "obviously" would be if they did. NCBI Blast gives 97.37% sequence ID (the indels are in the right place); HHBlits says 88%.

Steps to Reproduce (for bugs)

Put the sequence of chain C from 5vol into the library, then run chain A from 5vol as the query against it. Chain C has a leading PW at the N-terminus and a 7-residue insertion, QGAVPAD, at residues 184-190. Chain A has an extra G at the C-terminus. In all other respects the two chains have 100% sequence identity.

command to run:

/bmm/soft/linux64/src/hh-suite-bin/bin/hhblits -n 1 -i /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhm -d /bmm/www/servers/phyre2/test/hmm/full -o /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhr -b 100 -norealign -z 500 -alt 1 -aliw 60

HH-suite Output (for bugs)

See the attached file, but the interesting bit is below. Note that the insertion in c5volC_ (target) appears around residues 168-174, whereas the corresponding gap in the query (c5volA_) is placed around residues 196-202.

Q ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHTTTTCSEEEEESCCSSCCCCTTSHHHHHHHHHHHT
Q ss_pred ccchhheeecccchhHHHHHHHHhhcccccceeeeeccccCccCccccccccccccCCCC
Q c5volA_ 121 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN 180 (260)
Q Consensus 121 ~~~~~~~~~~gsg~~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 180 (260)
..+..++.+.|.|.|+..+...+...+..+..++..++......................
T Consensus 123 ~~~~~~~~~~GSGg~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 182 (268)
T c5volC_ 123 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL 182 (268)
T ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHCTTTCSEEEEESCCSSCCSSC---CCCTTSHHHHH
T ss_pred CCCCcccEEEEEccchHHHHHHHHhChHHhHHHhhccccccccccccccccccccCccch

Q ss_dssp CHHHHHHTCCHHHHH-------HHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
Q ss_pred chHHHHhhcchhhhh-------ccccccccccccccCccchHHHHHHHHHHHCCCcEEEE
Q c5volA_ 181 SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 233 (260)
Q Consensus 181 ~~~~~~~~~~~~~~~-------~~~~~~~~~~~~~~~~~~~~~~~~~~~L~~~g~~~~~~ 233 (260)
............... ....+++++.+++.|....++++++++|++.|+++++.
T Consensus 183 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~g~~D~~~~~~~~~~~~l~~~g~~~~~~ 242 (268)
T c5volC_ 183 TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 242 (268)
T ss_dssp HHHHHHTCHHHHHHTCCHHHHHHHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
T ss_pred hHHHHhcCHHHHHHhcChhhhhhccCceEEEEecCchHhHHHHHHHHHHHHHCCCCcEEE
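
To make the discrepancy concrete, here is a minimal Python sketch that uses only the two sequence fragments visible in the output above (query 121-233, target 123-242). The helper names are my own and are not part of HH-suite. The identities it prints cover this fragment only, so they are not directly comparable with the full-length 88% and 97.37% figures.

```python
# Fragments copied from the alignment above (query 121-233, target 123-242).
QUERY = (
    "IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN"
    "SCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR"
)
TARGET = (
    "IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL"
    "TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR"
)


def common_prefix_len(a, b):
    """Number of leading positions at which a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert_gap(seq, pos, length):
    """Return seq with `length` gap characters inserted before index `pos`."""
    return seq[:pos] + "-" * length + seq[pos:]


def identity(gapped_query, target):
    """Fraction of identical residues over columns where neither row has a gap."""
    assert len(gapped_query) == len(target)
    aligned = [(q, t) for q, t in zip(gapped_query, target) if "-" not in (q, t)]
    return sum(q == t for q, t in aligned) / len(aligned)


gap_len = len(TARGET) - len(QUERY)

# Where the two fragments first differ, i.e. where the gap "obviously" belongs.
expected_pos = common_prefix_len(QUERY, TARGET)

# Where the output above places the gap: immediately before "DLRSV" in the query.
reported_pos = QUERY.index("DLRSV")

expected_id = identity(insert_gap(QUERY, expected_pos, gap_len), TARGET)
reported_id = identity(insert_gap(QUERY, reported_pos, gap_len), TARGET)

print("inserted residues in target:", TARGET[expected_pos:expected_pos + gap_len])
print("gap offset, expected vs reported:", expected_pos, reported_pos)
print(f"fragment identity, expected gap placement: {expected_id:.1%}")
print(f"fragment identity, reported gap placement: {reported_id:.1%}")
```

With the gap placed where the fragments first diverge, every aligned column in the fragment is identical. With the gap placed where the output above has it, the residues between the two positions are aligned out of register.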

Context

If a straightforward comparison between two nearly identical homologous chains gives an apparently erroneous alignment, how can I trust the results for more difficult alignments with lower sequence identity?

Your Environment

Version/Git commit used: last publicly released version
Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (happy to upload the output of 'more /proc/cpuinfo' if that would help), 264GB physical RAM
Operating system and version: Red Hat Enterprise Linux Workstation release 6.6 (Santiago)

c5volA_.hhblits.txt
