Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHsearch - inconsistent results #355

Open
majnusova opened this issue Aug 7, 2023 · 1 comment
Open

HHsearch - inconsistent results #355

majnusova opened this issue Aug 7, 2023 · 1 comment

Comments

@majnusova
Copy link

Hi,
I'm trying to use hhsearch with a customized database. Both query and database alignments represent an entire family of homologs (there is no seed sequence). Hence, I reckon that -M 50 parameter should be applied to both query and database HMMs inferred from these alignments.

I tried to apply the -M 50 parameter and then, to test if it works, I changed the order of sequences in the alignments that were used to build the query HMMs, ran hhsearch, and it seemed to be OK (the results were basically the same when compared to the results inferred from the original alignments). But then, I changed the order of sequences in the alignments that were used to build the customized database and I got different results - it seems that the -M 50 parameter does not work...

Could you please check the bash script I've been using and help me to use the tool correctly?
Thanks a lot!

#!/bin/sh

cd /home/users/mvladka/miniconda3/envs/hhsuite/msa

mamba activate hhsuite

ffindex_build -s ../mydatabase.ff{data,index} .
cd ..

ffindex_apply mydatabase.ff{data,index} -i mydatabase_a3m_wo_ss.ffindex -d mydatabase_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0

rm mydatabase.ff{data,index}

mv mydatabase_a3m_wo_ss.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_wo_ss.ffdata mydatabase_a3m.ffdata

ffindex_apply mydatabase_a3m.ff{data,index} -i mydatabase_hhm.ffindex -d mydatabase_hhm.ffdata -- hhmake -i stdin -o stdout -v 0

cstranslate -f -x 0.3 -c 4 -I a3m -i mydatabase_a3m -o mydatabase_cs219 

sort -k3 -n -r mydatabase_cs219.ffindex | cut -f1 > sorting.dat
    
ffindex_order sorting.dat mydatabase_hhm.ff{data,index} mydatabase_hhm_ordered.ff{data,index}
mv mydatabase_hhm_ordered.ffindex mydatabase_hhm.ffindex
mv mydatabase_hhm_ordered.ffdata mydatabase_hhm.ffdata
    
ffindex_order sorting.dat mydatabase_a3m.ff{data,index} mydatabase_a3m_ordered.ff{data,index}
mv mydatabase_a3m_ordered.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_ordered.ffdata mydatabase_a3m.ffdata


for query in $(ls *.fasta); do perl /home/users/mvladka/miniconda3/envs/hhsuite/scripts/reformat.pl fas a3m $query $(basename $query .fasta).a3m -M 50 &> $(basename $query .fasta).reformatlog; done

for a3m_query in $(ls *.a3m); do hhmake -i $a3m_query -M a3m -add_cons &> $(basename a3m_query .a3m).hhmakelog; done
for hhm in $(ls *.hhm); do hhsearch -i $hhm -d mydatabase -M a3m &> $(basename $hhm .hhm).hhsearchlog; done
@altaetran
Copy link

Could it be because the HHsuite databases may only use the first 100 sequences (or some other number) in the alignment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants