Using hh-suites to find E.coli gene in other Proteobacteria <advice on the way I am using the method> #369

Open
Jigyasa3 opened this issue Feb 20, 2024 · 0 comments

Jigyasa3 commented Feb 20, 2024

Hey @milot-mirdita,

Thanks for a great resource to find remote homologs!
I am interested in finding an E. coli gene in other Proteobacteria. The literature shows that this gene is conserved only in closely related strains, so I am using HH-suite to find remote homologs of this gene in other Proteobacteria samples.

I would like some advice on whether the way I am using HH-suite makes sense, and on how to tell whether the output is a false positive or a false negative.

  1. I run hhblits to collect all sequences similar to the E. coli gene of interest from the UniRef30 database:
    hhblits -cpu 4 -i ${IN_DIR}/ytfI_ecoli.fasta -d ${DB2}/UniRef30_2023_02 -oa3m ${OUT_DIR}/ytfI_ECOLI_uniclust.a3m -all

# The idea behind step 1 is to collect remote homologs of the E. coli gene of interest, because running hmmsearch with the single E. coli gene as the database doesn't give any results!

  2. The resulting .a3m file was converted back to a FASTA alignment using the reformat.pl script.
  3. The hmmbuild command was used to convert the MSA into a profile HMM (the "database" for the next step).
  4. I ran hmmsearch with the profile HMM from step 3 against the Proteobacteria protein sequences.
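
The conversion, profile-building and search steps above could be sketched roughly as below. This is only an illustration, not the exact commands that were run: the `run` and `homolog_pipeline` helpers, the `DRY_RUN` and `TARGETS` variables, and every file name except the `.a3m` output of the hhblits step are assumptions. It also assumes `reformat.pl` (from HH-suite) and HMMER are on the `PATH`.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the a3m -> FASTA -> HMM -> hmmsearch steps.
homolog_pipeline() {
  local out_dir="${OUT_DIR:-.}"
  # Placeholder name for the Proteobacteria protein FASTA (assumption).
  local targets="${TARGETS:-proteobacteria_proteins.fasta}"
  local a3m="${out_dir}/ytfI_ECOLI_uniclust.a3m"        # written by hhblits -oa3m
  local fasta_msa="${out_dir}/ytfI_ECOLI_uniclust.fas"  # placeholder name
  local hmm="${out_dir}/ytfI_ECOLI_uniclust.hmm"        # placeholder name
  local tbl="${out_dir}/ytfI_vs_proteobacteria.tbl"     # placeholder name

  # Convert the A3M alignment to an aligned FASTA file.
  run reformat.pl a3m fas "$a3m" "$fasta_msa" || return 1
  # Build a profile HMM from the alignment.
  run hmmbuild "$hmm" "$fasta_msa" || return 1
  # Search the target proteins with the profile; keep a tabular hit list.
  run hmmsearch --cpu 4 --tblout "$tbl" "$hmm" "$targets" || return 1
}
```

Calling `DRY_RUN=1 homolog_pipeline` prints the three commands without running them, which makes it easy to check the paths before starting a real search.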

Unfortunately, this does not give any "significant" hit, i.e. no hit has an E-value below 1e-3.
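
To make the 1e-3 criterion concrete, here is a minimal sketch of how such a cutoff might be applied to hmmsearch's tabular output. It assumes the search was run with `--tblout`, where the fifth whitespace-separated column is the full-sequence E-value and comment lines start with `#`. The function name `filter_tblout_by_evalue` is hypothetical.

```shell
# Hypothetical helper: list targets from an hmmsearch --tblout file whose
# full-sequence E-value (column 5) is below a cutoff.
#   $1 = path to the --tblout file
#   $2 = E-value cutoff (defaults to 1e-3 when omitted)
filter_tblout_by_evalue() {
  awk -v max="${2:-1e-3}" '
    # Skip "#" comment lines; force numeric comparison with "+ 0".
    !/^#/ && NF >= 5 && ($5 + 0) < (max + 0) { print $1, $5 }
  ' "$1"
}
```

For example, `filter_tblout_by_evalue hits.tbl 1e-3` (with `hits.tbl` as a placeholder file name) prints the target name and E-value of each hit below the cutoff, and prints nothing when no hit passes.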

I am also comparing the Proteobacteria sequences with the E. coli gene of interest using Foldseek's easy-search command. Again, I find no "significant" hit, i.e. no hit has an E-value below 1e-3.

So I would like to understand what could be considered a reasonable remote homolog of the gene, and whether the two methods I am using make sense.

Regards,
Jigyasa
