[Question] Can MetaEuk utilize the MMSEQS2 clustering results? #78

Open
jolespin opened this issue May 12, 2023 · 2 comments
Comments

@jolespin

I have a fairly large database (https://zenodo.org/record/7485114#.ZF7RdOzML0o) that I use in the backend of VEBA, and I'm trying to reduce its resource requirements.

I'm wondering if MetaEuk can handle the clustering results of MMseqs2 easy-cluster or easy-linclust? If not, suppose one used the cluster representatives as the database for finding exons. In that case, what minimum coverage and percent identity would you use in MMseqs2 to capture (most of) the exons?

@elileka
Member

elileka commented May 16, 2023

Hi,

Neat project!

On the reference side, MetaEuk can use protein profiles, so you could cluster the proteins (using linclust) and compute a profile (using result2profile) from each cluster. You could, of course, also use the cluster representatives, as you suggest.
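The two options above can be sketched as a short list of MMseqs2 calls. This is only an illustration, not a tested pipeline: the database names (`proteinsDB`, `clusterDB`, `repDB`, `profileDB`), the input file name, and the helper `build_profile_commands` are hypothetical, and the argument order for `result2profile` (sequence DB as both query and target, clustering as the result) is an assumption that should be checked against `mmseqs result2profile -h`. The helper only builds the command lines; it does not run anything.

```python
import shlex
from pathlib import Path


def build_profile_commands(fasta, workdir, cov=0.8, cov_mode=1, min_seq_id=None):
    """Return MMseqs2 command lines (argv lists) for cluster -> profiles/representatives.

    All database names below are hypothetical placeholders.
    """
    work = Path(workdir)
    seq_db = str(work / "proteinsDB")
    clu_db = str(work / "clusterDB")
    rep_db = str(work / "repDB")
    prof_db = str(work / "profileDB")
    tmp = str(work / "tmp")

    linclust = ["mmseqs", "linclust", seq_db, clu_db, tmp,
                "-c", str(cov), "--cov-mode", str(cov_mode)]
    if min_seq_id is not None:
        # Only set an identity threshold when asked; otherwise keep the tool default.
        linclust += ["--min-seq-id", str(min_seq_id)]

    return [
        # Convert the protein FASTA into an MMseqs2 sequence database.
        ["mmseqs", "createdb", str(fasta), seq_db],
        # Cluster in linear time.
        linclust,
        # Option A: one profile per cluster (argument order is an assumption).
        ["mmseqs", "result2profile", seq_db, seq_db, clu_db, prof_db],
        # Option B: keep only the cluster representatives.
        ["mmseqs", "createsubdb", clu_db, seq_db, rep_db],
    ]


# Print the commands for inspection instead of executing them.
for cmd in build_profile_commands("proteins.faa", "work"):
    print(shlex.join(cmd))
```

Either `profileDB` or `repDB` would then stand in for the full protein set as the MetaEuk reference. Each printed command can be passed to `subprocess.run(cmd, check=True)` once the arguments have been verified.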

How to choose the clustering parameters is a good question. I would start by setting the value of --cov-mode to either 1 or 3 and -c to, say, 0.8. See here. It is probably worth testing these settings on a subsample of your DB first.
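One way to draw such a subsample without loading the whole database into memory is single-pass reservoir sampling over the FASTA records. The sketch below is a generic illustration rather than anything MMseqs2 or MetaEuk provides; the function names, the sample size, and the seed are all assumptions.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Keep k records chosen uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


# Tiny in-memory demo; for a real file, pass an open file handle to read_fasta.
demo = [">a", "MKT", ">b", "MAA", "GG", ">c", "MLL"]
subset = reservoir_sample(read_fasta(demo), k=2)
for header, seq in subset:
    print(f">{header}\n{seq}")
```

The subsampled FASTA can then be clustered with a few different `-c` and `--cov-mode` settings, and the MetaEuk predictions compared against a run on the unclustered subsample.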

@milot-mirdita, any wise words about clustering and profiles using MMseqs2?

@jolespin
Author

Thanks! I'm loving the MMseqs2 and MetaEuk ecosystem. I'll look into result2profile in a bit.

What I was thinking may or may not be possible, but it would be cool if MetaEuk could take in a clustered database that has both the cluster mappings and the full sequence set. Once it identifies a hit to a cluster representative, it could search for more exons among the proteins within that cluster. Essentially, it would run MetaEuk twice: once at a coarse level on the representatives, and then on a smaller subset of proteins at finer granularity.
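The bookkeeping for such a two-pass search could look roughly like the sketch below. Nothing in this thread says MetaEuk does this itself. The code only shows how the hits from a first pass against representatives might be expanded into the member list for a second pass. It assumes a two-column, tab-separated cluster table (representative, then member), as written by MMseqs2 `easy-cluster` in its `_cluster.tsv` output. The function names and the example IDs are hypothetical.

```python
from collections import defaultdict


def load_clusters(tsv_lines):
    """Map each representative ID to the list of its cluster members."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    return clusters


def second_pass_targets(clusters, hit_representatives):
    """Collect, in order and without duplicates, all members of the hit clusters."""
    targets, seen = [], set()
    for rep in hit_representatives:
        for member in clusters.get(rep, []):
            if member not in seen:
                seen.add(member)
                targets.append(member)
    return targets


# Hypothetical cluster table: the representative is listed as its own member.
tsv = [
    "repA\trepA",
    "repA\tprot2",
    "repB\trepB",
    "repB\tprot4",
    "repB\tprot5",
]
clusters = load_clusters(tsv)
# Suppose the coarse pass only found exons matching repB.
print(second_pass_targets(clusters, ["repB"]))
```

The resulting ID list could be used to extract a small protein subset for the finer second run, so the full database is only touched where the coarse pass found something.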
