predicted proteins not starting with M ? #43

Open
Tkastylevsky opened this issue Mar 15, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@Tkastylevsky

Hello!
I am using MetaEuk through BUSCO on my genome to do some gene prediction.

I expected the vast majority of the single-copy, full-length proteins detected to start with a methionine. However, this is not the case.

I went through the predicted BUSCOs and several of them start with another amino acid.
Is there a step in MetaEuk that checks the starting amino acid?
I am using the glires database from ODB on an as-yet unannotated genome, with MetaEuk version 5.34c21f2.
I installed BUSCO (and therefore MetaEuk) through its conda installation (BUSCO v5.3) on an Ubuntu operating system.
Best,
Timothee

@elileka
Member

elileka commented Mar 29, 2022

Hi Timothee,

Thank you for the comment. I am marking this as a future feature to develop. Right now it is not possible to require that proteins start with a methionine. There can be several reasons why some of your proteins do not start with M: (1) some proteins simply don't; (2) your contigs may be very fragmented, so you get a lot of partial proteins; (3) your organism may not be very similar to those in the target database, in which case homology detection is harder and some parts of the proteins (potentially the start) match poorly.

If this concerns you, I would look at a couple of things: (1) What fraction of the proteins do not start with M? Does it make sense for the taxonomic group you're investigating? How does it correlate with E-value (do the proteins missing an M have worse E-values)? (2) Can you manually check a couple of examples? Does it look like there is an M upstream that was not detected?
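
For check (1), a minimal sketch of how the fraction could be computed from a protein FASTA file. This is not part of MetaEuk or BUSCO; the function names `read_fasta` and `fraction_without_start_m` are made up for illustration, and the in-memory example stands in for whatever predicted-protein FASTA file your run produced.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fraction_without_start_m(records):
    """Return (count_without_M, total, fraction) for (header, seq) pairs."""
    total = 0
    missing = 0
    for _, seq in records:
        total += 1
        if not seq.upper().startswith("M"):
            missing += 1
    fraction = missing / total if total else 0.0
    return missing, total, fraction


# Toy input; replace with open("your_predicted_proteins.fas") (hypothetical path).
example = StringIO(">p1\nMKTAY\n>p2\nLSTQ\nAA\n>p3\nMA\n>p4\nGGH\n")
missing, total, fraction = fraction_without_start_m(read_fasta(example))
print(f"{missing}/{total} proteins do not start with M ({fraction:.0%})")
```

To relate this to E-values, the same loop could collect the E-value of each record alongside the M / no-M flag, provided your FASTA headers carry it; how to parse that depends on the header layout of your files.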

@elileka elileka added the enhancement New feature or request label Mar 29, 2022
@elileka
Member

elileka commented May 1, 2024

Hi Timothee,

It's been two years... not sure if this is still relevant for you...

I have added an option to scan for an ATG before the first exon. It is not yet part of an official MetaEuk release, but the option is available from commit 528cddc onward.
See details here.

Best,
Eli
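
For anyone who wants to check a few examples by hand, as suggested earlier in the thread, here is a rough sketch of the idea of scanning for an in-frame ATG upstream of the first exon. It is an assumption-laden illustration, not the code from commit 528cddc: the function `find_upstream_atg`, its parameters, and the stopping rule (give up at the first in-frame stop codon or at the contig edge) are all hypothetical, and it only handles the forward strand with 0-based coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_atg(contig, exon_start, max_codons=100):
    """Return the 0-based position of the nearest in-frame ATG upstream of
    exon_start (forward strand), or None if an in-frame stop codon, the
    contig edge, or the max_codons limit is reached first."""
    contig = contig.upper()
    pos = exon_start - 3  # first codon immediately upstream of the exon
    steps = 0
    while pos >= 0 and steps < max_codons:
        codon = contig[pos:pos + 3]
        if codon == "ATG":
            return pos
        if codon in STOP_CODONS:
            return None
        pos -= 3
        steps += 1
    return None


# First exon starts at index 8 (CTG...); the codon at index 2 is an in-frame ATG.
print(find_upstream_atg("CCATGAAACTGGGT", 8))  # 2
# Here an in-frame TAG is hit before any ATG, so nothing is reported.
print(find_upstream_atg("TAGAAACTG", 6))       # None
```

For a prediction on the reverse strand, the same scan could be run on the reverse complement of the contig with the coordinates converted accordingly.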
