selenoproteins? #43

Open
mallen6 opened this issue Sep 15, 2023 · 2 comments
Labels
question Further information is requested

Comments

@mallen6

mallen6 commented Sep 15, 2023

Hello,
I saw that pyrodigal can use different translation tables (issue #34), which is great!

Do you know if pyrodigal can correctly identify selenocysteine insertions? (And pyrrolysine too)

I have multiple MAGs containing tRNA-Sec and the SelA protein, but prokka/prodigal gene prediction of these MAGs is giving a lot of partial ORFs. I would love to be able to annotate the ORFs of these MAGs with a tool that can output the selenoprotein ORFs correctly (requires recognising the UGA stop codon AND also the bacterial SEC insertion sequence (SECIS), which can be a bit divergent among different taxa/genes).

Any help or suggestions would be great :)

@oschwengers

Would also love to see this on the gene prediction side. However, I can imagine this is a non-trivial task.

A bit of #shameless-plug since I'm the main developer but FYI:
We have implemented such a feature for selenocysteine proteins in Bakta. Bakta detects cis-regulatory recoding stimulation ncRNA regions, and if two adjacent, proximate, in-frame CDS are also detected, it merges the ORFs of both and recodes the stop codon of the upstream ORF as a selenocysteine codon. Thus it is able to predict and annotate such proteins, also for MAGs.
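To make the merge-and-recode idea above concrete, here is a minimal sketch in Python. It is not Bakta's implementation: the function name `merge_sec_orfs`, the `max_gap` threshold, and the coordinate convention (0-based, half-open, forward strand only, `end` including the stop codon) are all hypothetical. It also assumes the SECIS / recoding-signal evidence has already been found elsewhere; it only checks that the upstream ORF ends in TGA (UGA in the mRNA), that the downstream ORF is in the same frame and close by, and that no other in-frame stop lies in between, then translates the merged region with that TGA recoded as `U` (selenocysteine).

```python
# Standard genetic code; bacterial translation table 11 uses the same codon assignments.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def merge_sec_orfs(seq, up, down, max_gap=300):
    """Hypothetical helper: merge two in-frame ORFs across a UGA (TGA) codon.

    `up` and `down` are (start, end) pairs, 0-based half-open, forward strand,
    with `end` including the stop codon. Returns the merged protein with the
    recoded TGA as 'U', or None if the ORFs cannot be bridged.
    """
    up_start, up_end = up
    down_start, down_end = down
    # The upstream ORF must terminate with TGA, the codon recoded as Sec.
    if seq[up_end - 3:up_end] != "TGA":
        return None
    # Both ORFs must share a reading frame.
    if (down_start - up_start) % 3 != 0:
        return None
    # The downstream ORF must follow closely (assumed distance cutoff).
    gap = down_start - up_end
    if gap < 0 or gap > max_gap:
        return None
    protein = []
    for i in range(up_start, down_end, 3):
        if i == up_end - 3:
            protein.append("U")  # recode the upstream stop as selenocysteine
            continue
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and i + 3 != down_end:
            return None  # another in-frame stop blocks the read-through
        protein.append(aa)
    return "".join(protein).rstrip("*")


if __name__ == "__main__":
    #         ATG AAA TGA GGC ATG TTT TAA
    seq = "ATGAAATGAGGCATGTTTTAA"
    print(merge_sec_orfs(seq, (0, 9), (12, 21)))  # MKUGMF
```

A real pipeline would also need to handle the reverse strand, pick the correct downstream start, and gate the merge on the detected recoding signal, which this sketch deliberately leaves out.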

@althonos althonos added the question Further information is requested label Sep 27, 2023
@althonos
Owner

Hi @mallen6 !

At the moment Prodigal (and thus Pyrodigal) doesn't support selenoproteins at all. It would require some extra work to recognize and score the SECIS that would probably warrant some fundamental changes in the Prodigal node scoring algorithm :(


3 participants