Macromolecular structure and integron annotation, custom features #117

Open
alexweisberg opened this issue Jul 28, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@alexweisberg
Contributor

Is your feature request related to an existing issue or bug?
no

Is your new feature related to a general problem?
Annotation of large secretion systems (T3SS, T4SS, T6SS, etc.) is often spotty, and gene names differ a lot between organisms. It's also not clear whether these are predicted to be complete structures. Likewise, it's not always clear where integron systems are located and which genes are in cassettes.

Describe the solution you'd like
It would be really nice if Bakta could run MacSyFinder (https://github.com/gem-pasteur/macsyfinder) with the TXSScan models (https://github.com/macsy-models/TXSScan) and incorporate the results into the annotation. Running IntegronFinder (https://github.com/gem-pasteur/Integron_Finder) and marking cassette borders in the annotations would be helpful too. Similarly, IS element boundaries could be marked with ISEScan (https://github.com/xiezhq/ISEScan), or prophage regions with DBSCAN-SWA (https://github.com/HIT-ImmunologyLab/DBSCAN-SWA/).

Describe alternatives you've considered
I totally understand that not every analysis tool or pipeline could or even should be added to Bakta. Alternatively, if adding these as options is too time-consuming or complex, some way to run them separately and then use a Bakta script to update the annotations with this information would be great. This would also be nice for custom annotations of features from an input table file, like integrated mobile element repeats, Agrobacterium T-DNA borders, etc.
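To make the table-based alternative concrete, here is a minimal sketch of what such a post-processing step could look like. It is not an existing Bakta script. The column names (`seqid`, `start`, `end`, `strand`, `type`, `name`), the `custom` source tag, and both function names are assumptions made up for illustration. It converts rows of a tab-separated table into GFF3 feature lines and places them before the `##FASTA` section of an existing GFF3 file, if there is one.

```python
import csv
import io
from urllib.parse import quote


def table_to_gff3(table_text, source="custom"):
    """Convert TSV rows (seqid, start, end, strand, type, name) into GFF3 lines.

    The column layout is a hypothetical one chosen for this sketch.
    """
    lines = []
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for i, row in enumerate(reader, start=1):
        start, end = int(row["start"]), int(row["end"])
        # GFF3 coordinates are 1-based and inclusive, with start <= end.
        if start < 1 or end < start:
            raise ValueError(f"bad coordinates in row {i}: {start}..{end}")
        strand = row["strand"] if row["strand"] in {"+", "-", "."} else "."
        # Percent-encode characters that are reserved in GFF3 attributes.
        name = quote(row["name"], safe=" ")
        attrs = f"ID=custom_{i};Name={name}"
        lines.append("\t".join([
            row["seqid"], source, row["type"],
            str(start), str(end), ".", strand, ".", attrs,
        ]))
    return lines


def append_features(gff_text, feature_lines):
    """Insert feature lines before the ##FASTA section, or at the end if absent."""
    gff_lines = gff_text.splitlines()
    try:
        cut = gff_lines.index("##FASTA")
    except ValueError:
        cut = len(gff_lines)
    return "\n".join(gff_lines[:cut] + feature_lines + gff_lines[cut:]) + "\n"
```

The new lines are simply appended after the existing features rather than sorted by position, which keeps the sketch short; a real script would probably also want to check that each `seqid` exists in the file.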

Thanks!

@alexweisberg alexweisberg added the enhancement New feature or request label Jul 28, 2022
@oschwengers
Owner

Hi @alexweisberg, thanks a lot for reaching out with these thorough considerations. We've thought about this a lot, and I'd love to enhance annotations of T?SS- and MGE-related proteins as well as add annotations for these structures and the MGEs themselves. Unfortunately, it's hard to decide where to stop in the workflow and which tools to integrate. There are tons of different analyses people are conducting, and all of them could improve the overall annotation of a genome. However, integrating all of them is of course not feasible, both in terms of the effort it would require and the increased runtime it would cause.

Therefore, I currently tend to leave these dedicated analyses out of the Bakta workflow. I'll take a deeper look at all the related HMMs and covariance models, which might be useful for the pre-computed annotations (db creation). In my experience, though, these models are a great resource for detecting such features but not necessarily for annotating the proteins. I'm sorry that I cannot be of more help here. I will keep this open so others can add their thoughts on this.

Thanks again!

@alexweisberg
Contributor Author

Hi Oliver, thank you for getting back to me on this; I appreciate the thorough reply. I agree that it may be best to leave these as extra analyses outside of Bakta. Many of these programs change quickly, and what counts as "best" changes over time.

For the HMMs, I was thinking they would be more useful for annotating specific DNA sites like dif sites, promoters, T-DNA borders, etc., rather than protein domains. I have recently been working on scripts for updating gbk/gff files with extra elements or annotations, so I think leaving these out of Bakta for now is fine. Thanks!
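As a rough illustration of locating specific DNA sites outside of Bakta, the sketch below scans a sequence on both strands for an IUPAC consensus motif and reports 1-based coordinates that could then be written into a gbk/gff file. This is a plain pattern scan, not an HMM or covariance model, so it is a much simpler stand-in for the approach discussed above. The function names are hypothetical, and any motif passed in is the user's own assumption, not a curated site definition.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted (start, end, strand) hits, 1-based inclusive, on both strands."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = set()
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead group lets overlapping matches be reported too.
        pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in query) + "))")
        for match in pattern.finditer(sequence):
            hits.add((match.start() + 1, match.start() + len(query), strand))
    return sorted(hits)
```

Minus-strand hits are reported in forward-strand coordinates, which matches how GFF3 expects them. A motif that equals its own reverse complement will be reported once per strand at the same position.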
