Adding the new Greengenes2 database for classification #658

Open
aimirza opened this issue Nov 9, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@aimirza

aimirza commented Nov 9, 2023

Description of feature

Greengenes2 was released recently. It is a new version of the Greengenes database that has been redesigned from the ground up and is backed by whole genomes, with a focus on harmonizing 16S rRNA and shotgun metagenomic datasets. Its phylogenetic coverage is also much larger than that of past resources such as SILVA, Greengenes, and GTDB. It would be great to add this database as an option for classifying sequences. Usage instructions are linked below. It has a QIIME 2 plugin. Note that the approach to classifying sequences differs between V4 and non-V4 sequences.

Paper: https://www.nature.com/articles/s41587-023-01845-1
How to use it: https://forum.qiime2.org/t/introducing-greengenes2-2022-10/25291

@aimirza aimirza added the enhancement New feature or request label Nov 9, 2023
@d4straub
Collaborator

Hi there,
Yes, that is indeed an interesting database. However, I dislike that it's very much centered on QIIME2 and the V4 region. GTDB also allows harmonizing between 16S and shotgun metagenomics, and it is already available in ampliseq & mag.

Greengenes2 was discussed in https://nfcore.slack.com/archives/CEA7TBJGJ/p1690539708378009 and https://nfcore.slack.com/archives/CEA7TBJGJ/p1678204777328909. Using --skip_dada_taxonomy --classifier http://ftp.microbio.me/greengenes_release/current/2022.10.backbone.full-length.nb.qza might do the job (not tested!). Feedback would be appreciated.
Otherwise, preprocessing the database with QIIME2 v2023.7 (the version used in ampliseq v2.7.0) and providing the resulting classifier to the pipeline with --classifier should currently work.
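
For anyone who wants to try the untested suggestion above, here is a minimal sketch of what a full invocation might look like. The classifier URL and the two flags are taken from this comment; the samplesheet path, primer sequences, output directory, and the docker profile are placeholder assumptions that need to be replaced with your own values. The script only prints the command unless RUN=1 is set, so it is safe to dry-run first.

```shell
#!/bin/sh
set -eu

# Pre-trained Greengenes2 classifier URL quoted in the comment above (not tested there).
CLASSIFIER="http://ftp.microbio.me/greengenes_release/current/2022.10.backbone.full-length.nb.qza"

# Hypothetical placeholders: replace with your own samplesheet, primers, and output directory.
SAMPLESHEET="samplesheet.tsv"
FW_PRIMER="GTGYCAGCMGCCGCGGTAA"
RV_PRIMER="GGACTACNVGGGTWTCTAAT"
OUTDIR="results"

# Build the command as positional parameters so quoting survives execution.
# --skip_dada_taxonomy turns off the DADA2 taxonomy step; --classifier hands
# the QIIME2 classifier artifact to the pipeline.
set -- nextflow run nf-core/ampliseq -r 2.7.0 -profile docker \
  --input "$SAMPLESHEET" \
  --FW_primer "$FW_PRIMER" \
  --RV_primer "$RV_PRIMER" \
  --skip_dada_taxonomy \
  --classifier "$CLASSIFIER" \
  --outdir "$OUTDIR"

# Always print the command; execute it only when RUN=1 is set.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

If the pipeline does not accept the remote URL directly, downloading the .qza file first and passing the local path to --classifier would be the obvious thing to try next (also untested).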

I hope Greengenes2 gets integrated for DADA2 classification. That should remove the need for preprocessing and make the database relatively easy to add here, including an upload to Zenodo, which is much preferred over a university-hosted database. Greengenes2 was said to be provided "soon-ish" as a DADA2 database on Zenodo; see benjjneb/dada2#1680 and benjjneb/dada2#1829.

@d4straub
Collaborator

Greengenes2 support for QIIME2 is now available in the dev branch and will be in the next release. I am not closing this issue though, because there is still no news for DADA2 (or I missed it).
