Adding the new Greengenes2 database for classification #658

aimirza · 2023-11-09T17:19:18Z

Description of feature

Greengenes2 recently came out. Greengenes2 is a new release of the Greengenes database that has been redesigned from the ground up and is backed by whole genomes, with a focus on harmonizing 16S rRNA and shotgun metagenomic datasets. Its phylogenetic coverage is also much larger than that of past resources such as SILVA, Greengenes and GTDB. It would be great to add this database as an optional feature for classifying sequences. It has a QIIME 2 plugin, and usage instructions are linked below. Note that the approach to classifying sequences differs between V4 and non-V4 sequences.

Paper: https://www.nature.com/articles/s41587-023-01845-1
How to use it: https://forum.qiime2.org/t/introducing-greengenes2-2022-10/25291

d4straub · 2023-11-10T07:25:44Z

Hi there,
yes, that is an interesting database indeed. However, I dislike that it is very much centered on QIIME2 and the V4 region. GTDB also allows harmonizing between 16S and shotgun metagenomics, and it is already available in ampliseq & mag.

Greengenes2 was discussed in https://nfcore.slack.com/archives/CEA7TBJGJ/p1690539708378009 & https://nfcore.slack.com/archives/CEA7TBJGJ/p1678204777328909. Using --skip_dada_taxonomy --classifier http://ftp.microbio.me/greengenes_release/current/2022.10.backbone.full-length.nb.qza might do the job (not tested!). Feedback would be appreciated.
Otherwise, preprocessing the database with QIIME2 v2023.7 (the version used in ampliseq v2.7.0) and providing the resulting classifier to the pipeline with --classifier should currently work.
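
For anyone who wants to try the untested suggestion above, a minimal sketch of the full command could look like the following. Only --skip_dada_taxonomy, --classifier and the classifier URL come from this thread. The samplesheet name, the primer placeholders, the output directory and the docker profile are assumptions for illustration and need to be replaced with real values. Whether the pre-built classifier is compatible with the QIIME2 version in ampliseq v2.7.0 is exactly the part that has not been tested.

```shell
#!/bin/sh
# Untested sketch of the suggestion above; prints the command instead of running it.
# Hypothetical placeholders (not from this thread): samplesheet.tsv, the primer
# strings, the results directory, and the docker profile.
CLASSIFIER_URL="http://ftp.microbio.me/greengenes_release/current/2022.10.backbone.full-length.nb.qza"

# Collect the command as positional parameters so each argument stays one word.
set -- nextflow run nf-core/ampliseq \
  -r 2.7.0 \
  -profile docker \
  --input samplesheet.tsv \
  --FW_primer "FORWARD_PRIMER_SEQUENCE" \
  --RV_primer "REVERSE_PRIMER_SEQUENCE" \
  --outdir results \
  --skip_dada_taxonomy \
  --classifier "$CLASSIFIER_URL"

# Dry run: show the assembled command. Replace this line with "$@" to execute it.
echo "$@"
```

The script only echoes the assembled command, so it is safe to run as a check before launching the pipeline for real.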

I hope for the integration of Greengenes2 for DADA2 classifications; that should solve all preprocessing and make the db integration relatively easy to add here, including an upload to Zenodo, which is much preferred to a university DB. Greengenes2 was said to be provided "soon-ish" as a DADA2 database on Zenodo, see benjjneb/dada2#1680 and benjjneb/dada2#1829.

d4straub · 2024-01-12T08:04:45Z

Greengenes2 support for QIIME2 is now available in the dev branch and will be in the next release. I won't close this issue though, because there is still no news for DADA2 (or I missed it).

aimirza added the enhancement (New feature or request) label on Nov 9, 2023

This was referenced Nov 29, 2023:

Add greengenes2 2022.10 support to Ampliseq #664 (Closed)

Greengenes2 2022.10 Support #666 (Merged)
