Josué Barrera Redondo edited this page Sep 11, 2023 · 5 revisions

Sensitive inference of gene age using GenEra

Welcome to the GenEra wiki! Here we explain how to install and run GenEra, and how to use its additional options for sensitive inference of gene age.


Introduction

GenEra is an easy-to-use and highly customizable command-line tool that estimates gene-family founder events (i.e., the age of the last common ancestor of protein-coding gene families) through a reimplementation of genomic phylostratigraphy (Domazet-Lošo et al., 2007). GenEra takes advantage of DIAMOND’s speed and sensitivity to search for homologous genes throughout the entire NR database, and combines these results with the NCBI Taxonomy to assign an origination date to each gene and gene family in a query species. GenEra can also incorporate protein data from external sources to enrich the analysis. It can search for proteins within nucleotide data (i.e., genome/transcriptome assemblies) using MMseqs2 to improve the classification of orphan genes, and it calculates a taxonomic representativeness score to assess the reliability of assigning a gene to a specific age. Additionally, GenEra can calculate homology detection failure probabilities using abSENSE to help distinguish fast-evolving genes from high-confidence gene-family founder events.

  • As of v1.3.0, users can detect gene ages at taxonomic levels below species, such as between different strains or subspecies that do not have an NCBI Taxonomy ID.
  • As of v1.2.0, GenEra was adapted to run completely offline!
  • As of v1.1.0, users can use Foldseek to search protein structural predictions against the AlphaFold DB for fast and sensitive structural alignments. Alternatively, the user can choose to reassess gene ages by running JackHMMER on top of DIAMOND (be aware that this additional step significantly slows down the analysis).
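
The core idea of phylostratigraphy described in the introduction (find homologs, then use the taxonomy to date the gene) can be illustrated with a small sketch. This is not GenEra's code: the function name `assign_phylostratum`, the simplified lineages, and the hit lists are all hypothetical and chosen only for illustration. The sketch assumes that each lineage is an ordered list of taxa from the oldest rank to the youngest, and that a gene is assigned to the oldest taxon of the query lineage that it shares with any detected homolog.

```python
# Toy illustration of phylostratigraphic age assignment.
# NOT GenEra's implementation: names and data below are hypothetical.
# Each lineage is ordered from the oldest taxon to the youngest.

def assign_phylostratum(query_lineage, hit_lineages):
    """Return the oldest taxon of the query lineage shared with any hit.

    If no hit shares a taxon with the query, the gene is assigned to the
    youngest level of the query lineage (i.e., it looks species-specific).
    """
    oldest_index = len(query_lineage) - 1
    for lineage in hit_lineages:
        # Count how many leading taxa the hit shares with the query.
        shared = 0
        for query_taxon, hit_taxon in zip(query_lineage, lineage):
            if query_taxon != hit_taxon:
                break
            shared += 1
        # The last shared taxon is the common ancestor of query and hit.
        if shared > 0:
            oldest_index = min(oldest_index, shared - 1)
    return query_lineage[oldest_index]


# Simplified, hypothetical lineages (real NCBI lineages have many more ranks).
human = ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa", "Homo sapiens"]
mouse = ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa", "Mus musculus"]
yeast = ["cellular organisms", "Eukaryota", "Opisthokonta", "Fungi", "Saccharomyces cerevisiae"]

print(assign_phylostratum(human, [mouse]))         # Metazoa
print(assign_phylostratum(human, [mouse, yeast]))  # Opisthokonta
print(assign_phylostratum(human, []))              # Homo sapiens
```

A single distant hit is enough to push a gene into an older phylostratum, which is why the reliability of such assignments matters and why GenEra reports a taxonomic representativeness score alongside the age.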

Dependencies

GenEra requires the following software dependencies:

Additionally, GenEra requires access to the taxonomy dump from the NCBI and either a locally installed NR database for DIAMOND or a locally installed AlphaFold database for Foldseek.

GenEra publication

GenEra has now been published.

Barrera-Redondo, J., Lotharukpong, J.S., Drost, H.G., Coelho, S.M. (2023). Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biology, 24, 54. https://doi.org/10.1186/s13059-023-02895-z

GenEra relies on several dependencies that should also be cited if they are used in your analysis. Please see the Citations page.

GenEra has been cited in

A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis
Bethan F Manley, Jaruwatana S Lotharukpong, Josué Barrera-Redondo, Theo Llewellyn, Gokalp Yildirir, Jana Sperschneider, Nicolas Corradi, Uta Paszkowski, Eric A Miska, Alexandra Dallaire
G3 Genes|Genomes|Genetics 2023, Volume 13, Issue 6, jkad077;
doi: https://doi.org/10.1093/g3journal/jkad077

pLM-BLAST – distant homology detection based on direct comparison of sequence representations from protein language models
Kamil Kaminski, Jan Ludwiczak, Vikram Alva, Stanislaw Dunin-Horkawicz
bioRxiv 2022.11.24.517862;
doi: https://doi.org/10.1101/2022.11.24.517862

Single-cell atlases of two lophotrochozoan larvae highlight their complex evolutionary histories
Laura Piovani, Daniel J. Leite, Luis Alfonso Yañez Guerra, Fraser Simpson, Jacob M. Musser, Irepan Salvador-Martínez, Ferdinand Marlétaz, Gáspár Jékely, Maximilian J. Telford
Science Advances 2023, Volume 9, Issue 31, eadg6034;
doi: https://doi.org/10.1126/sciadv.adg6034

Genome evolution in plants and the origins of innovation
James W. Clark
New Phytologist 2023, early view;
doi: https://doi.org/10.1111/nph.19242