Skip to content

drostlab/published_phylomaps

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 

Repository files navigation

Visitors

General Overview

Gene Age Inference is the foundation of most Evolutionary Transcriptomics studies. The concept behind Evolutionary Transcriptomics is to combine these gene age estimates with gene expression data to quantify the average transcriptome age within a biological process of interest (Drost et al., 2018, Bioinformatics).

In particular, this approach allowed the quantification of transcriptome conservation of animal and plant embryos passing through embryogenesis by first individually estimating the gene ages of specific animal and plant genomes and combining these gene age estimates with transcriptome data covering several stages of embryo development (Domazet-Loso and Tautz, 2010 Nature ; Quint, Drost et al., 2012 Nature ; Drost et al., 2015 Mol. Biol. Evol. ; Drost et al., 2016 Mol. Biol. Evol.).

However, as intensely discussed in the past years (Capra et al., 2013; Altenhoff et al., 2016; Liebeskind et al., 2016), gene age inference is not a trivial task and might be biased in some currently existing approaches (Liebeskind et al., 2016).

In particular, Moyers & Zhang argue that genomic phylostratigraphy (a prominent BLAST based gene age inference method) 1) underestimates gene age for a considerable fraction of genes, 2) is biased for rapidly evolving proteins which are short, and/or their most conserved block of sites is small, and 3) these biases create spurious nonuniform distributions of various gene properties among age groups, many of which cannot be predicted a priori (Moyers & Zhang, 2015; Moyers & Zhang, 2016; Liebeskind et al., 2016). However, these arguments were based on simulated data and were inconclusive due to errors in their analyses. Furthermore, Domazet-Loso et al., 2016 provide convincing evidence that there is no phylostratigraphic bias. In general, however, an objective benchmarking set representing the tree of life is still missing and therefore any procedure aiming to quantify gene ages will be biased to some degree.

Based on this debate Liebeskind et al., 2016 suggest to perform gene age inference by combining thirteen common orthology inference algorithms to create gene age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Using this approach systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. However, by generating a consensus gene age and quantifying the possible error in each workflow step, Liebeskind et al., 2016 provide a very useful database of consensus gene ages for a variety of genomes.

Alternatively, Stephen Smith, 2016 argues that de novo gene birth/death and gene family expansion/contraction studies should avoid drawing direct inferences of evolutionary relatedness from measures of sequence similarity alone, and should instead, where possible, use more rigorous phylogeny-based methods. For this purpose, I recommend researchers to consult the phylomedb database to retrieve phylogeny-based gene orthology relationships and use these age estimates in combination with myTAI.

In addition, Weisman et al., 2020 test and discuss the issue of homology detection failure, i.e., the inability of pairwise local aligners to trace back distantly related homologs only due to neutral sequence divergence which results in spurious patterns of TRG birth (as also discussed in Barrera-Redondo et al., 2023). Homology detection failure can cause especially small and fast-evolving genes to be wrongly annotated as young genes.

A recent publication by Barrera-Redondo et al., 2023 has sought to overcome this limitation and increase the speed and scalability of gene age inference using DIAMOND instead of blast for local pairwise sequence alignment. The effects of horizontal gene transfer and database contaminations are also mitigated with the taxonomic representativeness thresholds. Furthermore, Barrera-Redondo et al., 2023 have examined the effect of other alignment approaches such as protein structure alignment for gene age inference.

The myTAI package aims to provide a standard tool for Evolutionary Transcriptomics studies and relies on gene age estimate tables as input. Hence, I recommend to follow the active discussion on gene age inference and to consult all available resources to robustly estimate gene age (for ex. use the consensus gene age estimates provided by Liebeskind et al., 2016 and phylostratigraphic maps generated by phylostratigraphy to quantify transcriptome age with myTAI).

In case researchers would like to perform genomic phylostratigraphy, GenEra has recently been released and can be used to generate high quality phylostratigraphic maps. Instructions for using GenEra is provided here. Previous tools for genomic phylostratigraphy include ORFanFinder.

Evidently, these advancements in gene age research are very recent and gene age inference is a very young and active field of genomic research. Therefore, many more studies need to address the robust and realistic inference of gene age and a community standard is still missing.

To provide a comprehensive resource of gene ages, I accumulated all phylostratigraphic maps or sequence divergence maps that have been published to date and they might nevertheless be useful to study global patterns of transcriptome conservation in biological processes.

However, future Evolutionary Transcriptomics studies (this will include my own research) should consider these new advancements in gene age inference methods and concepts, because they will allow researchers to quantify transcriptome conservation or estimate the average transcriptome age with myTAI more accurately and more robustly.

If your study recently published a phylostratigraphic map or sequence divergence map, please contact me (https://github.com/HajkD/published_phylomaps/issues) and I will gladly reference you and include your study in the following list.

A Collection of Published Phylostratigraphic Maps and Divergence Maps

The following studies include published Phylostratigraphic Maps and Divergence Maps based on similar approaches, but using different parameter sets. This collection aims to store all published maps and furthermore, provides scripts for easy data retrieval to be able to integrate corresponding maps into a phylostranscriptomics workflow with myTAI.

The Introduction to Phylotranscriptomics Vignette introduces the integration of the following Phylostratigraphic Maps and Divergence Maps to perform custom phylotranscriptomic analyses.

Note: some of the phylostratigraphic maps are now retrievable via the R data package, phylomapr. More will be added in the near future.

Table of Contents

Please be aware that GeneIDs present in the respective phylomaps may not match the IDs of your corresponding gene expression dataset. If this is the case, you can try to convert IDs using biomartr and following this tutorial (please see also conversion example for unpublished Human phylomap).

Animals

Click here to see more animals

Plants

Click here to see more plants

Fungi

Click here to see more fungi

Phylomaps

Open here

Title: A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages

This study introduced Phylostratigraphy as computational method to quantify gene age in terms of sequence homology resulting in an organism specific Phylostratigraphic Map.

Published Phylostratigraphic Map:

  • Organisms: Drosophila melanogaster (fly)
  • E-value cutoff: 1E-3 (blastp; protein sequences) and 1E-15 (tblastn; ESTs)
  • Sequence type: Protein Sequences and ESTs
  • Reference data bases: NCBI nr (protein); trace and EST archives (EST)

Data sets are not available as Supplementary Tables.

Open here

Title: An Ancient Evolutionary Origin of Genes Associated with Human Genetic Diseases

Published Phylostratigraphic Map:

  • Organisms: Homo sapiens (human); ENSEMBL version 45
  • E-value cutoff: 1E-3 (blastp; protein sequences) and 1E-15 (tblastn; ESTs)
  • Sequence type: Protein Sequences and ESTs
  • Reference data bases: NCBI nr (protein); trace and EST archives (EST)
  • Splice variants: always using the longest splice variant

Download Map using R:

# download the Phylostratigraphic Map of Homo sapiens
# from Tomislav Domazet-Lošo and Diethard Tautz, 2008
download.file( url      = "https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mbe/25/12/10.1093_molbev_msn214/2/msn214_Supplementary_Data.zip?Expires=1690187066&Signature=UprdKTMgmqkhITKxIxbE74~blDUfhi2rgQn57X5d3YfFEM-bMqqC~LqMLJjGjlNkwOBrR2XGgOwWPyh4UEkXBdcpTbHiH71YFyZUeMJHhVBCLJQ7ceztyOJyrQC8sVScYyUZuPtOtgLgjdMrk5eP72P4R~pj-mPaIxx43efDP4VhjebsYjx8ICB~VEGZRMFAdfpKn8OZyLnMOuE37W3lRwpF9Nyr3~Tk7DUaIGmtzY6mv6mCmcO6bo2aq3pdXWkenDCo2rW-Mtdr8SN3fyClJZjFNHOMYFj4fgRjQoLUCSn-5JH59xqEvETalQTCdqV4y5h280AGrEcPp8CFfQwNhg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA",
               destfile = "MBE_2008_Homo_Sapiens_PhyloMap.zip")

utils::unzip( zipfile = "MBE_2008_Homo_Sapiens_PhyloMap.zip",
              files = "mbe-08-0522-File008_msn214.xls")

HomoSapiensPhyloMap.MBE <- read_excel("mbe-08-0522-File008_msn214.xls", sheet = 1, skip = 1)

    

Read the *.xls file storing the Phylostratigraphic Map of Homo sapiens and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
HomoSapiensPhyloMap.MBE <- read_excel("MBE_2008_Homo_Sapiens_PhyloMap.xls", sheet = 1, skip = 1)

# format Phylostratigraphic Map for use with myTAI
HomoSapiens.PhyloMap <- HomoSapiensPhyloMap.MBE[ , c("Phylostratum","Gene_ID")]

# have a look at the final format
head(HomoSapiens.PhyloMap)
  Phylostratum         Gene_ID
1            1 ENSG00000100053
2            1 ENSG00000100058
3            1 ENSG00000206066
4            1 ENSG00000100068
5            1 ENSG00000100077
6            1 ENSG00000133454

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Map of Homo sapiens from Domazet-Lošo and Tautz, 2008 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa

Published Phylostratigraphic Map:

  • Organisms: Homo sapiens (human); based on 20,259 unique proteins published here
  • E-value cutoff: 1E-3 (blastp; protein sequences) and 1E-15 (tblastn; ESTs)
  • Sequence type: Protein Sequences and ESTs
  • Reference data bases: NCBI nr (protein); trace and EST archives (EST)
  • Splice variants: always using the longest splice variant

Download Map using R:

# download the Phylostratigraphic Map of Homo sapiens
# from Domazet-Lošo and Tautz, 2010
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1186%2F1741-7007-8-66/MediaObjects/12915_2009_362_MOESM1_ESM.xls", 
               destfile = "BMCBiology_2010_Homo_Sapiens_PhyloMap.xls" )
    

Read the *.xls file storing the Phylostratigraphic Map of Homo sapiens and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
HomoSapiensPhyloMap.BMCBiology <- read_excel("BMCBiology_2010_Homo_Sapiens_PhyloMap.xls", sheet = 1, skip = 3, col_names = FALSE)

colnames(HomoSapiensPhyloMap.BMCBiology)[1:2] <- c("Gene_ID","Phylostratum")

# format Phylostratigraphic Map for use with myTAI
HomoSapiens.PhyloMap <- HomoSapiensPhyloMap.BMCBiology[ , c("Phylostratum","Gene_ID")]

# have a look at the final format
head(HomoSapiens.PhyloMap)
  Phylostratum      Gene_ID
1            1       15E1.2
2            1       3'HEXO
3            1 A0PJW7_HUMAN
4            1        A2BP1
5            1          A2M
6            1      A3GALT2

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Map of Homo sapiens from Domazet-Lošo and Tautz, 2010 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: A transcriptomic hourglass in plant embryogenesis

Published Phylostratigraphic Map:

  • Organisms: Arabidopsis thaliana
  • E-value cutoff: 1E-5 (blastp; protein sequences)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr + additional plant genomes (phytozome)
  • Splice variants: always using the longest splice variant

Published KaKs Maps:

  • Organisms: Arabidopsis thaliana versus Arabidopsis lyrata; Arabidopsis thaliana versus Brassica rapa; Arabidopsis thaliana versus Capsella rubella; Arabidopsis thaliana versus Thelungiella halophila
  • E-value cutoff: 1E-5 (blastp - best hit; protein sequences)
  • Sequence type: CDS + Protein Sequences

Download Phylostratigraphic Map using R:

# download the Phylostratigraphic Map of Arabidopsis thaliana
# from Quint et al., 2012
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1038%2Fnature11394/MediaObjects/41586_2012_BFnature11394_MOESM335_ESM.xls", 
               destfile = "Nature_2012_Arabidopsis_thaliana_PhyloMap.xls" )
    

Read the *.xls file storing the Phylostratigraphic Map of Arabidopsis thaliana and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
ArabidopsisThalianaPhyloMap.Nature <- read_excel("Nature_2012_Arabidopsis_thaliana_PhyloMap.xls", sheet = 1)

# format Phylostratigraphic Map of Arabidopsis thaliana for use with myTAI
ArabidopsisThaliana.PhyloMap <- ArabidopsisThalianaPhyloMap.Nature[ , 1:2]

# have a look at the final format
head(ArabidopsisThaliana.PhyloMap)
  Phylostratum      Gene
1           13 At5g15420
2           13 At2g07719
3           13 At3g43940
4           13 At5g45095
5           13 At5g60260
6           13 At1g54420

Download KaKs Maps using R:

# download the KaKs Maps of Arabidopsis thaliana
# from Quint et al., 2012
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1038%2Fnature11394/MediaObjects/41586_2012_BFnature11394_MOESM336_ESM.xls", 
               destfile = "Nature_2012_Arabidopsis_thaliana_KaKsMaps.xls" )

Arabidopsis thaliana versus Arabidopsis lyrata

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
Ath_vs_Aly_KaKsMap.Nature <- read_excel("Nature_2012_Arabidopsis_thaliana_KaKsMaps.xls", sheet = 1)

# format KaKs Map of Arabidopsis thaliana versus Arabidopsis lyrata for use with myTAI
Ath_vs_Aly.KaKsMap <- Ath_vs_Aly_KaKsMap.Nature[ , 1:2]

# have a look at the final format
head(Ath_vs_Aly.KaKsMap)
     KaKs      Gene
1 0.18560 At5g53110
2 0.58980 At2g02320
3 0.07699 At5g14980
4 0.60110 At2g26050
5 0.08910 At4g17650
6 0.23650 At5g20040

Arabidopsis thaliana versus Brassica rapa

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
Ath_vs_Bra_KaKsMap.Nature <- read_excel("Nature_2012_Arabidopsis_thaliana_KaKsMaps.xls", sheet = 2)

# format KaKs Map of Arabidopsis thaliana versus Brassica rapa for use with myTAI
Ath_vs_Bra.KaKsMap <- Ath_vs_Bra_KaKsMap.Nature[ , 1:2]

# have a look at the final format
head(Ath_vs_Bra.KaKsMap)
     KaKs      Gene
1 0.22660 At5g53110
2 0.08425 At5g14980
3 0.32040 At2g26050
4 0.24060 At4g17650
5 0.29970 At5g20040
6 0.43220 At4g19750

Arabidopsis thaliana versus Capsella rubella

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
Ath_vs_Crub_KaKsMap.Nature <- read_excel("Nature_2012_Arabidopsis_thaliana_KaKsMaps.xls", sheet = 3)

# format KaKs Map of Arabidopsis thaliana versus Capsella rubella for use with myTAI
Ath_vs_Crub.KaKsMap <- Ath_vs_Crub_KaKsMap.Nature[ , 1:2]

# have a look at the final format
head(Ath_vs_Crub.KaKsMap)
     KaKs      Gene
1 0.41030 At1g01010
2 0.50510 At1g01020
3 0.13300 At1g01030
4 0.10300 At1g01040
5 0.04453 At1g01050
6 0.27210 At1g01060

Arabidopsis thaliana versus Thelungiella halophila

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
Ath_vs_Thal_KaKsMap.Nature <- read_excel("Nature_2012_Arabidopsis_thaliana_KaKsMaps.xls", sheet = 4)

# format KaKs Map of Arabidopsis thaliana versus Thelungiella halophila for use with myTAI
Ath_vs_Thal.KaKsMap <- Ath_vs_Thal_KaKsMap.Nature[ , 1:2]

# have a look at the final format
head(Ath_vs_Thal.KaKsMap)
     KaKs      Gene
1 0.54110 At1g01010
2 0.37860 At1g01020
3 0.10400 At1g01030
4 0.11070 At1g01040
5 0.01867 At1g01050
6 0.36660 At1g01060

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Map and KaKs Maps of Arabidopsis thaliana from Quint et al., 2012 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution

Published Phylostratigraphic Map:

  • Organisms: Mus musculus (Ensembl v. 66), Homo sapiens (Ensembl v. 68), Danio rerio (Ensembl v. 68), Gasterosteus aculeatus (Ensembl v. 68)
  • E-value cutoff: 1E-3 (blastp; protein sequences) and 1E-15 (tblastn; ESTs)
  • Sequence type: Protein Sequences and ESTs
  • Reference data bases: NCBI nr (protein); trace and EST archives (EST)
  • Splice variants: always using the longest splice variant

Download Map using R:

# download the Phylostratigraphic Maps
# from Neme and Tautz, 2013
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1186%2F1471-2164-14-117/MediaObjects/12864_2012_4867_MOESM1_ESM.xlsx", 
               destfile = "BMCGenomics_2013_species_PhyloMaps.xlsx" )

Read the *.xlsx file storing the Phylostratigraphic Maps of Mus musculus, Homo sapiens, Danio rerio, and Gasterosteus aculeatus and format it for the use with myTAI:

Mus musculus

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file: Mus musculus
MusMusculusPhyloMap.BMCGenomics <- read_excel("BMCGenomics_2013_species_PhyloMaps.xlsx", sheet = 1)

# format Phylostratigraphic Map of Mus musculus for use with myTAI
MusMusculus.PhyloMap <- MusMusculusPhyloMap.BMCGenomics[ , c(2,1)]

# have a look at the final format
head(MusMusculus.PhyloMap)
  Oldest Phylostratum    Ensembl Gene ID
1                   1 ENSMUSG00000074155
2                   1 ENSMUSG00000086875
3                   1 ENSMUSG00000006948
4                   1 ENSMUSG00000079344
5                   1 ENSMUSG00000055193
6                   1 ENSMUSG00000004789

Homo sapiens

# read the excel file: Homo sapiens
HomoSapiensPhyloMap.BMCGenomics <- read_excel("BMCGenomics_2013_species_PhyloMaps.xlsx", sheet = 2)

# format Phylostratigraphic Map of Homo sapiens for use with myTAI
HomoSapiens.PhyloMap <- HomoSapiensPhyloMap.BMCGenomics[ , c(2,1)]

# have a look at the final format
head(HomoSapiens.PhyloMap)
  Oldest Phylostratum Ensembl Gene ID
1                   1 ENSG00000004059
2                   1 ENSG00000004478
3                   1 ENSG00000003137
4                   1 ENSG00000003509
5                   1 ENSG00000001036
6                   1 ENSG00000002587

Danio rerio

# read the excel file: Danio rerio
DanioRerioPhyloMap.BMCGenomics <- read_excel("BMCGenomics_2013_species_PhyloMaps.xlsx", sheet = 3)

# format Phylostratigraphic Map of Danio rerio for use with myTAI
DanioRerio.PhyloMap <- DanioRerioPhyloMap.BMCGenomics[ , c(2,1)]

# have a look at the final format
head(DanioRerio.PhyloMap)
  Oldest Phylostratum    Ensembl Gene ID
1                   1 ENSDARG00000033231
2                   1 ENSDARG00000000102
3                   1 ENSDARG00000000241
4                   1 ENSDARG00000000370
5                   1 ENSDARG00000000380
6                   1 ENSDARG00000000472

Gasterosteus aculeatus

# read the excel file: Gasterosteus aculeatus
GasterosteusAculeatusPhyloMap.BMCGenomics <- read_excel("BMCGenomics_2013_species_PhyloMaps.xlsx", sheet = 4)

# format Phylostratigraphic Map of Gasterosteus aculeatus for use with myTAI
GasterosteusAculeatus.PhyloMap <- GasterosteusAculeatusPhyloMap.BMCGenomics[ , c(2,1)]

# have a look at the final format
head(GasterosteusAculeatus.PhyloMap)
  Oldest Phylostratum    Ensembl Gene ID
1                   1 ENSGACG00000000009
2                   1 ENSGACG00000000017
3                   1 ENSGACG00000000018
4                   1 ENSGACG00000000019
5                   1 ENSGACG00000000022
6                   1 ENSGACG00000000027

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps of the aforementioned species from Neme and Tautz, 2013 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Phylostratigraphic Profiles in Zebrafish Uncover Chordate Origins of the Vertebrate Brain

Published Phylostratigraphic Map:

  • Organisms: Danio rerio (zebrafish) and Drosophila melanogaster
  • E-value cutoff: 1E-3 (blastp; protein sequences) and 1E-15 (tblastn; ESTs)
  • Sequence type: Protein Sequences and ESTs
  • Reference data bases: NCBI nr (protein); trace and EST archives (EST)
  • Splice variants: always using the longest splice variant

Download Map using R:

# download the Phylostratigraphic Map of Danio rerio
# from Šestak and Domazet-Lošo, 2015

download.file( url      = "https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mbe/32/2/10.1093_molbev_msu319/1/msu319_Supplementary_Data.zip?Expires=1690187793&Signature=IaT8wOG0Q2ftMybR2s2JaQu0z38sToxHW4p7HZ3XFNBC089SoMLBkxhLzpvp9nHypm14pM4RLJZXxX5zi~m9Zl29tlIXHgzu75zVddtgP6qDDZ4m00~JTCRIiivLTzmNlAjthM5hzvDkAPuy8j8Ya0KLUVRy0867nVRsY58weDf1Ql79ZiWNT1TTIXMGD~l2OPT4kSTaCOtiZ6xyIceXwDCo~6dgA82MC1LuFdNUXkaEPJ7NhtDHvpQ1CqF474TfIqpBZ~TCGv0Q1hhbDA~q0o-TMsPvsOIueHpnrvPj6nEPlHJDC0G2mdbE7si5gjFi5T7Ll5Ctw-l3a5zyeuWEdw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA",
               destfile = "MBE_2015a_Drerio_PhyloMap.zip")

utils::unzip( zipfile = "MBE_2015a_Drerio_PhyloMap.zip",
              files = "TableS3-2.xlsx")
               
# download the Phylostratigraphic Map of Drosophila melanogaster
# from Šestak and Domazet-Lošo, 2015
utils::unzip( zipfile = "MBE_2015a_Drerio_PhyloMap.zip",
              files = "TableS6.xlsx")
    

Read the *.xlsx file storing the Phylostratigraphic Map of D. rerio and D. melanogaster and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
DrerioPhyloMap.MBEa <- read_excel("TableS3-2.xlsx", sheet = 1, skip = 4)

# format Phylostratigraphic Map for use with myTAI
Drerio.PhyloMap <- DrerioPhyloMap.MBEa[ , 1:2]

# have a look at the final format
head(Drerio.PhyloMap)
  Phylostrata            ZFIN_ID
1           1 ZDB-GENE-000208-13
2           1 ZDB-GENE-000208-17
3           1 ZDB-GENE-000208-18
4           1 ZDB-GENE-000208-23
5           1  ZDB-GENE-000209-3
6           1  ZDB-GENE-000209-4
# read the excel file
DmelanogasterPhyloMap.MBEa <- read_excel("TableS6.xlsx", sheet = 1, skip = 4)

# format Phylostratigraphic Map for use with myTAI
Dmelanogaster.PhyloMap <- DmelanogasterPhyloMap.MBEa[ , 1:2]

# have a look at the final format
head(Dmelanogaster.PhyloMap)
  Phylostrata FlyBase_Gene_ID
1           1     FBgn0000017
2           1     FBgn0000024
3           1     FBgn0000032
4           1     FBgn0000036
5           1     FBgn0000038
6           1     FBgn0000039

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps of the aforementioned species from Šestak and Domazet-Lošo, 2015 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis

Published Phylostratigraphic Map:

  • Organisms: Danio rerio (zebrafish), Drosophila melanogaster (fly), and Arabidopsis thaliana
  • E-value cutoff: 1E-5 (blastp; protein sequences)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr (protein) + custom selection of genomes (phytozome, flybase)
  • Splice variants: always using the longest splice variant

Published Divergence Maps:

  • Organisms: Arabidopsis thaliana versus Arabidopsis lyrata; Arabidopsis thaliana versus Brassica rapa; Arabidopsis thaliana versus Capsella rubella; Arabidopsis thaliana versus Thelungiella halophila; Danio rerio versus A. mexicanus ; Danio rerio versus F. rubripes; Danio rerio versus X. maculatus ; Danio rerio versus G. morhua ; Drosophila melanogaster versus D. simulans ; Drosophila melanogaster versus D. yakuba ; Drosophila melanogaster versus D. persimilis ; Drosophila melanogaster versus D. virilis ;
  • E-value cutoff: 1E-5 (blastp - best reciprocal hit; protein sequences)
  • Sequence type: CDS + Protein Sequences

Download Maps using R:

# download the Phylostratigraphic Maps
# from Drost et al., 2015
download.file( url      = "http://files.figshare.com/1798295/Supplementary_table_S2.xls", 
               destfile = "MBE_2015b_PhyloMaps.xls" )
               
               
# download the Divergence Maps
download.file( url      = "http://files.figshare.com/1798297/Supplementary_table_S4.xls", 
               destfile = "MBE_2015b_DivergenceMaps.xls" )
    

Read the *.xls file storing the Phylostratigraphic Maps and Divergence Maps and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
DrerioPhyloMap.MBEb <- read_excel("MBE_2015b_PhyloMaps.xls", sheet = 1)

# have a look at the final format
head(DrerioPhyloMap.MBEb)
  Phylostratum             GeneID
1            1 ENSDARG00000000002
2            1 ENSDARG00000000019
3            1 ENSDARG00000000102
4            1 ENSDARG00000000241
5            1 ENSDARG00000000324
6            1 ENSDARG00000000369
# load package readxl
library(readxl)

# read the excel file
DmelanogasterPhyloMap.MBEb <- read_excel("MBE_2015b_PhyloMaps.xls", sheet = 2)

# have a look at the final format
head(DmelanogasterPhyloMap.MBEb)
  Phylostratum      GeneID
1            1 fbpp0070006
2            1 fbpp0070025
3            1 fbpp0070051
4            1 fbpp0070054
5            1 fbpp0070061
6            1 fbpp0070064
# load package readxl
library(readxl)

# read the excel file
Athaliana.MBEb <- read_excel("MBE_2015b_PhyloMaps.xls", sheet = 3)

# have a look at the final format
head(Athaliana.MBEb)
  Phylostratum      GeneID
1            1 at1g01040.2
2            1 at1g01050.1
3            1 at1g01070.1
4            1 at1g01080.2
5            1 at1g01090.1
6            1 at1g01120.1
# load package readxl
library(readxl)

# Danio rerio

# D. rerio vs. A. mexicanus
Drerio_vs_Amex_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 9)
# D. rerio vs. F. rubripes
Drerio_vs_Frubripes_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 10)
# D. rerio vs. X. maculatus
Drerio_vs_Xmac_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 11)
# D. rerio vs. G. morhua
Drerio_vs_Gmor_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 12)



# Drosophila melanogaster

# D. melanogaster vs. D. simulans
Dmel_Dsim_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 1)
# D. melanogaster vs. D. yakuba
Dmel_Dyak_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 2)
# D. melanogaster vs. D. persimilis
Dmel_Dper_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 3)
# D. melanogaster vs. D. virilis
Dmel_Dvir_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 4)



# Arabidopsis thaliana

# A. thaliana vs A. lyrata
Ath_Aly_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 5)
# A thaliana vs. T. halophila
Ath_Brapa_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 8)
# A thaliana vs. C. rubella
Ath_Crub_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 7)
# A thaliana vs. C. papaya 
Ath_Cpapaya_DivergenceExpressionSet <- read_excel("MBE_2015b_DivergenceMaps.xls",sheet = 6)

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps and Divergence Maps of the aforementioned species from Drost et al., 2015 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: A “Developmental Hourglass” in Fungi

Published Phylostratigraphic Map:

  • Organisms: Coprinopsis cinerea (fungi)
  • E-value cutoff: 1E-3 (blastp; protein sequences)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr (protein) + custom selection of genomes
  • Splice variants: always using the longest splice variant

Download Phylostratigraphic Map and KaKsMaps in R:

# download the Phylostratigraphic Map of Coprinopsis cinerea
# from Cheng et al., 2015
download.file( url      = "https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mbe/32/6/10.1093_molbev_msv047/2/msv047_Supplementary_Data.zip?Expires=1690186915&Signature=ZU-Sy0xAdxdvcnmoJuoCH9vX95C5~6gQxN9IJ-jhlmAeVkN9Lmil1NBgOGL2q42uNodckyT2w4B-8q2JfLQP1HJ5~GnxMJAVbxbCMkoxnOsd6PIH-a8Y~PAUlLbVqAEEbroCsDBwzLd6dykBfDRhtX3xYYZCWc8UWqyk2mL0tFKVdaKYUGWtl9WlWb-BUcUrtYBJpmuh7OCj2POzt0YJ-a-fqoqx9Usq6KtvgV2RbdwSQb3mvavzio8a99yjBZBC0MPQOimwdWSoRdoIXC1Ey2NzTMDOqy-t1vEceVr4vK7t3laZOHUP8N43MXM5rFdmq6rbo3ggLOZJSFf9jpcM7A__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA", 
               destfile = "MBE_2015c_Ccinerea_Maps.zip" )

utils::unzip( zipfile = "MBE_2015c_Ccinerea_Maps.zip",
              files = "Table_S9.xlsx")
               

Read the *.xls file storing the Phylostratigraphic Maps and Divergence Maps and format it for the use with myTAI:

# load package readxl
library(readxl)

# Coprinopsis cinerea Phylostratigraphic Map

CcinereaPhyloMap.MBEc <- read_excel("Table_S9.xlsx", sheet = 2)

Ccinerea.PhyloMap <- CcinereaPhyloMap.MBEc[ , c(4,1)]

colnames(Ccinerea.PhyloMap) <- c("Phylostratum", "GeneID")

# have a look at the final format
head(Ccinerea.PhyloMap)
  Phylostratum     GeneID
1           10 CC1G_00004
2            4 CC1G_00007
3            3 CC1G_00011
4            2 CC1G_00012
5            3 CC1G_00013
6            2 CC1G_00014

# load package readxl
library(readxl)

# Coprinopsis cinerea KaKs Maps

CcinereaKaKsMaps.MBEc <- read_excel("Table_S9.xlsx", sheet = 3, skip = 1)

# Coprinopsis cinerea versus Agaricus bisporus var bisporus H97
Ccin_vs_Abisp.KaKsMap <- CcinereaKaKsMaps.MBEc[ , c(2,1)]

# Coprinopsis cinerea versus Laccaria bicolor
Ccin_vs_Lbicol.KaKsMap <- CcinereaKaKsMaps.MBEc[ , c(5,1)]

# Coprinopsis cinerea versus Lentinula edodes
Ccin_vs_Ledodes.KaKsMap <- CcinereaKaKsMaps.MBEc[ , c(8,1)]

# Coprinopsis cinerea versus Schizophyllum commune
Ccin_vs_Scommune.KaKsMap <- CcinereaKaKsMaps.MBEc[ , c(11,1)]

# have a look at the final format: example -> Ccin_vs_Abisp.KaKsMap
head(Ccin_vs_Abisp.KaKsMap)
                  dN/dS    Gene_id
1   0.20669999999999999 CC1G_00007
2 7.6300000000000007E-2 CC1G_00011
3   0.15290000000000001 CC1G_00012
4                0.3362 CC1G_00013
5                2.7E-2 CC1G_00014
6 3.5700000000000003E-2 CC1G_00015

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps and KaKs Maps of the aforementioned species from Cheng et al., 2015 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: High expression of new genes in trochophore enlightening the ontogeny and evolution of trochozoans

Published Phylostratigraphic Map:

  • Organisms: oyster, abalone, sand worm
  • E-value cutoff: E-value cutoff is not specified in the paper (blastp searches were conducted against the database using oyster proteins, while blastx searches were conducted using the unigenes of abalones and sand worms)
  • Sequence type: Protein Sequences, Unigenes
  • Reference data bases: NCBI nr (protein) + custom selection of genomes
  • Splice variants: Not specified in the paper

Download Phylostratigraphic Maps in R:

# download the Phylostratigraphic Maps of oyster, abalone, sand worm
# from Xu et al., 2016
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1038%2Fsrep34664/MediaObjects/41598_2016_BFsrep34664_MOESM2_ESM.xls", 
               destfile = "Xu_2016_Maps.xls" )
               

Read the *.xls file storing the Phylostratigraphic Maps and format it for the use with myTAI:

# load package readxl
library(readxl)

### Oyster1 Phylostratigraphic Map

oyster1.data <- read_excel("Xu_2016_Maps.xls", sheet = 1)

Oyster1.PhyloMap <- dplyr::select(oyster1.data, age, GeneID)

colnames(Oyster1.PhyloMap) <- c("Phylostratum", "GeneID")

# have a look at the final format
Oyster1.PhyloMap


### Oyster2 Phylostratigraphic Map
oyster2.data <- read_excel("Xu_2016_Maps.xls", sheet = 2)

Oyster2.PhyloMap <- dplyr::select(oyster2.data, age, GeneID)

colnames(Oyster2.PhyloMap) <- c("Phylostratum", "GeneID")

# have a look at the final format
Oyster2.PhyloMap


### Abalone Phylostratigraphic Map
abalone.data <- read_excel("Xu_2016_Maps.xls", sheet = 3)

Abalone.PhyloMap <- dplyr::select(abalone.data, age, UnigeneID)

colnames(Abalone.PhyloMap) <- c("Phylostratum", "GeneID")

# have a look at the final format
Abalone.PhyloMap


### Sand worm Phylostratigraphic Map
sandworm.data <- read_excel("Xu_2016_Maps.xls", sheet = 4)

Sandworm.PhyloMap <- dplyr::select(sandworm.data, age, UnigeneID)

colnames(Sandworm.PhyloMap) <- c("Phylostratum", "GeneID")

# have a look at the final format
Sandworm.PhyloMap

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps of the aforementioned species from Xu et al., 2016 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Single worm transcriptomics identifies a developmental core network of oscillating genes with deep conservation across nematodes

Published Phylostratigraphic Map:

  • Organisms: Caenorhabditis elegans, Pristionchus pacificus
  • E-value cutoff: 1E-3 (DIAMOND; protein sequences)
  • Sequence type: Protein Sequences
  • Reference data bases: WormBase (protein)
  • Splice variants: longest isoform

Download Maps using R:

# download the Phylostratigraphic Maps
# from Sun et al., 2021
download.file( url      = "https://genome.cshlp.org/content/suppl/2021/08/23/gr.275303.121.DC1/Supplemental_Table_S5.xlsx", 
               destfile = "GenomeResearch_2021_PhyloMap_Pp.xlsx" )

download.file( url      = "https://genome.cshlp.org/content/suppl/2021/08/23/gr.275303.121.DC1/Supplemental_Table_S6.xlsx", 
               destfile = "GenomeResearch_2021_PhyloMap_Ce.xlsx" )

Read the *.xls file storing the Phylostratigraphic Maps and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
PpacificusPhyloMap <- read_excel("GenomeResearch_2021_PhyloMap_Pp.xlsx", sheet = 1, skip = 1)
colnames(PpacificusPhyloMap) <- c("GeneID", "Phylostratum")
PpacificusPhyloMap$Phylostratum <- gsub("unsign", "p00", PpacificusPhyloMap$Phylostratum)
PpacificusPhyloMap$Phylostratum <- as.numeric(gsub("p", "", PpacificusPhyloMap$Phylostratum))

# have a look at the final format
head(PpacificusPhyloMap)
# load package readxl
library(readxl)

# read the excel file
CelegansPhyloMap <- read_excel("GenomeResearch_2021_PhyloMap_Ce.xlsx", sheet = 1, skip = 1)
colnames(CelegansPhyloMap) <- c("GeneID", "TranscriptID", "Phylostratum")
CelegansPhyloMap$Phylostratum <- gsub("unsign", "c00", CelegansPhyloMap$Phylostratum)
CelegansPhyloMap$Phylostratum <- as.numeric(gsub("c", "", CelegansPhyloMap$Phylostratum))

# have a look at the final format
head(CelegansPhyloMap)

Now you can use the MatchMap() function implemented in myTAI to match the Phylostratigraphic Maps of the aforementioned species from Sun et al., 2021 to any gene expression set of your interest (see Introduction to Phylotranscriptomics for details).

Open here

Title: Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra

Published Phylostratigraphic Map:

  • Organisms:
    • Fungi: Saccharomyces cerevisiae (strain S288C), Schizosaccharomyces pombe, Aspergillus niger (strain CBS 513.88), Morchella conica, Cryptococcus neoformans (var. neoformans strain JEC21), Kwoniella mangroviensis (strain CBS 8507), Agaricus bisporus (var. bisporus strain H97), Tremella mesenterica (strain DSM 1558), Mucor circinelloides, Batrachochytrium dendrobatidis (strain JAM81)
    • Animals: Drosophila melanogaster, Caenorhabditis elegans, Echinococcus granulosus, Octopus vulgaris, Capitella teleta, Mus musculus, Apostichopus japonicus, Nematostella vectensis, Trichoplax adhaerens, Amphimedon queenslandica
    • Plants: Arabidopsis thaliana, Glycine max, Solanum lycopersicum, Oryza sativa, Vanilla planifolia, Musa acuminata, Picea glauca, Selaginella moellendorffii, Physcomitrella patens, Marchantia polymorpha
  • E-value cutoff: 1E-5 (DIAMOND; protein sequences; ultra-sensitive mode)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr (protein)
  • Splice variants: always using the representative sequences from UniProt (under "Download one protein sequence per gene (FASTA)")

This study used GenEra for gene age inference (phylostratigraphy). The following NCBI Taxonomic-ID were used.

# Fungi
559292	Saccharomyces cerevisiae S288C
4896	Schizosaccharomyces pombe
425011	Aspergillus niger CBS 513.88
5194	Morchella conica
214684	Cryptococcus neoformans var. neoformans JEC21
1296122	Kwoniella mangroviensis CBS 8507
936046	Agaricus bisporus var. bisporus H97
578456	Tremella mesenterica DSM 1558
36080	Mucor circinelloides
684364	Batrachochytrium dendrobatidis JAM81
# Animals
7227	Drosophila melanogaster
6239	Caenorhabditis elegans
6210	Echinococcus granulosus
6645	Octopus vulgaris
283909	Capitella teleta
10090	Mus musculus
307972	Apostichopus japonicus
45351	Nematostella vectensis
10228	Trichoplax adhaerens
400682	Amphimedon queenslandica
# Plants
3702	Arabidopsis thaliana
3847	Glycine max
4081	Solanum lycopersicum
39947	Oryza sativa
51239	Vanilla planifolia
214687	Musa acuminata
3330	Picea glauca
88036	Selaginella moellendorffii
3218	Physcomitrella patens
3197	Marchantia polymorpha

Download Phylostratigraphic Maps in R:

# download the Phylostratigraphic Maps of 10 animals, 10 plants and/or 10 fungi.
# [Fungus] from Barrera-Redondo et al., 2023
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-023-02895-z/MediaObjects/13059_2023_2895_MOESM3_ESM.xlsx", 
               destfile = "Barrera-Redondo_2023_Maps_fungus.xlsx" )

# [Animals] from Barrera-Redondo et al., 2023
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-023-02895-z/MediaObjects/13059_2023_2895_MOESM4_ESM.xlsx", 
               destfile = "Barrera-Redondo_2023_Maps_animal.xlsx" )

# [Plants] from Barrera-Redondo et al., 2023
download.file( url      = "https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-023-02895-z/MediaObjects/13059_2023_2895_MOESM5_ESM.xlsx", 
               destfile = "Barrera-Redondo_2023_Maps_plant.xlsx" )

Read the *.xlsx file storing the Phylostratigraphic Maps and format it for use with myTAI:

# load package readxl
library(readxl)
library(dplyr)

### Fungus Phylostratigraphic Maps

# Budding yeast
Saccharomyces_cerevisiae_S288C.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "559292_gene_ages")
Saccharomyces_cerevisiae_S288C.PhyloMap <- 
  dplyr::select(
    Saccharomyces_cerevisiae_S288C.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Fission yeast
Schizosaccharomyces_pombe.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "4896_gene_ages")
Schizosaccharomyces_pombe.PhyloMap <- 
  dplyr::select(
    Schizosaccharomyces_pombe.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Black mould fungus
Aspergillus_niger_CBS_513.88.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "425011_gene_ages")
Aspergillus_niger_CBS_513.88.PhyloMap <- 
  dplyr::select(
    Aspergillus_niger_CBS_513.88.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Black morels
Morchella_conica.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "5194_gene_ages")
Morchella_conica.PhyloMap <- 
  dplyr::select(
    Morchella_conica.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Cryptococcus neoformans
Cryptococcus_neoformans_var.neoformans_JEC21.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "214684_gene_ages")
Cryptococcus_neoformans_var.neoformans_JEC21.PhyloMap <- 
  dplyr::select(
    Cryptococcus_neoformans_var.neoformans_JEC21.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Kwoniella mangroviensis (or K. mangrovensis)
Kwoniella_mangroviensis.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "1296122_gene_ages")
Kwoniella_mangroviensis.PhyloMap <- 
  dplyr::select(
    Kwoniella_mangroviensis.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Portobello mushrooms
Agaricus_bisporus.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "936046_gene_ages")
Agaricus_bisporus.PhyloMap <- 
  dplyr::select(
    Agaricus_bisporus.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Yellow brain (or goldeb jelly fungus, yellow trembler, witches' butter)
Tremella_mesenterica.data <- 
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "578456_gene_ages")
Tremella_mesenterica.PhyloMap <- 
  dplyr::select(
    Tremella_mesenterica.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Mucor circinelloides
Mucor_circinelloides.data <-
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "36080_gene_ages")
Mucor_circinelloides.PhyloMap <- 
  dplyr::select(
    Mucor_circinelloides.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Amphibian chytrid fungus
Batrachochytrium_dendrobatidis_JAM81.data <-
  read_excel("Barrera-Redondo_2023_Maps_fungus.xlsx", sheet = "684364_gene_ages")
Batrachochytrium_dendrobatidis_JAM81.PhyloMap <- 
  dplyr::select(
    Batrachochytrium_dendrobatidis_JAM81.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

### Animal Phylostratigraphic Maps

# Fruit fly
Drosophila_melanogaster.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "7227_gene_ages")
Drosophila_melanogaster.PhyloMap <- 
  dplyr::select(
    Drosophila_melanogaster.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Caenorhabditis elegans
Caenorhabditis_elegans.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "6239_gene_ages")
Caenorhabditis_elegans.PhyloMap <- 
  dplyr::select(
    Caenorhabditis_elegans.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Hydatid worm
Echinococcus_granulosus.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "6210_gene_ages")
Echinococcus_granulosus.PhyloMap <- 
  dplyr::select(
    Echinococcus_granulosus.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Common octopus
Octopus_vulgaris.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "6645_gene_ages")
Octopus_vulgaris.PhyloMap <- 
  dplyr::select(
    Octopus_vulgaris.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Capitella teleta
Capitella_teleta.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "283909_gene_ages")
Capitella_teleta.PhyloMap <- 
  dplyr::select(
    Capitella_teleta.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

#	House mouse
Mus_musculus.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "10090_gene_ages")
Mus_musculus.PhyloMap <- 
  dplyr::select(
    Mus_musculus.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))
  
# Converting UniProtKB ids into ENSEMBL gene ids

Mus_musculus.PhyloMap_ENSEMBL <- phylomapr::convertID(
  phylomap = Mus_musculus.PhyloMap,
  mart = "ENSEMBL_MART_ENSEMBL",
  dataset = "mmusculus_gene_ensembl",
  filters = "uniprot_gn_id"
)

#	Japanese sea cucumber
Apostichopus_japonicus.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "307972_gene_ages")
Apostichopus_japonicus.PhyloMap <- 
  dplyr::select(
    Apostichopus_japonicus.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

#	Starlet sea anemone
Nematostella_vectensis.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "45351_gene_ages")
Nematostella_vectensis.PhyloMap <- 
  dplyr::select(
    Nematostella_vectensis.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Trichoplax adhaerens
Trichoplax_adhaerens.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "10228_gene_ages")
Trichoplax_adhaerens.PhyloMap <- 
  dplyr::select(
    Trichoplax_adhaerens.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Amphimedon queenslandica
Amphimedon_queenslandica.data <- 
  read_excel("Barrera-Redondo_2023_Maps_animal.xlsx", sheet = "400682_gene_ages")
Amphimedon_queenslandica.PhyloMap <- 
  dplyr::select(
    Amphimedon_queenslandica.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

### Plant Phylostratigraphic Maps

# Thale cress
Arabidopsis_thaliana.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "3702_gene_ages")
Arabidopsis_thaliana.PhyloMap <- 
  dplyr::select(
    Arabidopsis_thaliana.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Soybean
Glycine_max.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "3847_gene_ages")
Glycine_max.PhyloMap <- 
  dplyr::select(
    Glycine_max.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Tomato
Solanum_lycopersicum.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "4081_gene_ages")
Solanum_lycopersicum.PhyloMap <- 
  dplyr::select(
    Solanum_lycopersicum.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Rice
Oryza_sativa.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "39947_gene_ages")
Oryza_sativa.PhyloMap <- 
  dplyr::select(
    Oryza_sativa.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Flat-leaved vanilla
Vanilla_planifolia.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "51239_gene_ages")
Vanilla_planifolia.PhyloMap <- 
  dplyr::select(
    Vanilla_planifolia.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Musa acuminata
Musa_acuminata.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "214687_gene_ages")
Musa_acuminata.PhyloMap <- 
  dplyr::select(
    Musa_acuminata.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# White spruce
Picea_glauca.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "3330_gene_ages")
Picea_glauca.PhyloMap <- 
  dplyr::select(
    Picea_glauca.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Selaginella moellendorffii
Selaginella_moellendorffii.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "88036_gene_ages")
Selaginella_moellendorffii.PhyloMap <- 
  dplyr::select(
    Selaginella_moellendorffii.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Spreading earthmoss
Physcomitrella_patens.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "3218_gene_ages")
Physcomitrella_patens.PhyloMap <- 
  dplyr::select(
    Physcomitrella_patens.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))

# Marchantia polymorpha
Marchantia_polymorpha.data <- 
  read_excel("Barrera-Redondo_2023_Maps_plant.xlsx", sheet = "3197_gene_ages")
Marchantia_polymorpha.PhyloMap <- 
  dplyr::select(
    Marchantia_polymorpha.data,
    Phylostratum = rank,
    GeneID = `#gene`
  ) %>%
  dplyr::mutate(Phylostratum = as.numeric(Phylostratum))
Open here

Title: A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis

Published Phylostratigraphic Map:

  • Organisms: Rhizophagus irregularis, Geosiphon pyriformis, Gigaspora margarita, Dissophora decumbens, Mortierella elongata, Radiomyces spectabilis, Phycomyces blakesleeanus
  • E-value cutoff: 1E-5 (DIAMOND; protein sequences; ultra-sensitive mode)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr (protein)
  • Splice variants: always using the longest isoform when available

This study used GenEra for gene age inference (phylostratigraphy). The following NCBI Taxonomic-ID were used.

50956	Geosiphon pyriformis
4874	Gigaspora margarita
1432141	Rhizophagus irregularis
101101	Dissophora decumbens
1314771	Mortierella elongata
64574	Radiomyces spectabilis
4837 Phycomyces blakesleeanus

Download Phylostratigraphic Maps in R:

# download the Phylostratigraphic Maps from Manley et al., 2023
# Rhizophagus irregularis
download.file( url      = "https://zenodo.org/record/7713976/files/Rhizophagus_irregularis_DAOM197198_1432141_phyloranks.tsv", 
               destfile = "Rhizophagus_irregularis_DAOM197198_1432141_phyloranks.tsv")
# Dissophora decumbens
download.file( url      = "https://zenodo.org/record/7713976/files/Disdec1_101101_phyloranks.tsv", 
               destfile = "Disdec1_101101_phyloranks.tsv")
# Geosiphon pyriformis
download.file( url      = "https://zenodo.org/record/7713976/files/Geopyr1_50956_phyloranks.tsv", 
               destfile = "Geopyr1_50956_phyloranks.tsv")
# Gigaspora margarita
download.file( url      = "https://zenodo.org/record/7713976/files/Gigmar1_4874_phyloranks.tsv", 
               destfile = "Gigmar1_4874_phyloranks.tsv")
# Mortierella elongata
download.file( url      = "https://zenodo.org/record/7713976/files/Morel2_1314771_phyloranks.tsv", 
               destfile = "Morel2_1314771_phyloranks.tsv")
# Phycomyces blakesleeanus
download.file( url      = "https://zenodo.org/record/7713976/files/Phybl2_4837_phyloranks.tsv", 
               destfile = "Phybl2_4837_phyloranks.tsv")
# Radiomyces spectabilis
download.file( url      = "https://zenodo.org/record/7713976/files/Radspe1_64574_phyloranks.tsv", 
               destfile = "Radspe1_64574_phyloranks.tsv")

Read the *.tsv file storing the Phylostratigraphic Maps and format it for the use with myTAI:

# load package readr
library(readr)

### Phylostratigraphic Maps
# Rhizophagus irregularis
Rhizophagus_irregularis.data <-readr::read_tsv("Rhizophagus_irregularis_DAOM197198_1432141_phyloranks.tsv")
Rhizophagus_irregularis.PhyloMap <- 
  dplyr::select(
    Rhizophagus_irregularis.data,
    Phylostratum = PS,
    GeneID
  )

# Dissophora decumbens
Dissophora_decumbens.data <-readr::read_tsv("Disdec1_101101_phyloranks.tsv")
Dissophora_decumbens.PhyloMap <- 
  dplyr::select(
    Dissophora_decumbens.data,
    Phylostratum = rank,
    GeneID = `#gene`
  )

# Geosiphon pyriformis
Geosiphon_pyriformis.data <-readr::read_tsv("Geopyr1_50956_phyloranks.tsv")
Geosiphon_pyriformis.PhyloMap <- 
  dplyr::select(
    Geosiphon_pyriformis.data,
    Phylostratum = rank,
    GeneID = `#gene`
  )

# Gigaspora margarita
Gigaspora_margarita.data <-readr::read_tsv("Gigmar1_4874_phyloranks.tsv")
Gigaspora_margarita.PhyloMap <- 
  dplyr::select(
    Gigaspora_margarita.data,
    Phylostratum = rank,
    GeneID = `#gene`
  )

# Mortierella elongata
Mortierella_elongata.data <-readr::read_tsv("Morel2_1314771_phyloranks.tsv")
Mortierella_elongata.PhyloMap <- 
  dplyr::select(
    Mortierella_elongata.data,
    Phylostratum = rank,
    GeneID = V1
  )

# Phycomyces blakesleeanus
Phycomyces_blakesleeanus.data <-readr::read_tsv("Phybl2_4837_phyloranks.tsv")
Phycomyces_blakesleeanus.PhyloMap <- 
  dplyr::select(
    Phycomyces_blakesleeanus.data,
    Phylostratum = rank,
    GeneID
  )

# Radiomyces spectabilis
Radiomyces_spectabilis.data <-readr::read_tsv("Radspe1_64574_phyloranks.tsv")
Radiomyces_spectabilis.PhyloMap <- 
  dplyr::select(
    Radiomyces_spectabilis.data,
    Phylostratum = rank,
    GeneID
  )
Open here

Unpublished Phylostratigraphic Map:

  • Organisms: Homo sapiens
  • E-value cutoff: 1E-5 (DIAMOND; protein sequences; sensitive mode)
  • Sequence type: Protein Sequences
  • Reference data bases: NCBI nr (protein)
  • Splice variants: using the representative sequences from UniProt (under "Download one protein sequence per gene (FASTA)")

This study used GenEra for gene age inference (phylostratigraphy). The following NCBI Taxonomic-ID was used:

9606	Homo sapiens
# install phylomapr

devtools::install_github("LotharukpongJS/phylomapr")

# load package phylomapr
library(phylomapr)

### Phylostratigraphic Maps
# Homo sapiens
Homo_sapiens.PhyloMap <- phylomapr::Homo_sapiens.PhyloMap

Users can also convert the UniProtKB ids present in this human phylomap to ENSEMBL gene ids using biomartr:

This can be done using phylomapr, where the function phylomapr::convertID() wraps over biomartr::biomart for this particular context.

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ropensci/biomartr")

Homo_sapiens.PhyloMap.ENSEMBL <- phylomapr::convertID(
  phylomap = Homo_sapiens.PhyloMap,
  mart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl",
  filters = "uniprot_gn_id"
)

About

A Collection of Published Phylostratigraphic Maps and Divergence Maps

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published