tbersez/theobroma_genomics

Duplicated genes and gene families in the Theobroma cacao genome

Case study of the Cysteine-rich kinases superfamily

Mrs. Bersez & Dubois, M2 students, Paris-Saclay University

Abstract

Theobroma cacao, or cacao, is an angiosperm species cultivated intensively across the world. In this work, we investigated the genomic landscape of cacao to detect both gene-specific and whole-genome duplication events. Our results were shared with our classmates to compare the evolution rates of different plant genomes. From this comparison, we observed that large gene families were more numerous in plants with low generation rates. In a second axis of our work, we developed and tested a method to reconstruct and study gene families. Our method was applied to the CRK8 family of Theobroma cacao.

Main results

Gene family size distribution in Theobroma cacao

The FTAG Finder tool was used to reconstruct gene families within our set of selected isoforms. Figure 1 displays the number of families according to their size. Most families contain a small number of proteins (below five). However, some families are strikingly large (up to 425 members for the biggest family). These results were shared with our classmates to compare gene family sizes between different species.
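The size distribution described above can be sketched as follows. This is a minimal illustration, not the code used in this repository: it assumes the family assignments are available as a gene-to-family mapping, and the function name `family_size_distribution`, the gene names and the family labels are all hypothetical.

```python
from collections import Counter


def family_size_distribution(gene_to_family):
    """Return {family_size: number_of_families_of_that_size}.

    `gene_to_family` is an assumed input format (gene id -> family id),
    not necessarily the format produced by FTAG Finder.
    """
    # First count how many genes belong to each family.
    members_per_family = Counter(gene_to_family.values())
    # Then count how many families share each size.
    sizes = Counter(members_per_family.values())
    return dict(sorted(sizes.items()))


# Toy assignments, made up for illustration only.
example = {
    "g1": "F1", "g2": "F1",
    "g3": "F2",
    "g4": "F3", "g5": "F3", "g6": "F3",
}
distribution = family_size_distribution(example)
print(distribution)
```

The resulting dictionary can be passed directly to a bar plot to obtain a figure of families per size.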

Ka/Ks ratios

The Yang and Nielsen method was used in the script families_to_Ka_Ks_values.ipynb to compute the Ka/Ks ratios from the families reconstructed above. The results show one main peak around 0.4, suggesting that no whole-genome duplication event has occurred in the Theobroma cacao genome.
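As a rough sketch of how a main peak can be located once Ka and Ks have been estimated for each gene pair, the example below bins the ratios and reports the most populated bin. It does not reproduce families_to_Ka_Ks_values.ipynb and does not implement the substitution-rate estimation itself. The function names, the bin width of 0.1 and the toy (Ka, Ks) values are assumptions made for illustration.

```python
import math
from collections import Counter


def ka_ks_ratios(pairs):
    """Compute Ka/Ks for each (Ka, Ks) pair, skipping pairs where Ks is 0."""
    return [ka / ks for ka, ks in pairs if ks > 0]


def main_peak(ratios, bin_width=0.1):
    """Return the lower edge of the most populated histogram bin."""
    bins = Counter(math.floor(r / bin_width) for r in ratios)
    fullest_bin, _count = bins.most_common(1)[0]
    # Rounding hides floating-point noise in the bin edge.
    return round(fullest_bin * bin_width, 10)


# Toy (Ka, Ks) estimates, made up for illustration only.
toy_pairs = [
    (0.11, 0.25),
    (0.09, 0.20),
    (0.21, 0.50),
    (0.06, 0.50),
    (0.46, 0.50),
    (0.30, 0.00),  # Ks == 0: the ratio is undefined, so the pair is skipped
]
ratios = ka_ks_ratios(toy_pairs)
peak = main_peak(ratios)
print(peak)
```

Binning is a deliberately simple choice here. A kernel density estimate would give a smoother picture of the distribution, at the cost of an extra dependency.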