Sonia Garcia-Ruiz, David Zhang, Emil K Gustavsson, Guillermo Rocamora-Perez, Melissa Grant-Peters, Aine Fairbrother-Browne, Regina H Reynolds, Jonathan W Brenton, Ana L Gil-Martinez, Zhongbo Chen, Donald C Rio, Juan A Botia, Sebastian Guelfi, Leonardo Collado-Torres, Mina Ryten
bioRxiv 2023.03.29.534370; doi: https://doi.org/10.1101/2023.03.29.534370
This 'splicing-accuracy-manuscript' repository contains the code used to generate all the analyses and figures produced for the manuscript "Splicing accuracy varies across human introns, tissues and age".
To produce the "Splicing" intron database, please follow the file pipeline indicated below:
- init.R. Main file. It starts the exon-exon junction download from recount3. Then, it annotates the split reads in three different categories: annotated intron, novel donor and novel acceptor junction. Once the split reads are annotated, it pairs the split reads from the annotated category with the novel donor and novel acceptor junctions across the samples of each tissue. Finally, after some intermediary calls to multiple auxiliary functions, it starts the SQL commands and creates the "Splicing" intron database.
- database_junction_pairing.R. Auxiliary file. It contains the code to do the junction pairing between the annotated introns and the novel donor and novel acceptor junctions.
- database_SQL_helper.R. Auxiliary file. It contains the SQL helper code to assist in the creation of the "Splicing" intron database.
- database_SQL_generation.R. Auxiliary file. It contains the main SQL code to create the "Splicing" database.
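The annotation and pairing steps described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code in init.R or database_junction_pairing.R. The `Junction` tuple, the function names, and the rule of matching on exact splice-site coordinates are all assumptions made for this example. It also assumes that the donor is the 5' splice site, so on the minus strand the donor is the higher genomic coordinate.

```python
from collections import namedtuple

# Hypothetical representation of a split read / intron (not the repository's data model).
Junction = namedtuple("Junction", ["chrom", "start", "end", "strand"])


def splice_sites(j):
    """Return (donor, acceptor) coordinates, accounting for strand."""
    return (j.start, j.end) if j.strand == "+" else (j.end, j.start)


def classify(junction, annotated_introns):
    """Label a split read as annotated, novel_donor, novel_acceptor or other."""
    if junction in annotated_introns:
        return "annotated"
    # Only compare against annotated introns on the same chromosome and strand.
    same_strand = [
        a for a in annotated_introns
        if a.chrom == junction.chrom and a.strand == junction.strand
    ]
    donor, acceptor = splice_sites(junction)
    donors = {splice_sites(a)[0] for a in same_strand}
    acceptors = {splice_sites(a)[1] for a in same_strand}
    if acceptor in acceptors and donor not in donors:
        return "novel_donor"
    if donor in donors and acceptor not in acceptors:
        return "novel_acceptor"
    return "other"


def pair_with_annotated(novel, annotated_introns):
    """Return the annotated introns that share the novel junction's annotated splice site."""
    label = classify(novel, annotated_introns)
    donor, acceptor = splice_sites(novel)
    pairs = []
    for a in annotated_introns:
        if (a.chrom, a.strand) != (novel.chrom, novel.strand):
            continue
        a_donor, a_acceptor = splice_sites(a)
        if label == "novel_donor" and a_acceptor == acceptor:
            pairs.append(a)
        elif label == "novel_acceptor" and a_donor == donor:
            pairs.append(a)
    return pairs
```

In this sketch, a junction whose two ends both match annotated splice sites without matching a single annotated intron falls into "other", which is outside the three categories handled by the pipeline.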
To produce the "Age Stratification" intron database, please follow the file pipeline indicated below:
- database_age_stratification_generation.R. Main file. It starts by clustering the GTEx v8 samples by age into the age groups "20-39", "40-59" and "60-79" years old. Then, using the exon-exon junction data previously downloaded and quality-controlled for the creation of the "Splicing" database, this script groups the exon-exon split reads and extracts read count metrics across the samples of each age category. Then, it pairs the split reads from the annotated category with the split reads from the novel donor and novel acceptor junctions across the samples of each age cluster at the tissue level. Finally, after some intermediary calls to multiple auxiliary functions, it starts the SQL commands and creates the "Age Stratification" intron database.
- database_junction_pairing.R. Auxiliary file. It contains the code to do the junction pairing between the annotated introns and the novel donor and novel acceptor junctions.
- database_SQL_helper.R. Auxiliary file. It contains the SQL helper code to assist in the creation of the "Age Stratification" intron database.
- database_SQL_generation.R. Auxiliary file. It contains the main SQL code to create the "Age Stratification" database.
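The age clustering step can be sketched as follows. This is a Python illustration only, not the code in database_age_stratification_generation.R. It assumes that donor age is available as a ten-year bracket string such as "30-39". The names `AGE_GROUPS`, `age_group` and `cluster_samples`, and the (sample_id, tissue, age_bracket) record layout, are hypothetical.

```python
from collections import defaultdict

# The three age groups named in the pipeline description, as inclusive bounds.
AGE_GROUPS = {"20-39": (20, 39), "40-59": (40, 59), "60-79": (60, 79)}


def age_group(age_bracket):
    """Map a bracket such as '30-39' to one of the three age groups, else None."""
    low, high = (int(x) for x in age_bracket.split("-"))
    for name, (g_low, g_high) in AGE_GROUPS.items():
        if g_low <= low and high <= g_high:
            return name
    return None


def cluster_samples(samples):
    """Group (sample_id, tissue, age_bracket) records by (tissue, age group)."""
    clusters = defaultdict(list)
    for sample_id, tissue, bracket in samples:
        group = age_group(bracket)
        if group is not None:  # Samples outside the three groups are skipped.
            clusters[(tissue, group)].append(sample_id)
    return dict(clusters)
```

Keying the clusters by (tissue, age group) mirrors the description above, in which junction pairing is done across the samples of each age cluster at the tissue level.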
To evaluate whether the expression levels of RNA-binding proteins involved in post-transcriptional processes (Van Nostrand et al. 2020) are affected by the age of the sample donor across brain tissues, please use the code provided in the R script rbp_expression.R.
To produce the main and supplementary figures and tables supplied with the manuscript "Splicing accuracy varies across human introns, tissues and age", please use the functions provided in the R scripts "splicing_accuracy_manuscript_figures.R" and "splicing_accuracy_manuscript_age_figures.R".
All supplementary tables can be accessed through Zenodo using the DOI: 10.5281/zenodo.7732872
The code included within this repository has been successfully tested on:
- Ubuntu version "16.04.7 LTS (Xenial Xerus)"
- Ubuntu version "22.04.2 LTS (Jammy Jellyfish)"