dzhang32/ER_paper_2020_supp_code

Background

This repository contains the analysis code for the publication found here, which aimed to improve the annotation of disease-relevant genes using RNA-sequencing data. The accompanying web-based tool, vizER, can be used to visualise individual genes of interest for evidence of incomplete annotation.

Citation

If you use any of the code in this repository, please cite the Science Advances publication: DOI 10.1126/sciadv.aay8299.

Code contents

| Directory | Description |
| --- | --- |
| analyse_ER_annotation | ER-related analyses, including quantifying OMIM gene re-annotation and total ER Mb across annotation features, and validating ERs across Ensembl versions and within an independent dataset |
| annotate_ERs | Annotating ERs with metrics such as association to genes through junctions, annotation features, conservation and constraint |
| check_protein_coding_potential | Checking the protein-coding potential of ERs |
| complex_disorders | Re-annotation of GWAS hits from STOPGAP |
| download_tidy_OMIM_data | Downloading details of Mendelian disease genes via the OMIM API |
| export_ER_details | Formatting ER details for publication |
| generate_ERs_varying_cut_offs_maxgaps_GTEx_tissues | Using derfinder to define tissue-specific expressed regions (ERs) for each GTEx tissue* |
| generate_randomised_intron_inter_regions | Generating tissue-specific, randomised, length-matched regions |
| GTEx_split_read_reformatting | Reformatting the raw GTEx junction data downloaded from recount2 for input into annotatER |
| optimising_derfinder_cutoff | Optimising the definitions of ERs using a gold-standard set of non-overlapping exons* |

*These elements of the pipeline have been wrapped into an R package that can be found here.
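
To give a rough sense of what "varying cut-offs and maxgaps" means when defining ERs, the sketch below calls a region wherever per-base coverage reaches a cut-off and then merges neighbouring regions separated by a small gap. It is a conceptual illustration in Python only, not the repository's R code and not derfinder's implementation; the function name `define_ers`, the half-open coordinates, and the "at or above the cut-off" rule are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def define_ers(
    coverage: Sequence[float], cutoff: float, max_gap: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of expressed regions.

    Hypothetical helper for illustration: a base counts as expressed when
    its coverage is >= cutoff, and adjacent regions are merged when the
    number of unexpressed bases between them is <= max_gap.
    """
    # Step 1: collect contiguous runs of bases at or above the cut-off.
    regions: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))

    # Step 2: merge runs whose separating gap is no longer than max_gap.
    merged: List[Tuple[int, int]] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)
    return merged


# Toy per-base coverage vector (made-up values, not real GTEx data).
toy_coverage = [0, 0, 5, 6, 7, 0, 0, 4, 5, 0, 0, 0, 0, 0, 8, 9, 0]
print(define_ers(toy_coverage, cutoff=3, max_gap=2))
print(define_ers(toy_coverage, cutoff=3, max_gap=0))
```

In this toy example, raising `max_gap` from 0 to 2 joins the first two regions into one, which is the kind of trade-off the cut-off and maxgap optimisation explores.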

About

Supplementary code for the paper: Incomplete annotation has a disproportionate impact on our understanding of Mendelian and complex neurogenetic disorders
