Functional Region Enrichment Analysis

This repository provides a Python package implementing part of the analyses presented in:

Sarkar, A. K., Ward, L. D., & Kellis, M. (2016). Functional enrichments of disease variants across thousands of independent loci in eight diseases. bioRxiv. http://dx.doi.org/10.1101/048066

The companion R package is available from http://www.github.com/aksarkar/frea-R. The computational pipeline described in the text (which uses both packages) is available from http://www.github.com/aksarkar/frea-pipeline.

Installation

pip install git+https://github.com/aksarkar/frea.git#egg=frea

The Python package requires:

Python > 3.2
numpy
scipy

Commentary

The design of the packages rests on several ideas, which depend on the characteristics of the compute environment they were developed in (Univa Grid Engine, relatively strict memory limits, but many compute nodes):

Use independent Python processes to distribute work in massively parallel fashion across compute nodes (using mechanisms outside of Python such as GNU parallel)
Use streaming algorithms wherever possible, building as few intermediate data structures as possible
Invoke modules as scripts (python -m) for entry points wherever possible
Use R to produce visualizations
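
As an illustration of the streaming style described above, here is a minimal sketch built from chained generators, so only one record is held in memory at a time. The input layout (whitespace-delimited name and p-value) and the function names parse, keep_below, and run are assumptions made for this example, not the package's actual interface.

```python
import sys


def parse(lines):
    """Lazily yield (name, p-value) pairs from whitespace-delimited lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], float(fields[1])


def keep_below(records, threshold):
    """Lazily keep only the records whose p-value is below threshold."""
    for name, pvalue in records:
        if pvalue < threshold:
            yield name, pvalue


def run(lines, threshold):
    """Chain the generators and yield tab-delimited output lines."""
    for name, pvalue in keep_below(parse(lines), threshold):
        yield '{}\t{}\n'.format(name, pvalue)


# Hypothetical input; any iterable of lines (such as sys.stdin) works
example = ['rs1 0.5\n', 'rs2 1e-6\n', 'rs3 0.001\n']
sys.stdout.writelines(run(example, 0.01))
```

In a module meant to be invoked with python -m, the last two lines would typically be replaced by a call such as sys.stdout.writelines(run(sys.stdin, float(sys.argv[1]))) under an if __name__ == '__main__': guard, so that many independent processes can each stream their own shard of the input.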

Processing summary statistics

The fundamental computations here are:

Coerce the data into a standardized (internal) format
Convert the internal format to UCSC BED for use downstream
Lift over and impute UCSC BED (as needed)
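
The conversion to UCSC BED can be sketched as follows. The record layout used here (chromosome, 1-based position, name, p-value) and the choice to store -log10(p) in the score column are assumptions made for illustration; the package's real internal format is not described in this document.

```python
import math


def to_bed(records):
    """Yield BED lines from (chromosome, 1-based position, name, p-value) tuples."""
    for chrom, pos, name, pvalue in records:
        # BED intervals are 0-based and half-open, so a variant at 1-based
        # position pos spans [pos - 1, pos)
        score = -math.log10(pvalue)
        yield '{}\t{}\t{}\t{}\t{:.3f}'.format(chrom, pos - 1, pos, name, score)


# Hypothetical records, for illustration only
records = [('chr1', 1000, 'rs1', 1e-8), ('chr2', 500, 'rs2', 0.05)]
bed_lines = list(to_bed(records))
```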

Enhancer enrichments

The fundamental computations here are:

Motivate the method by estimating rank-based correlations between replicate summary statistics
Prune summary statistics according to linkage disequilibrium
Estimate a heuristic p-value cutoff for the variants taken forward in the analysis
Calculate permutation-based significance of enrichment for every annotation
Draw and annotate the plots
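
To make the permutation step concrete, here is a generic sketch of a permutation test for enrichment. It assumes each pruned variant carries two flags: whether it passes the p-value cutoff and whether it overlaps the annotation. The test statistic (number of variants with both flags) and the label-shuffling null are assumptions chosen for illustration; the null model used in the paper may differ.

```python
import random


def permutation_pvalue(is_top, in_annotation, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = sum(t and a for t, a in zip(is_top, in_annotation))
    labels = list(in_annotation)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle annotation labels across variants to break any association
        rng.shuffle(labels)
        if sum(t and a for t, a in zip(is_top, labels)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never zero
    return observed, (1 + exceed) / (1 + n_perm)


# Hypothetical data: 100 variants, the 10 top variants all overlap the annotation
is_top = [i < 10 for i in range(100)]
in_annotation = [i < 10 for i in range(100)]
observed, pvalue = permutation_pvalue(is_top, in_annotation)
```

Because each annotation is tested independently, a loop over annotations around this function is the kind of work that can be split across separate processes.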

Pathway enrichments

The fundamental computation here is pruning enriched GO terms according to the Gene Ontology. We parse the Open Biomedical Ontology format file to build an adjacency list representing the graph of GO relationships. The algorithm is straightforward: perform depth-first search (DFS) on the ontology subgraph induced by the enriched terms, keeping the deepest nodes found (the fringe) at every point.

The algorithm is implemented in two coroutines: the DFS coroutine yields events (either starting from an unexplored node or moving along an edge) and the current node. If we start from an unexplored node, we add it to the fringe; if we move along an edge, we remove the node from the fringe (if it was present). At the end of the algorithm, the fringe contains the nodes which were visited only once in the traversal; these are the deepest (most specific) enriched GO terms.
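
The pruning described above can be sketched as follows. This is one reading of the description, not the package's code: it uses a generator plus a consuming loop instead of two coroutines, assumes the adjacency list maps each term to its more general (parent) terms, and assumes that moving along an edge removes the term the edge points to. The term names are hypothetical.

```python
def dfs_events(graph, nodes):
    """Yield ('start', node) and ('edge', node) events from an iterative DFS.

    graph maps each term to the terms it points to (assumed: its parents).
    Only edges between members of nodes are followed (the induced subgraph).
    """
    nodes = set(nodes)
    visited = set()
    for root in sorted(nodes):
        if root in visited:
            continue
        visited.add(root)
        yield 'start', root
        stack = [root]
        while stack:
            current = stack.pop()
            for parent in graph.get(current, ()):
                if parent not in nodes:
                    continue
                # Report every edge, even to an already visited term
                yield 'edge', parent
                if parent not in visited:
                    visited.add(parent)
                    stack.append(parent)


def prune(graph, enriched):
    """Return the enriched terms that no other enriched term points to."""
    fringe = set()
    for event, node in dfs_events(graph, enriched):
        if event == 'start':
            fringe.add(node)
        else:
            fringe.discard(node)
    return fringe


# Hypothetical ontology: C is_a B, B is_a A, D is_a A
graph = {'C': ['B'], 'B': ['A'], 'D': ['A']}
most_specific = prune(graph, {'A', 'B', 'C', 'D'})
```

In this example A and B are dropped because a more specific enriched term points to each of them, leaving C and D.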

Motif enrichments

The fundamental computations here are:

Count motif overlaps, possibly grouping by PWM/factor group/PWM cluster
Compute significance of motif enrichments based on overlap counts
Draw and annotate heatmaps
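
The counting step can be sketched with a grouping key, as below. The record layout (PWM name, factor group) and the function name count_overlaps are assumptions for illustration; the significance computation is not specified in this document and is not shown.

```python
from collections import Counter


def count_overlaps(overlaps, key=lambda record: record[0]):
    """Count overlap records grouped by key(record), consuming the input lazily."""
    return Counter(key(record) for record in overlaps)


# Hypothetical overlap records: (PWM name, factor group)
overlaps = [('pwm1', 'groupA'), ('pwm2', 'groupA'),
            ('pwm3', 'groupB'), ('pwm1', 'groupA')]
by_pwm = count_overlaps(overlaps)
by_group = count_overlaps(overlaps, key=lambda record: record[1])
```

Passing a different key function switches the grouping without changing the counting code.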
