Watershed

This repository repackages the original command-line scripts from the upstream repository as an R package, which streamlines installation and dependency management.

Watershed is an unsupervised probabilistic framework that integrates genomic annotations and RNA-seq outlier calls to estimate the probability that a rare variant has a functional effect on a particular RNA-seq outlier phenotype. Examples of outlier phenotypes include, but are not limited to, total expression, splicing, and ASE. Watershed extends our previous model, RIVER (which can also be run via this package), by incorporating information from multiple outlier phenotypes into one model, so that predictions of functional effects in one outlier phenotype are informed by observed outlier calls in another phenotype. Please see our publication in Science for more details.

Installation

Install this R package from the GitHub repository:

# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install WatershedR from GitHub
devtools::install_github("nicolerg/WatershedR")

Input data

For details about the input file format, see the docs:

library(WatershedR)
?evaluate_watershed

An example input file with 18 genomic annotations and 3 outlier p-values can be found in example_data/watershed_example_data.txt.

Another example input file with 18 genomic annotations and 1 outlier p-value can be found in example_data/river_example_data_pheno_1.txt.

Running Watershed

This package provides two functions useful to users looking to apply Watershed to their data:

evaluate_watershed(): This function trains a Watershed model on non-N2 pairs and evaluates the model on held-out N2 pairs (pairs of individuals that share the same rare variant near the same gene). This gives the user an idea of the accuracy of Watershed applied to their data.
predict_watershed(): This function trains a Watershed model on training data and then predicts Watershed posterior probabilities (using the Watershed parameters optimized in training) for all gene-individual pairs in a much larger prediction data set.

Both of these functions can be run with three different models:

Watershed_exact: Watershed where parameters are optimized via exact inference. This approach is tractable and recommended when the number of dimensions (E) is small. As a general rule of thumb, exact inference should be used if the number of dimensions (E) is less than or equal to 4.
Watershed_approximate: Watershed where parameters are optimized using approximate inference. This approach is tractable when the number of dimensions (E) is large. For example, we used this to model the related outlier signals from 49 tissues (see our publication).
RIVER: A previously published method. Used if the number of dimensions (E) is 1.
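
The rule of thumb above can be sketched as a small helper. This is only an illustration: choose_watershed_model() is a hypothetical function that is not part of WatershedR, and the strings it returns are simply the model names listed above. Whether the package functions accept exactly these values is an assumption, so check ?evaluate_watershed before relying on them.

```r
# Hypothetical helper (not part of WatershedR): map the number of outlier
# dimensions E to one of the three model names listed above.
choose_watershed_model <- function(n_dimensions) {
  # E must be a single positive whole number
  stopifnot(
    is.numeric(n_dimensions),
    length(n_dimensions) == 1,
    n_dimensions >= 1,
    n_dimensions == round(n_dimensions)
  )
  if (n_dimensions == 1) {
    "RIVER"                  # a single outlier phenotype
  } else if (n_dimensions <= 4) {
    "Watershed_exact"        # exact inference is tractable for small E
  } else {
    "Watershed_approximate"  # approximate inference for large E
  }
}

choose_watershed_model(3)   # e.g. expression, splicing, and ASE
choose_watershed_model(49)  # e.g. one dimension per tissue
```

The cutoff of 4 is the rule of thumb stated above, not a hard limit; exact inference may still be feasible for slightly larger E at a higher computational cost.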

See the function documentation for details about the outputs. See the Get Started vignette for examples of how to run these functions.

Citation

If you use this R package, please cite our publication:

N.M. Ferraro, B.J. Strober, J. Einson, N.S. Abell, F. Aguet, A.N. Barbeira, M. Brandt, M. Bucan, S.E. Castel, J.R. Davis, et al., TOPMed Lipids Working Group, GTEx Consortium, Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 369, eaaz5900 (2020).
