
Potential case studies (i.e., existing results that could be reproduced more easily using the container)

Aedin Culhane edited this page Jun 18, 2015 · 22 revisions

MethylMix clustering

(Tim Triche) http://genomebiology.com/2015/16/1/17 (this is also an excuse for me to resurrect SciDB as a SummarizedExperiment backend; see related page)

FEM validation

(Tim Triche) http://bioinformatics.oxfordjournals.org/content/30/16/2360.short (in my own experience, further integration of mutational/fusion covariates into FEM works VERY well)

TCGA AML p53 deletion/mutation

(Tim Triche) integration/verification using 450k methylation clustering: http://www.nejm.org/doi/full/10.1056/NEJMoa1301689
(pretty sure we dropped this into the supplement... maybe I had better check, or just dig out the code)

Obviously, with a decent 450k/CNV + RNAseq/SNV calling pipeline, something like LAML becomes almost trivial. It does, however, continue to illustrate the value of strictly enforcing sample identifiers during preprocessing, and of verifying suspicious mappings (or mismappings) as part of initial QC. The SNP probes on the 450k array act as a "barcode" for each sample, which provides a way to verify labels for, at the very least, CNV and DNA methylation. When SNPs are also called from WGS and/or RNAseq, those calls can be used for positive matching as well. I really have no idea whether anyone else does this at present, but IMHO everyone should. The approach also extends fairly easily to WGBS and ATACseq label mapping (given a set of SNP calls).
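A minimal sketch of the SNP-barcode label check, in plain Python. It assumes genotypes have already been called (0/1/2 alternate alleles, or no call) at the same ordered set of SNPs in both assays. The function names `concordance` and `check_labels`, the toy data and the 0.9 concordance cutoff are hypothetical choices for illustration, not part of any existing pipeline.

```python
from typing import Dict, List, Optional

# Genotype calls at a shared, identically ordered panel of SNPs:
# 0/1/2 = number of alternate alleles, None = no call in that assay.
Genotypes = List[Optional[int]]

def concordance(a: Genotypes, b: Genotypes) -> float:
    """Fraction of SNPs called in both samples whose genotypes agree (0.0 if none overlap)."""
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not both:
        return 0.0
    return sum(x == y for x, y in both) / len(both)

def check_labels(assay_a: Dict[str, Genotypes],
                 assay_b: Dict[str, Genotypes],
                 min_concordance: float = 0.9) -> Dict[str, str]:
    """For each sample label in assay_a, find its best genotype match in assay_b."""
    report = {}
    for label, calls in assay_a.items():
        scores = {other: concordance(calls, other_calls)
                  for other, other_calls in assay_b.items()}
        best = max(scores, key=scores.get)  # highest-concordance sample in assay_b
        if scores[best] < min_concordance:
            report[label] = "no confident match"
        elif best == label:
            report[label] = "ok"
        else:
            report[label] = f"mismatch: best match is {best} ({scores[best]:.2f})"
    return report

# Toy example: methylation-array SNP calls vs. RNA-seq SNP calls, with S2/S3 swapped.
meth = {"S1": [0, 1, None, 1, 0, 2], "S2": [2, 2, 0, 1, 1, 0], "S3": [1, 0, 1, 2, 2, 1]}
rna = {"S1": [0, 1, 2, 1, 0, 2], "S2": [1, 0, 1, 2, 2, 1], "S3": [2, 2, 0, 1, 1, 0]}
print(check_labels(meth, rna))
# {'S1': 'ok', 'S2': 'mismatch: best match is S3 (1.00)', 'S3': 'mismatch: best match is S2 (1.00)'}
```

Matching on the best-scoring partner, rather than only checking each label against itself, is what turns a failed check into a candidate fix: a swap shows up as two labels pointing at each other.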

Will update with code as I assemble it --tjt, 3/26/2015

Multi-assay quality control for ovarian cancer

(Levi Waldron) Reproducible vignette demonstrating potential effectiveness, using ExpressionSets, here.

We plan to expand this greatly to screen TCGA.

EGFR exon vIII analysis

(Markus Riester) Displaying whole-gene expression and copy number, with one-exon mutation, in a boxplot. R script here.

Publicly available perturbation data

Both of which are part of the larger CTD2 network:

MOGSA

(Aedin Culhane)

MOGSA extends Culhane et al., 2003 and Meng 2012 to provide a new approach for single-sample gene set enrichment analysis of multiple datasets.

We start from multiple 'omics datasets measured on the same (matched) cases; the features in each dataset can be different and non-overlapping. The datasets are projected into a common principal component (PC) space using multiple factor analysis (MFA) or multiple co-inertia analysis (MCIA). All of the features are thereby transformed onto the same scale and can be concatenated into one large table of mixed features, which can be ranked by their contribution to each principal axis.
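A minimal numpy sketch of the projection step, assuming MFA in its textbook form: center each feature, divide each block by its first singular value so no single dataset dominates, concatenate, then take an SVD. The names `mfa_project` and `rank_features` are hypothetical, the blocks are random toy data, and this is not the mogsa package's implementation (mogsa is an R package and also supports MCIA). It only illustrates how features from different blocks end up on one scale and can be ranked together.

```python
import numpy as np

def mfa_project(blocks, n_components=2):
    """Project matched 'omics blocks into one shared PC space (MFA-style sketch).

    blocks: dict name -> (samples x features) array; rows must be the same
    samples in the same order. Returns sample scores (samples x PCs),
    feature loadings (all features x PCs) and the matching feature labels.
    """
    weighted, labels = [], []
    for name, X in blocks.items():
        Xc = X - X.mean(axis=0)                          # center each feature
        first_sv = np.linalg.svd(Xc, compute_uv=False)[0]
        weighted.append(Xc / first_sv)                   # MFA weight: block's 1st singular value -> 1
        labels += [f"{name}:{j}" for j in range(X.shape[1])]
    Z = np.hstack(weighted)                              # one big table of mixed features
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]     # samples x PCs
    loadings = Vt[:n_components].T                      # features x PCs (unit-norm columns)
    return scores, loadings, labels

def rank_features(loadings, labels, axis=0):
    """Rank all features, from every block, by contribution (squared loading) to one axis."""
    contrib = loadings[:, axis] ** 2                     # sums to 1 over all features
    order = np.argsort(contrib)[::-1]
    return [(labels[i], float(contrib[i])) for i in order]

# Toy data: 10 matched samples, three blocks with different, non-overlapping features.
rng = np.random.default_rng(0)
blocks = {"mrna": rng.normal(size=(10, 50)),
          "meth": rng.normal(size=(10, 200)),
          "cnv": rng.normal(size=(10, 30))}
scores, loadings, labels = mfa_project(blocks, n_components=3)
print(rank_features(loadings, labels)[:5])               # top features on PC1, any block
```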

Next, we take tables of feature annotation in which the rows (features) match those in the data tables and the columns are gene sets. These tables can be binary matrices, reflecting the presence/absence of a feature (gene) in a gene set/pathway, or weight matrices. By simple matrix multiplication, we project the annotation onto the same principal component space, giving a matrix of gene sets x PCs. Multiplying this matrix by the transpose of the samples x PCs score matrix then yields a matrix of gene sets x samples.
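The two multiplications can be sketched in numpy as follows. A plain SVD of a random toy matrix stands in for the MFA/MCIA decomposition, and `gene_set_scores` plus the two-set annotation matrix are hypothetical. The sketch follows the description in this paragraph, not the mogsa package code.

```python
import numpy as np

def gene_set_scores(annotation, loadings, scores):
    """Project a feature annotation matrix into PC space, then onto samples.

    annotation: features x gene sets (binary membership or weights), rows in the
                same order as the rows of `loadings`
    loadings:   features x PCs
    scores:     samples x PCs
    """
    gs_by_pc = annotation.T @ loadings          # gene sets x PCs
    gs_by_sample = gs_by_pc @ scores.T          # gene sets x samples
    return gs_by_pc, gs_by_sample

# Toy stand-in for the concatenated, block-weighted table: 8 samples x 6 features.
rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 6))
Z -= Z.mean(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = 2                                           # number of PCs kept
scores, loadings = U[:, :k] * s[:k], Vt[:k].T

# Hypothetical binary annotation: two gene sets over the six features.
annotation = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]], dtype=float)
gs_by_pc, gs_by_sample = gene_set_scores(annotation, loadings, scores)
print(gs_by_pc.shape, gs_by_sample.shape)       # (2, 2) (2, 8)
```

With every PC kept, `loadings @ scores.T` reconstructs the transposed data table, so the gene set scores reduce to the per-sample sum of the member features. Truncating to the leading PCs is what restricts the scores to the dominant structure captured by the joint decomposition.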

Fig1. MOGSA Algorithm

When tested on simulated data, MOGSA outperformed other ssGSA methods (native matrix multiplication, GSVA and ssGSEA).

Fig2. MOGSA performance

For more information, see our Bioconductor package: http://bioconductor.org/packages/release/bioc/html/mogsa.html