Applying Machine Learning Ras, NF1, and TP53 Classifiers to PDX model gene expression

marislab/pdx-classification

Applying Machine Learning Classifiers to Pediatric Patient Derived Xenograft Expression Data

Gregory Way, Jo Lynne Harenza, John Maris, 2018

Here, we apply three classifiers (Ras activation, NF1 inactivation, and TP53 inactivation) to Target Patient Derived Xenograft (PDX) RNAseq data. The classifiers were previously trained on data from The Cancer Genome Atlas (TCGA) PanCanAtlas Project (Way et al. 2018, Knijnenburg et al. 2018).
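
As a rough illustration of what applying a previously trained classifier to new expression data can look like, the sketch below assumes each classifier can be represented as a set of per-gene coefficients in a logistic model, and that expression is standardized per gene before scoring. Neither assumption is stated in this README. The function names, gene names, coefficients, and values are all hypothetical, and this is not the code in the notebooks.

```python
import math
from statistics import mean, pstdev


def zscore_by_gene(expression):
    """Standardize each gene across samples (mean 0, unit variance)."""
    # Keep only genes measured in every sample.
    genes = set.intersection(*(set(profile) for profile in expression.values()))
    scaled = {sample: {} for sample in expression}
    for gene in genes:
        values = [expression[sample][gene] for sample in expression]
        mu, sigma = mean(values), pstdev(values)
        for sample in expression:
            # A gene with zero variance carries no signal; set it to 0.
            raw = expression[sample][gene]
            scaled[sample][gene] = (raw - mu) / sigma if sigma > 0 else 0.0
    return scaled


def apply_classifier(expression, coefficients, intercept=0.0):
    """Return one score in (0, 1) per sample from a logistic model."""
    scaled = zscore_by_gene(expression)
    scores = {}
    for sample, profile in scaled.items():
        # Only genes present in both the model and the data contribute.
        shared = coefficients.keys() & profile.keys()
        logit = intercept + sum(coefficients[g] * profile[g] for g in shared)
        scores[sample] = 1.0 / (1.0 + math.exp(-logit))
    return scores


# Toy data: three samples, two genes (all values are made up).
toy_expression = {
    "pdx_a": {"GENE1": 1.0, "GENE2": 5.0},
    "pdx_b": {"GENE1": 2.0, "GENE2": 5.0},
    "pdx_c": {"GENE1": 3.0, "GENE2": 5.0},
}
# GENE3 is absent from the toy data, so it is ignored when scoring.
toy_coefficients = {"GENE1": 1.5, "GENE3": -0.7}
toy_scores = apply_classifier(toy_expression, toy_coefficients)
```

In this toy setup, a sample with higher standardized `GENE1` expression receives a higher score, and a sample sitting exactly at the mean scores 0.5 because the intercept is zero.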

Computational Environment

We use conda as an environment manager. To reproduce the computational environment used in this pipeline, run:

# Using conda version >4.5
conda env create --force --file environment.yml

conda activate expression-classification

Pipeline

The following notebooks describe the analysis pipeline:

| Notebook | Description |
| :------- | :---------- |
| 1.apply-classifier.ipynb | Apply the previously trained classifiers to the input data |
| 2.evaluate-classifier.ipynb | Investigate and evaluate the prediction performance and score distribution for the input data |
| 3.explore-variants.ipynb | Explore the classifier predictions across genes, variants, and outliers |
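
One common way to summarize prediction performance is the area under the ROC curve (AUROC). This README does not say which metrics the evaluation notebook reports, so the sketch below is only an example of the idea. It takes classifier scores and known alteration status (1 = altered, 0 = wild-type); the function name and the toy values are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: five samples with made-up scores and alteration status.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)  # 5 of 6 positive/negative pairs are ordered correctly
```

The pairwise definition is quadratic in the number of samples, which is fine for a small cohort. For larger data, a rank-based implementation from an established library would be the more practical choice.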

To rerun all scripts, perform the following:

# First, download the gene expression and alterations data
./download_data.sh

# Make sure to activate the conda environment
conda activate expression-classification

# Run the pipeline to extract results, figures, and convert notebooks for easy viewing
./run_analysis.sh