JTIMLmaster

Author: Sangsoon Woo, Consultant, Cytel

Maintainer: Joseph Sayler, Analyst, Cytel

INTRODUCTION

This vignette shows how to apply machine learning (ML) approaches to build a predictive model. We use an RNA-seq data set to train different ML models and evaluate their performance. The ML approaches covered in this training are:

Penalized Regression model (PR)
Random Forest (RF)
Support Vector Machine (SVM)
Neural Network (NN)
Gradient Boosting Model (GBM)

The workflow for applying ML to omics data is as follows:

Data pre-processing: filtering and normalization
External feature selection based on univariate modeling
Application of ML to pre-processed data
Selection of predictors
Model comparison based on accuracy measures or misclassification rate
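The workflow steps above can be sketched on toy data. This is a minimal illustration in plain Python, not the package's R code: the function names, the count thresholds, the log2 counts-per-million formula, and the Welch t-statistic used for univariate screening are all assumptions chosen for the example, and the toy counts are made up.

```python
import math
import statistics


def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


def log_cpm(counts, prior=0.5):
    """Convert raw counts to log2 counts per million, sample by sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads per sample over the genes that were kept.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            math.log2((c + prior) / (lib_sizes[j] + 1.0) * 1e6)
            for j, c in enumerate(row)
        ]
        for gene, row in counts.items()
    }


def welch_t(x, y):
    """Welch t-statistic comparing the means of two groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return 0.0 if se == 0 else (statistics.mean(x) - statistics.mean(y)) / se


def rank_genes(expr, labels):
    """Univariate screening: rank genes by |t| between label 1 and label 0."""
    scores = {}
    for gene, row in expr.items():
        cases = [v for v, lab in zip(row, labels) if lab == 1]
        controls = [v for v, lab in zip(row, labels) if lab == 0]
        scores[gene] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)


def misclassification_rate(truth, predicted):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    return sum(t != p for t, p in zip(truth, predicted)) / len(truth)


# Toy data (hypothetical): rows are genes, columns are four samples.
counts = {
    "geneA": [100, 120, 300, 310],
    "geneB": [0, 1, 0, 2],
    "geneC": [50, 55, 52, 49],
}
labels = [0, 0, 1, 1]  # 1 = PTSD, 0 = control

kept = filter_low_counts(counts)      # geneB is dropped as low-count
expr = log_cpm(kept)                  # normalized expression
ranking = rank_genes(expr, labels)    # genes ordered by univariate evidence
error = misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0])
```

In a real analysis, each ML model would be fitted on the screened genes and the models compared on held-out samples using a measure such as `misclassification_rate`.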

STUDY

This report is based on two post-traumatic stress disorder (PTSD) data sets. PTSD affects 7 to 8% of the general US population, and the rate is higher (up to 20%) among troops who returned from the wars in Iraq and Afghanistan. This large difference in incidence suggests that life-threatening experiences may perturb molecular-level functions related to mental health, which may in turn induce the development of PTSD. Understanding the molecular mechanisms involved might therefore help to reduce the morbidity and mortality associated with PTSD.

WTC Data

The data set is RNA-seq data available in the Gene Expression Omnibus (GEO) under accession GSE97356. It comes from a transcriptome-wide expression study that used RNA sequencing of whole blood from 324 World Trade Center responders. We use this data set to build a predictive model for PTSD status. The top-ranked genes from the penalized regression model are included in the final model, which is also used to estimate a polygenic score for an independent set of samples (the Marine data).

Marine Data

The data set is RNA-seq data available in the Gene Expression Omnibus (GEO) under accession GSE64813. All subjects were male. Whole blood samples were obtained from 124 MRS II US Marine participants who served a seven-month deployment. For each participant, blood was drawn 1 month before deployment and again 3 months after deployment, and RNA-seq data were generated from these samples. The study also records the PTSD status of each participant, which is compared with the status predicted by the model trained on the WTC data set. The polygenic score based on the selected predictors is estimated in the Marine data set.
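To make the idea of a polygenic score concrete, here is a minimal sketch in plain Python. It assumes the score is a weighted sum of a sample's normalized expression values, with the weights taken from the penalized regression coefficients, and that a logistic transform turns the score into a predicted status. The gene names, weights, intercept, and threshold are hypothetical, and the package's own computation may differ.

```python
import math


def polygenic_score(expression, weights):
    """Weighted sum of one sample's expression over the selected genes."""
    missing = [gene for gene in weights if gene not in expression]
    if missing:
        raise KeyError(f"expression is missing selected genes: {missing}")
    return sum(w * expression[gene] for gene, w in weights.items())


def predict_status(score, intercept=0.0, threshold=0.5):
    """Map a score to (label, probability) with a logistic transform."""
    prob = 1.0 / (1.0 + math.exp(-(intercept + score)))
    return int(prob >= threshold), prob


# Hypothetical coefficients for genes selected on the training data.
weights = {"g1": 0.5, "g2": -1.0}

# Hypothetical normalized expression for one independent sample;
# genes that were not selected (g3) are simply ignored.
sample = {"g1": 2.0, "g2": 0.5, "g3": 9.0}

score = polygenic_score(sample, weights)
label, prob = predict_status(score)
```

Comparing `label` with the recorded PTSD status across all independent samples gives the accuracy of the trained model on data it has not seen.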

Inspiration

Can you train a machine learning model on the WTC data set that accurately predicts whether the participants in the Marine data set have PTSD?
