A GitHub repository compiling the input data, Python and Jupyter Notebook scripts, and all relevant statistical outputs from running the AutoMLPipe-BC automated machine learning pipeline (from the Urbanowicz Lab, https://github.com/UrbsLab) on a large-scale single nucleotide polymorphism (SNP) dataset from patients with congenital heart disease (CHD).

rdattafl/SNP-Data-Analysis-Project

SNP-Data-Analysis-Project

In this PURM research project, I applied the AutoMLPipe-BC automated machine learning pipeline from the Urbanowicz Lab to conduct a full statistical analysis of a single nucleotide polymorphism (SNP) dataset drawn from patients with congenital heart disease. The goal of the analysis was to identify the key SNPs whose feature values correlated most strongly with disease outcome. My responsibilities included substantial data wrangling and preprocessing in Python 3 and Jupyter Notebook, both to prepare the massive SNP dataset for initial exploratory analyses and to let the entire AutoML pipeline run smoothly. In addition, I simplified the imputation step of the automated ML pipeline from a multivariate to a univariate function. Possible future extensions include a multivariate imputation step that iterates over batches of data rather than the full dataset, and support for more complex, even non-tabular, datasets.
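To illustrate what a univariate imputation step looks like, here is a minimal sketch in plain Python. It is not the code from this repository or from AutoMLPipe-BC: the function name `impute_univariate`, the use of `None` as the missing-value marker, and the choice of the per-column mode as the fill value are all assumptions made for the example. The point is only that each column is filled from its own observed values, without consulting any other feature.

```python
from collections import Counter


def impute_univariate(rows, missing=None):
    """Hypothetical helper: fill each missing entry with its own column's mode.

    Univariate means every column is treated independently; no other
    feature is used to predict the missing value.
    """
    n_cols = len(rows[0])
    fill_values = []
    for j in range(n_cols):
        # Collect only the observed (non-missing) values of column j.
        observed = [row[j] for row in rows if row[j] is not missing]
        # most_common(1) gives [(value, count)]; an all-missing column stays missing.
        fill_values.append(
            Counter(observed).most_common(1)[0][0] if observed else missing
        )
    # Build a new table so the input is left unmodified.
    return [
        [fill_values[j] if value is missing else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy genotype matrix (rows = patients, columns = SNPs coded 0/1/2); illustrative only.
genotypes = [
    [0, 1, None],
    [0, None, 2],
    [None, 1, 2],
    [1, 1, 0],
]
imputed = impute_univariate(genotypes)
```

Because each column needs only a single pass over its own values, this approach scales to very wide SNP tables far more easily than a multivariate imputer, at the cost of ignoring relationships between features.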
