---
layout: lesson
root: .
permalink: index.html
---

High-dimensional datasets, or tabular data with many features describing each observation, are increasingly common in many research domains. How can researchers find patterns and extract insights from such complex, information-rich data? In this workshop, we will explore several tried-and-true methods that help data analysts better understand their high-dimensional data, including principal component analysis (PCA), data visualization, and regularized multivariate regression. After participating in this workshop, learners should be able to:

  • Define, identify, and give examples of high-dimensional datasets
  • Visualize and explore high-dimensional data to reveal a research story
  • Use dimensionality reduction techniques such as PCA to produce useful abstractions and summaries of complex, high-dimensional data
  • Understand the challenges of fitting both predictive (e.g., overfitting) and explanatory (e.g., multicollinearity) regression models to high-dimensional datasets
  • Optimize high-dimensional multivariate regression models for either predictive or explanatory purposes using a combination of techniques, including PCA, feature selection, and regularization (lasso, ridge, and elastic net)
  • TODO: (1) How to navigate the common pitfalls of clustering in high dimensions, (2) high-dimensional visualization tools including PaCMAP and t-SNE
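To make a few of these objectives concrete, here is a minimal sketch assuming only NumPy and a hypothetical synthetic dataset (not the workshop's own data). It builds a dataset with more features than observations, computes PCA from the singular value decomposition (SVD) of the column-centered data, and uses ridge regression as one way around the singular least-squares problem that arises when features outnumber observations. The dataset sizes, variable names, and penalty `lam` are illustrative choices; the lesson episodes may use scikit-learn or other tools instead.

```python
import numpy as np

# Hypothetical synthetic data: n=50 observations, p=200 features (p > n),
# generated from only k=3 latent factors plus a little noise.
rng = np.random.default_rng(0)
n, p, k = 50, 200, 3
latent = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.1 * rng.normal(size=n)

# PCA via SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)        # fraction of variance per component
scores = Xc @ Vt[:k].T                 # project onto the first k components
print("Variance explained by first 3 PCs:", round(float(explained[:k].sum()), 3))

# With p > n, rank(Xc) < p, so Xc^T Xc is singular and
# ordinary least squares has no unique solution.
print("rank(Xc) =", np.linalg.matrix_rank(Xc), "but p =", p)

# Ridge regression: adding lam * I makes the normal equations solvable.
lam = 1.0
yc = y - y.mean()
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
print("Number of ridge coefficients:", beta_ridge.shape[0])
```

Because the data are driven by three latent factors, nearly all of the variance should fall in the first three components, which is the kind of compact summary PCA aims to provide. Centering both `X` and `y` lets the ridge fit skip an explicit intercept term.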

{% comment %} This is a comment in Liquid {% endcomment %}

Prerequisites

Learners are expected to have the following prerequisite knowledge:

  • Introductory Python programming skills (variable assignment, writing functions, for loops, etc.) and familiarity with the pandas package. If you need a refresher on Python before taking this workshop, please review the lesson materials from this Introductory Python Carpentries workshop.
  • Familiarity with basic machine learning concepts, including train/test splits and overfitting. For a refresher on machine learning basics, please review the lesson materials from the Intro to Machine Learning with Sklearn workshop.

{: .prereq}

{% include links.md %}