This repository has been archived by the owner on Aug 12, 2023. It is now read-only.

madhurima-nath/regression_and_predictions


WWCode Data Science: Statistics Workshop Series - Statistics in Data Science

Week 4: Regression and Predictions

The following topics are covered in the session.

  • Simple Linear Regression
  • Multiple Linear Regression
  • Real-world Example
  • Factor variables in Regression
  • Regression Diagnostics - Outliers, Influential Values, Correlated Errors
  • Ridge and Lasso Regression
  • Polynomial and Spline Regression

The analysis in the R notebook uses the New York air quality data (details about dataset). Download the data from Kaggle here.
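As a rough orientation before opening the notebook, the sketch below fits a simple and a multiple linear regression in R. It assumes R's built-in `airquality` data frame (daily New York measurements, May to September 1973) as a stand-in for the Kaggle file, which may differ in column names or cleaning. The choice of `Ozone` as the response and the `new_day` value are illustrative assumptions, not necessarily the models used in the session.

```r
# Assumption: the built-in `airquality` data frame stands in for the Kaggle CSV.
data(airquality)

# Simple linear regression: ozone as a function of temperature.
# lm() drops rows with missing values by default (na.action = na.omit).
simple_fit <- lm(Ozone ~ Temp, data = airquality)

# Multiple linear regression, with Month treated as a factor variable.
multi_fit <- lm(Ozone ~ Temp + Wind + Solar.R + factor(Month),
                data = airquality)

# Coefficient estimates, standard errors, and R-squared for each model.
print(summary(simple_fit))
print(summary(multi_fit))

# Prediction interval for a hypothetical day at 80 degrees Fahrenheit.
new_day <- data.frame(Temp = 80)
print(predict(simple_fit, newdata = new_day, interval = "prediction"))
```

Wrapping `Month` in `factor()` makes R estimate one coefficient per month relative to a baseline month, rather than treating the month number as a continuous predictor.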

Note: If the downloaded HTML file appears in its raw version, follow these steps to make it readable.

  • Download the HTML file in the browser
  • Right-click anywhere on the page and select 'Save As' from the options
  • Save the file on the local machine with File Type 'HTML'
  • Once saved, open the file; it will display as the correctly formatted HTML notebook

Link to the Slides

YouTube link

Feel free to reach out if you have any questions.

References:

  1. Penn State STAT 462 Applied Regression Analysis
  2. Khan Academy
  3. Cornell Machine Learning Course
  4. CMU Data Analytics Course
  5. Measures of Influence
  6. UCLA Course - Coding Systems for Categorical Variables
  7. Ridge Regression Example in R