edrubin/EC524W20

EC 524, Winter 2020

Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Connor Lennon.

Schedule

Lecture Tuesday and Thursday, 10:00am–11:50am, 105 Peterson Hall

Lab Friday, 12:00pm–12:50pm, 102 Peterson Hall

Office hours

  • Ed Rubin (PLC 519): Thursday (2pm–3pm); Friday (1pm–2pm)
  • Connor Lennon (PLC 430): Monday (1pm–2pm)

Syllabus

Syllabus

Books

Required books

Suggested books

Lecture notes

000 - Overview (Why predict?)

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .Rmd

001 - Statistical learning foundations

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .Rmd

002 - Model accuracy

  1. Model accuracy
  2. Loss for regression and classification
  3. The bias-variance tradeoff
  4. The Bayes classifier
  5. KNN

Formats .html | .pdf | .Rmd
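
As an illustration of the KNN topic above, here is a minimal sketch of a k-nearest-neighbors classifier. It is not taken from the course materials (which use R). It is written in Python with only the standard library, and the function name `knn_predict`, the Euclidean distance, and the tie-breaking rule are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, new_x, k=3):
    """Classify new_x by majority vote among its k nearest training points.

    Assumes numeric feature tuples of equal length and Euclidean distance.
    Vote ties go to the label seen first among the nearest neighbors.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training observations by distance to the new point.
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], new_x),
    )
    # Count the labels of the k closest observations.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

Smaller values of `k` give a more flexible (lower-bias, higher-variance) classifier, and larger values give a smoother one.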

003 - Resampling methods

  1. Review
  2. The validation-set approach
  3. Leave-one-out cross validation
  4. k-fold cross validation
  5. The bootstrap

In-class: Validation-set exercise (Kaggle)

Formats .html | .pdf | .Rmd
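
To make the k-fold idea above concrete, here is a minimal sketch of how observations might be split into folds. It is not taken from the course materials (which use R). It is written in Python with only the standard library, and the function name `k_fold_splits` and its `seed` argument are hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Return k (train, validation) pairs of index lists covering range(n).

    Every observation lands in exactly one validation fold. Setting k = n
    gives leave-one-out cross validation.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    # Shuffle so that fold membership does not depend on row order.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


splits = k_fold_splits(n=10, k=5)
```

A model would then be fit on each `train` set and scored on the matching `validation` set, and the k scores averaged to estimate test error.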

004 - Linear regression strikes back

  1. Returning to linear regression
  2. Model performance and overfit
  3. Model selection—best subset and stepwise
  4. Selection criteria

Formats .html | .pdf | .Rmd

005 - Shrinkage methods

  1. Ridge regression
  2. Lasso
  3. Elastic net

Formats .html | .pdf | .Rmd

006 - Classification intro

  1. Introduction to classification
  2. Why not regression?
  3. But also: Logistic regression
  4. Assessment: Confusion matrix, assessment criteria, ROC, and AUC

Formats .html | .pdf | .Rmd
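
The assessment criteria listed above are all computed from the four cells of a binary confusion matrix. Here is a minimal sketch of that calculation. It is not taken from the course materials (which use R). It is written in Python with only the standard library, and the function names and the `positive` argument are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def assessment_criteria(counts):
    """Compute common criteria; a ratio with a zero denominator is NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")

    tp, fp, tn, fn = (counts[key] for key in ("tp", "fp", "tn", "fn"))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical labels, for illustration only.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
criteria = assessment_criteria(counts)
```

An ROC curve traces sensitivity against one minus specificity as the classification threshold varies, so it can be built by repeating this calculation at each threshold.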

007 - Decision trees

  1. Introduction to trees
  2. Regression trees
  3. Classification trees—including the Gini index, entropy, and error rate

Formats .html | .pdf | .Rmd
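
The three node-impurity measures named above can be written in a few lines each. Here is a minimal sketch that is not taken from the course materials (which use R). It is written in Python with only the standard library, the function names are hypothetical, and the natural logarithm in the entropy is an assumption (other bases only rescale it).

```python
import math
from collections import Counter


def class_proportions(labels):
    """Share of each class among the (non-empty) labels in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(labels):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in class_proportions(labels))


def entropy(labels):
    """Entropy: minus the sum over classes of p * log(p)."""
    return -sum(p * math.log(p) for p in class_proportions(labels))


def error_rate(labels):
    """Classification error rate: one minus the largest class share."""
    return 1 - max(class_proportions(labels))


# Hypothetical node with two equally common classes.
node = ["a", "a", "b", "b"]
impurity = (gini_index(node), entropy(node), error_rate(node))
```

All three measures equal zero for a pure node and are largest when the classes are evenly mixed.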

008 - Ensemble methods

  1. Introduction
  2. Bagging
  3. Random forests
  4. Boosting

Formats .html | .pdf | .Rmd

009 - Support vector machines

  1. Hyperplanes and classification
  2. The maximal margin hyperplane/classifier
  3. The support vector classifier
  4. Support vector machines

Formats .html | .pdf | .Rmd

Projects

Intro Predicting sales price in housing data (Kaggle)

Help: Kaggle notebooks

001 KNN and loss (Kaggle notebook)
You will need to sign in to your Kaggle account and then hit "Copy and Edit" to add the notebook to your account.
Due 21 January 2020 before midnight.

002 Cross validation and linear regression (Kaggle notebook)
Due 04 February 2020 before midnight.

003 Model selection and shrinkage (Kaggle notebook)
Due 13 February 2020 before midnight.

004 Predicting heart disease (Kaggle competition) | Competition
Due 20 February 2020 before midnight.

005 Classifying customer churn (Kaggle competition) | Competition
Due in class 27 February 2020.

Class project
Due 12 March 2020 before class.

Lab notes

000 - Workflow and cleaning

  1. General "best practices" for coding
  2. Working with RStudio
  3. The pipe (%>%)

Formats .html | .pdf | .Rmd

001 - dplyr and Kaggle notebooks

  1. Finish previous lab on dplyr
  2. Working in (Kaggle) notebooks
  3. Kaggle contest notes

002 - Cross validation and simulation

  1. Cross-validation review
  2. CV and interdependence
  3. Writing functions
  4. Introduction to learning via simulation
  5. Simulation: CV and dependence

Formats .html | .pdf | .Rmd

Additional R script for simulation

003 - Data cleaning and dplyr

004 - Data cleaning and workflow with tidymodels

005 - Perceptrons and neural nets

Additional Data cleaning in R (with caret)

  • Converting numeric variables to categorical
  • Converting categorical variables to dummies
  • Imputing missing values
  • Standardizing variables (centering and scaling)

Additional resources

R

Data Science

Spatial data

About

Master's-level applied econometrics course, focusing on prediction, at the University of Oregon (EC 424/524, Winter quarter 2020). Taught by Ed Rubin.
