tweichle/Predicting-Baseball-Statistics

Predicting-Baseball-Statistics

Classification and Regression Applications in Python Using scikit-learn

This repository contains models that predict baseball statistics from MLB Statcast metrics.

Goals

  • Using MLB Statcast Metrics, summarize and examine baseball statistics.

Classification

  • Build and train models to predict home runs and extra-base hits using the following approaches:

    • Logistic Regression
    • k-Nearest Neighbors
    • Decision-Classification Tree
    • Random Forest Classification
    • Support Vector Machine Classification
    • XGBoost Classification
  • Apply over-sampling to the imbalanced classes so that the trained models generalize better.

  • Apply regularization and cross-validation techniques for model evaluation, selection, and optimization.
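
The over-sampling and regularization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the data is synthetic, the feature names `launch_speed` and `launch_angle` and the labeling rule are assumptions made for the example, and random over-sampling via `sklearn.utils.resample` stands in for whichever over-sampling method the notebooks use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-in for batted-ball data (NOT real Statcast values).
n = 2000
launch_speed = rng.normal(88, 12, n)  # mph, hypothetical feature
launch_angle = rng.normal(12, 25, n)  # degrees, hypothetical feature
X = np.column_stack([launch_speed, launch_angle])
# Toy labeling rule (assumption): hard-hit balls at mid-range angles are "home runs".
y = ((launch_speed > 100) & (launch_angle > 20) & (launch_angle < 40)).astype(int)

# Split first, so the test set keeps its natural class imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate class-1 rows until both classes have equal size.

    Assumes class 1 is the minority class.
    """
    majority = X[y == 0]
    minority = X[y == 1]
    minority_up = resample(
        minority, replace=True, n_samples=len(majority), random_state=random_state
    )
    X_bal = np.vstack([majority, minority_up])
    y_bal = np.concatenate(
        [np.zeros(len(majority), dtype=int), np.ones(len(minority_up), dtype=int)]
    )
    return X_bal, y_bal


# Over-sample the training split only, never the test split.
X_bal, y_bal = oversample_minority(X_train, y_train)

# C is the inverse of the L2 regularization strength (smaller C = stronger penalty).
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
model.fit(X_bal, y_bal)

recall = recall_score(y_test, model.predict(X_test))
print(f"Minority-class recall on the untouched test split: {recall:.2f}")
```

Over-sampling only the training split keeps duplicated rows out of the evaluation data, so the reported recall reflects the original class balance.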

Regression

  • Build and train models to predict hit distance implementing the following approaches:

    • Linear Regression
    • Decision-Regression Tree
    • Random Forest Regression
  • Apply regularization (Ridge, Lasso, Elastic Net) and cross-validation (k-fold) techniques for model evaluation, selection, and optimization.
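
As a rough sketch of the regularization and k-fold comparison described above, the snippet below scores Ridge, Lasso, and Elastic Net with 5-fold cross-validation. The data is synthetic, and the feature names, the linear relationship used to generate `hit_distance`, and the `alpha` values are all assumptions chosen for illustration rather than settings taken from this repository.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for batted-ball data (NOT real Statcast values).
n = 500
launch_speed = rng.normal(88, 12, n)   # mph, hypothetical feature
launch_angle = rng.normal(12, 15, n)   # degrees, hypothetical feature
noise_feature = rng.normal(0, 1, n)    # irrelevant column, for the penalties to shrink
X = np.column_stack([launch_speed, launch_angle, noise_feature])
# Toy linear target (assumption), in feet, with Gaussian noise.
hit_distance = 2.5 * launch_speed + 3.0 * launch_angle + rng.normal(0, 20, n)

# alpha values are illustrative; in practice they would be tuned.
models = {
    "ridge": Ridge(alpha=1.0),                           # L2 penalty
    "lasso": Lasso(alpha=1.0),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mix of L1 and L2
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_rmse = {}
for name, model in models.items():
    # Scaling inside the pipeline means each fold is standardized
    # using only its own training portion.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(
        pipeline, X, hit_distance, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()  # flip sign: scikit-learn reports negated RMSE

for name, rmse in cv_rmse.items():
    print(f"{name}: mean 5-fold RMSE = {rmse:.1f} ft")
```

Comparing the mean cross-validated RMSE across the three penalties is one simple way to select a model; tuning `alpha` (for example with `GridSearchCV`) would be the natural next step.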