This repository has been archived by the owner on May 21, 2019. It is now read-only.

toddheitmann/mlbgameday


MLBGameDay

A Python API for baseball data that draws on MLBAM Gameday, Baseball Savant, and Retrosheet. It stores data using SQLAlchemy and returns it in pandas DataFrames.

Data use is always subject to the MLB Advanced Media license, the Retrosheet license, and the project license.

Status

This project currently works only partially. Expect continued updates and changes to the database structure and data models. If you're curious about using MLBAM Gameday data, the best source is pitchRx. It is an R package, so you'll need to learn R, or find another option if Python is your thing.

Goals

The project has two simple main goals:

  • Provide database storage for baseball data using SQLAlchemy.
  • Serve this data back for analysis in pandas DataFrames.
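
As a rough illustration of these two goals, here is a minimal sketch, assuming SQLAlchemy 1.4 or later and pandas are installed. The `Pitch` model, its columns, and the sample rows are hypothetical and are not this project's actual schema.

```python
import pandas as pd
from sqlalchemy import Column, Float, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Pitch(Base):
    """Hypothetical table for illustration only; not the project's real model."""

    __tablename__ = "pitches"

    id = Column(Integer, primary_key=True)
    game_id = Column(String, nullable=False)
    pitch_type = Column(String)
    start_speed = Column(Float)


# An in-memory SQLite database keeps the sketch self-contained.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# Goal 1: store data through SQLAlchemy (the values below are made up).
with Session(engine) as session:
    session.add_all(
        [
            Pitch(game_id="example_game_1", pitch_type="FF", start_speed=95.2),
            Pitch(game_id="example_game_1", pitch_type="SL", start_speed=86.4),
        ]
    )
    session.commit()

# Goal 2: serve the data back as a pandas DataFrame.
query = text("SELECT game_id, pitch_type, start_speed FROM pitches ORDER BY id")
with engine.connect() as conn:
    df = pd.read_sql(query, conn)

print(df)
```

Switching the connection URL passed to `create_engine` is how SQLAlchemy targets a different database backend, which is one reason it suits a project like this.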

Roadmap

Right now, I see three main paths to bring this project online: reading and storing downloaded data, creating the database structure, and serving queries in DataFrames.

  • Using Gameday XML data

    • Store XML files

    • Parse XML files

    • Update XML files

    • Format XML files for database insertion

    • Option to delete files after inserting into a database

  • Using Baseball Savant data

    • Store and Parse CSV files

    • Delete files after inserting into a database

    • Insert into database

    • Update Baseball Savant Trajectory Data

  • Using Retrosheet data

    • Download event files

    • Parse event files using Chadwick

      • Windows: include Chadwick executables and call them to parse

      • Mac: require installation via Homebrew:

        brew install chadwick
      • Linux: Provide installation instructions

    • Store data in database

    • Update database with new data

    • Delete files after insertion

  • Create and Maintain Database

    • Create database structure

    • Create database relationships

    • Create database from fresh install

    • Update database

    • Join different databases (MLBGameDay, Baseball Savant, Retrosheet)

  • pandas integration

  • Serve initial queries into dataframes
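
The "parse" and "format for database insertion" steps of the Gameday XML path could look something like the sketch below, which uses only the Python standard library. The sample document is a hand-written fragment, and its element and attribute names (`inning`, `atbat`, `pitch`, `pitch_type`, `start_speed`) are assumptions about the file layout rather than a verified schema. `parse_pitches` is a hypothetical helper, not part of this project.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the real Gameday files may use different names.
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1">
        <pitch pitch_type="FF" start_speed="94.1"/>
        <pitch pitch_type="SL" start_speed="85.6"/>
      </atbat>
    </top>
  </inning>
</game>
"""


def parse_pitches(xml_text):
    """Flatten nested pitch elements into a list of row dictionaries."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inning in root.iter("inning"):
        for atbat in inning.iter("atbat"):
            for pitch in atbat.iter("pitch"):
                speed = pitch.get("start_speed")
                rows.append(
                    {
                        "inning": int(inning.get("num")),
                        "atbat": int(atbat.get("num")),
                        "pitch_type": pitch.get("pitch_type"),
                        # Missing speeds become None so they map to NULL.
                        "start_speed": float(speed) if speed else None,
                    }
                )
    return rows


rows = parse_pitches(SAMPLE_XML)
for row in rows:
    print(row)
```

A list of flat dictionaries like this is convenient because it can be handed to a bulk insert or to `pandas.DataFrame(rows)` without further reshaping.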

Why?

As a Pythonista, I'm slightly jealous of the regularly updated pitchRx CRAN package. This project will hopefully provide an alternative for Python development.

Stretch Goals

While building out the initial functionality, I also hope to add support for:

  • different database types (MySQL, PostgreSQL, etc.)
  • external data such as travel distances and weather information
  • OpenWAR, cFIP / DRA, or other advanced metrics

Thank You

Thanks to MLB Advanced Media for making Gameday and PITCHf/x data public.

Thank you, Daren Willman, for creating Baseball Savant.

Many thanks to all those who support and add to Retrosheet!

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
