
Basketball-Stats

Description:

This is a personal project that uses web scraping, database management, and machine learning to scrape NBA statistics from basketball-reference.com, store them in a database, and predict who will win MVP in a given year. The work is broken down into three phases, with a possible fourth phase.

Current Phase: 3
Phase 1: Web scraping
Phase 2: Creating the database
Phase 3: Predicting the MVP based on player stats

Additional Features:

1) Creating a Dockerfile and a Docker container for the local database

Phase 1: Using the Python library BeautifulSoup, we can scrape data from basketball-reference.com. There are multiple scrapers that collect season stats, team stats, and player stats from 1980 to the current date. The scrapers all write multiple CSV files to a directory called Output.
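
As an illustration of this phase, below is a minimal sketch of one such scraper, assuming requests, beautifulsoup4, and pandas (with lxml) are installed. The URL pattern, table id, and output filename are assumptions for illustration only; the project's actual scrapers in Run_Scraper may structure things differently.

import os

import pandas as pd
import requests
from bs4 import BeautifulSoup

OUTPUT_DIR = "Output"

def scrape_per_game_stats(year):
    """Scrape per-game player stats for one season into a DataFrame."""
    url = f"https://www.basketball-reference.com/leagues/NBA_{year}_per_game.html"
    response = requests.get(url)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    table = soup.find("table", {"id": "per_game_stats"})  # table id is an assumption

    # pandas parses the HTML table into a DataFrame.
    df = pd.read_html(str(table))[0]
    # The page repeats its header row between alphabetical groups; drop those rows.
    df = df[df["Player"] != "Player"].reset_index(drop=True)
    return df

if __name__ == "__main__":
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    stats = scrape_per_game_stats(2020)
    stats.to_csv(os.path.join(OUTPUT_DIR, "player_stats_2020.csv"), index=False)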

Phase 2: Using MySQL Workbench and hosting the database locally, we take the CSV files from the first phase, load them into a database, and write queries for particular stats. For example, we can write a query to see who has the most total field goals in a given year.
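
As a sketch of that example query, the snippet below uses mysql-connector-python against a local database. The database name (basketball_stats), table name (player_stats), and column names (player, season, fg) are assumptions; the real schema created in this phase may differ.

import mysql.connector

# Assumed schema: player_stats(player, season, fg, ...). Adjust the names to
# match the actual database created in Phase 2.
QUERY = """
    SELECT player, SUM(fg) AS total_fg
    FROM player_stats
    WHERE season = %s
    GROUP BY player
    ORDER BY total_fg DESC
    LIMIT 1;
"""

def most_total_field_goals(year):
    conn = mysql.connector.connect(
        host="localhost",
        user="root",
        password="your_password",
        database="basketball_stats",
    )
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY, (year,))
        return cursor.fetchone()  # (player, total_fg) or None
    finally:
        conn.close()

if __name__ == "__main__":
    print(most_total_field_goals(2020))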

Phase 3: Using the database, we can retrieve player stats for a given year. With these stats, we can identify the most important features using a random forest, then feed those features into a linear regression to predict who won MVP for that year.
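
A minimal sketch of this modeling step is shown below, assuming scikit-learn and pandas DataFrames pulled from the database with one row per player-season. The column names (Player, mvp_share) and the feature list are placeholders, not the project's actual schema.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def predict_mvp(train_df, predict_df, feature_cols, top_k=10):
    """Rank features with a random forest, then fit a linear regression on the
    top features and return the player with the highest predicted MVP share."""
    X_train, y_train = train_df[feature_cols], train_df["mvp_share"]

    # Step 1: estimate feature importances with a random forest.
    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    forest.fit(X_train, y_train)
    importances = pd.Series(forest.feature_importances_, index=feature_cols)
    top_features = importances.nlargest(top_k).index.tolist()

    # Step 2: fit a linear regression on the most important features only.
    model = LinearRegression()
    model.fit(train_df[top_features], y_train)

    # Predict the MVP award share for the target season and pick the leader.
    predictions = model.predict(predict_df[top_features])
    best = predict_df.assign(predicted_share=predictions).nlargest(1, "predicted_share")
    return best["Player"].iloc[0]

In practice the models would be fit on past seasons and used to predict a held-out season, so the MVP result for that year is never seen during training.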

Phase 4: Using Flask, create a web application for the scrapers, the database, and the MVP prediction. This is an optional phase that may or may not be created.
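
If this phase is built, a very small Flask sketch might look like the following; the route and the run_mvp_prediction stub are purely hypothetical placeholders for the Phase 3 code.

from flask import Flask, jsonify

app = Flask(__name__)

def run_mvp_prediction(year):
    """Hypothetical placeholder for the Phase 3 model; returns a dummy value."""
    return "TBD"

@app.route("/mvp/<int:year>")
def mvp(year):
    # Call into the Phase 3 prediction code for the requested season.
    return jsonify({"year": year, "predicted_mvp": run_mvp_prediction(year)})

if __name__ == "__main__":
    app.run(debug=True)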

Setting up the virtual environment using Conda

This guide to creating a new virtual environment with conda is adapted from
https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/.

1) Check if conda is installed and in your PATH.
If conda is installed, this should be your output:

$ conda -V
conda 3.7.0

2) Check if conda is up to date.

$ conda update conda

3) Create a virtual environment for your project,
where yourenvname is the name of the environment and x.x is the version of Python.

$ conda create -n yourenvname python=x.x anaconda

4) Activate your virtual environment.

$ source activate yourenvname 

5) Install additional Python packages to a virtual environment.

$ conda install -n yourenvname [package]

6) Deactivate and delete your virtual environment.

$ source deactivate                  # Deactivate your virtual environment
$ conda remove -n yourenvname --all  # Delete your virtual environment

Installing

Installing Python Libraries with conda

Run the command below to install the libraries required by the scrapers.

$ conda install --file requirements.txt

Downloading custom modules via GitHub

You can clone this repo and import the libraries at your own discretion.

Running the program

Running the web scrapers

The commands below show how to run each of the scrapers.

$ python /your/path/Basketball-Stats/Python_Scrapers/Create_Player_Name # Gets dataframe of player names from 1980 - current 
$ python /your/path/Basketball-Stats/Run_Scraper/get_season_stats.py    # Gets season stats from 1980 - current
$ python /your/path/Basketball-Stats/Run_Scraper/get_team_stats.py      # Gets team stats from 1980 - current
$ python /your/path/Basketball-Stats/Run_Scraper/get_player_stats.py    # Gets player stats from 1980 - current

NOTE: You must run Create_Player_Name before any other scraper.

Where /your/path is wherever you decide to store the source directory, Basketball-Stats. NOTE: The scripts should be run from the root directory, i.e. Basketball-Stats.

Web Scrapers Expected Output

Inside the source directory, Basketball-Stats, there will be a directory called Output.
Inside Output, there should be three directories, corresponding to the names of the scrapers. More information can be found in the Web_Scraper_User_Manual.

Running the Database

IMPORTANT: For any queries, player names with accents have been normalized.
Example: Nikola Jokić becomes Nikola Jokic

TBD
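
While the query instructions are still TBD, the snippet below is a minimal sketch of the accent normalization described above, using only Python's standard-library unicodedata module; the project's actual normalization step may differ.

import unicodedata

def strip_accents(name):
    """Replace accented characters with their closest ASCII equivalents."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents("Nikola Jokić"))  # -> Nikola Jokic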

Running the Prediction of MVP

TBD

Running the Django Website

TBD
