
KOBE-SHOT-SELECTION

Understanding Data

The given data contains the location and circumstances of every basket attempted by Kobe Bryant during his 20-year career. Our task is to predict whether Kobe was successful with his shot; this is represented by shot_made_flag, with 1 meaning the shot was made and 0 meaning it was missed. The data consists of a total of 30,697 records with 25 columns, and 5,000 of these records are held out for testing, i.e. their shot_made_flag is blank. The 25 columns are: action_type, combined_shot_type, game_event_id, game_id, lat, loc_x, loc_y, lon, minutes_remaining, period, playoffs, season, seconds_remaining, shot_distance, shot_made_flag, shot_type, shot_zone_area, shot_zone_basic, shot_zone_range, team_id, team_name, game_date, matchup, opponent, and shot_id.

Of all these fields, only some are important for deciding whether a shot is successful. The selection of the fields important for predicting the shot (feature engineering) is documented in "featureSelection.py", along with some plots in the "Plots" folder, and is self-explanatory.
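As a minimal sketch of how the train/test split can be recovered from the blank shot_made_flag ("data.csv" is a hypothetical file name; the real path is passed on the command line, see "Usage" below):

    import pandas as pd

    # Hypothetical path; pass the actual data file as described under Usage.
    df = pd.read_csv("data.csv")

    # The 5,000 held-out test records are exactly the rows whose
    # shot_made_flag is blank; the remaining rows form the training set.
    test = df[df["shot_made_flag"].isnull()]
    train = df[df["shot_made_flag"].notnull()]

    print(len(df), len(train), len(test))  # 30697, 25697, 5000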

Building Model with XGBoost

I chose the XGBoost (eXtreme Gradient Boosting) algorithm. It works on the principle of ensembling, combining the predictions of multiple trees. It is a highly sophisticated algorithm, an advanced implementation of gradient boosting, and is powerful enough to deal with all sorts of irregularities (variance and bias) in data. The implementation supports the features of the scikit-learn (Python) and R implementations, with additions such as regularization.
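As a minimal training sketch, using the tuned parameters reported under "Tuning parameters" below. The feature list here is hypothetical (the real selection lives in featureSelection.py), and max_depth is not among the reported best params, so the value below is an assumption:

    import pandas as pd
    import xgboost as xgb

    df = pd.read_csv("data.csv")  # hypothetical path, as above
    train = df[df["shot_made_flag"].notnull()]

    # Hypothetical feature subset; see featureSelection.py for the real one.
    features = ["loc_x", "loc_y", "shot_distance", "period", "playoffs",
                "minutes_remaining", "seconds_remaining"]
    X, y = train[features], train["shot_made_flag"]

    model = xgb.XGBClassifier(
        learning_rate=0.012,
        n_estimators=1000,
        max_depth=7,          # assumption: one of the searched values [6, 7, 8]
        min_child_weight=4,
        subsample=0.62,
        colsample_bytree=0.6,
        reg_alpha=5,
    )
    model.fit(X, y)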

Tuning parameters

The XGBoost algorithm has three types of parameters: general, booster, and learning-task parameters. Parameter tuning is done using GridSearchCV with the following values: max_depth = [6, 7, 8], min_child_weight = [3.5, 4, 4.5], colsample_bytree = [0.6, 0.65, 0.7], subsample = np.arange(0.6, 0.7, 0.02) (5 values), and reg_alpha = [3, 4, 5], with learning_rate = 0.012 and n_estimators = 1000 held fixed. The best params for our model after performing GridSearchCV are {'reg_alpha': 5, 'colsample_bytree': 0.6, 'min_child_weight': 4, 'subsample': 0.62}, leading to a log loss of 0.60098 and an overall 5-fold cross-validation accuracy of 68.11%.
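A condensed sketch of the search as described above (what parameterTuning.py presumably does; X and y as in the training sketch earlier):

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "max_depth": [6, 7, 8],
        "min_child_weight": [3.5, 4, 4.5],
        "colsample_bytree": [0.6, 0.65, 0.7],
        "subsample": np.arange(0.6, 0.7, 0.02),  # 0.60, 0.62, 0.64, 0.66, 0.68
        "reg_alpha": [3, 4, 5],
    }

    search = GridSearchCV(
        estimator=xgb.XGBClassifier(learning_rate=0.012, n_estimators=1000),
        param_grid=param_grid,
        scoring="neg_log_loss",  # GridSearchCV maximizes, hence the negated loss
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_)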

How to run the code

Pre-requisites

  1. Numpy : pip install numpy
  2. Pandas : pip install pandas
  3. Scipy : Download the appropriate wheel matching your Python version and OS from here and run pip install name_of_wheel_file. The numpy+mkl package needs to be installed first.
  4. xgboost : Follow this link
  5. scikit-learn : pip install scikit-learn

Usage

python main.py path-to-data-file

Module Details

  • featureSelection.py contains information about which fields from the data were discarded and which were kept, with appropriate plots in the folder named 'Plots'.
  • plot.py contains the plotting functions used by featureSelection.py.
  • parameterTuning.py implements GridSearchCV with the specified params and returns the best_params.
  • Metrics.txt contains the accuracy for each fold with the final parameters.
  • xgb.pickle.dat contains the trained model, which can be loaded directly instead of training the model again (total time taken to train the model: 1100 s); see the loading sketch below.
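A loading sketch for the saved model (this assumes xgb.pickle.dat was written with Python's pickle module, and X_test is the engineered feature matrix for the held-out rows, built as in the earlier sketches):

    import pickle

    # Load the pre-trained model instead of retraining (~1100 s).
    with open("xgb.pickle.dat", "rb") as f:
        model = pickle.load(f)

    probs = model.predict_proba(X_test)[:, 1]  # P(shot_made_flag == 1)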
