General Assembly DSI - Denver 2018

Capstone Project - DFS Model

This is my capstone project for General Assembly's fifth Data Science Immersive cohort in 2018. I am developing a model to help optimize NFL lineups on the daily fantasy sports platforms DraftKings and FanDuel.

Problem Statement

Can we predict an NFL player’s fantasy football performance to determine if they are a valuable DFS play?

Executive Summary

The DFS (Daily Fantasy Sports) industry has exploded since its inception in 2011 due to the popularity of fantasy sports and the prospect of a financial windfall for those who enter contests on the various platforms. FanDuel and DraftKings, the two largest companies in the field, are now both valued at over a billion dollars. Despite the vast amounts of money they continue to take in through contest entry fees, the average player is not consistently winning. The top players who actually make a living on long-run returns are typically people who use some form of statistical model to identify fantasy players and lineups that the "average" player might not consider. It is this ability to identify valuable fantasy players that makes them so effective, and you can usually trace their success back to some combination of data science and subject matter expertise. My goal is to build a model that predicts fantasy output more accurately than the websites that average players get their advice from (ESPN, CBS, Yahoo, etc.).

For my analysis I decided to focus strictly on FanDuel for the sake of simplicity, and because the results for DraftKings would likely be very similar. I began by gathering FanDuel player salaries and points history from 2011 through the end of the 2017 season. One big assumption I made in doing so was that player performances are independent from one season to the next, but that performances within the same season are likely indicative of a player's fantasy output that year. Because of this, I did not collect data for the current 2018 season, as I do not yet have a full season's worth of data to work with. I also collected weekly rankings and statistics for team defenses, weather data, and betting lines to merge with the player statistics.

One important thing to note is that the target variable, once merged, is in the same row as the statistics that produce it. If a player scores a touchdown, that will be baked into the number of fantasy points for that same week. Ideally we would like to use this model in the future to predict an unknown target variable given a player's past performance. Therefore, we needed to shift the data so that we were using past information to predict something that happened in the future. Shifting the dataframe unfortunately results in the loss of a fair number of observations, and there is already a severe lack of data for the NFL. This lack of data is inherent in the nature of football, as the average career of a player is 3.3 years and there are only 16 games in a season. Slightly more than 50 observations for an average player does not provide much predictive power, and the real number is probably lower because of things like injuries that prevent players from playing.
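The shift described above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`pass_yards`, `fd_points`, and the `prev_` columns) and the toy values are hypothetical, and a one-week lag within each player-season is an assumption that matches the season-independence assumption stated earlier.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Year": [2017] * 6,
    "Week": [1, 2, 3, 1, 2, 3],
    "pass_yards": [250, 310, 190, 280, 220, 305],
    "fd_points": [18.0, 24.5, 11.2, 20.1, 14.3, 22.8],
})

# Order rows chronologically within each player-season before shifting.
df = df.sort_values(["Name", "Year", "Week"])

# Shift within each player-season so that a row for week N carries
# the statistics from that player's previous game in the same season.
grouped = df.groupby(["Name", "Year"])
df["prev_pass_yards"] = grouped["pass_yards"].shift(1)
df["prev_fd_points"] = grouped["fd_points"].shift(1)

# The first game of each player-season has no prior game, so it is dropped.
# This is where observations are lost.
model_df = df.dropna(subset=["prev_pass_yards", "prev_fd_points"])

print(f"{len(df)} rows before shifting, {len(model_df)} rows usable afterwards")
```

In this sketch the features would be the `prev_` columns and the target would remain `fd_points`, so nothing from the week being predicted leaks into the inputs.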

After successfully merging the data and properly formatting every feature, I grouped the dataset on a six-level multi-index by a player's Name, Year, Week, Month, Team, and Opponent. This ensures that, when the data is fed into a model, the model learns information about each individual player as opposed to trying to generalize information about all quarterbacks. This makes sense, as you would not want to compare how Drew Brees played in a given set of circumstances to how Paxton Lynch played. During the research phase of the project, I discovered a thesis titled "Predicting a Quarterback's Fantasy Football Point Output for Daily Fantasy Sports Using Statistical Models" by Nicholas A. King. This paper heavily influenced my process for modeling and exploring my data.
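One way to build a six-level index like the one described above is pandas' `set_index`. This is only a sketch under assumptions: the player names, team codes, and the `fd_points` column are made up, and the original notebooks may have used `groupby` or a different approach.

```python
import pandas as pd

# Hypothetical rows; names, team codes, and points are placeholders.
rows = [
    {"Name": "Player A", "Year": 2017, "Week": 1, "Month": 9,
     "Team": "AAA", "Opponent": "BBB", "fd_points": 18.0},
    {"Name": "Player A", "Year": 2017, "Week": 2, "Month": 9,
     "Team": "AAA", "Opponent": "CCC", "fd_points": 24.5},
    {"Name": "Player B", "Year": 2017, "Week": 1, "Month": 9,
     "Team": "BBB", "Opponent": "AAA", "fd_points": 9.3},
]
df = pd.DataFrame(rows)

# Move the six identifying columns into a MultiIndex and sort it,
# which keeps each player's games together in chronological order.
index_cols = ["Name", "Year", "Week", "Month", "Team", "Opponent"]
indexed = df.set_index(index_cols).sort_index()

# Select every game for a single player by cross-section on the Name level.
player_a = indexed.xs("Player A", level="Name")
print(player_a)
```

With the identifiers in the index, the remaining columns are purely features and the target, which makes it harder to accidentally feed a name or team label into a model as a numeric input.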

This analysis showed me firsthand how difficult it is to make predictions in an environment so prone to human error. For example, the multicollinearity between features suggested that a linear model would not perform well, and it did not. This makes sense, as you would not expect a change in one variable to produce a one-to-one change in the target variable. In football, and in life, when one thing changes, many others tend to go awry as well. This is also illustrated by the fact that the R2 scores were extremely low for all models. R2 measures the proportion of variance in the target that a model explains, and since there is so much random error in human activity, it is unrealistic to expect a model to explain a large share of the variability in observations.
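To make the R2 point concrete, here is a small sketch that computes it by hand as one minus the ratio of residual to total sum of squares. The numbers are invented for illustration and are not results from this project.

```python
def r2_score(actual, predicted):
    """R2 = 1 - SS_res / SS_tot, the share of target variance explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented fantasy point totals, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]

# Always predicting the mean explains none of the variance.
mean_baseline = [25.0, 25.0, 25.0, 25.0]
r2_baseline = r2_score(actual, mean_baseline)

# Predictions that follow the trend but miss each game by 8 points.
noisy_predictions = [18.0, 12.0, 38.0, 32.0]
r2_noisy = r2_score(actual, noisy_predictions)

print(f"mean baseline R2: {r2_baseline:.3f}")
print(f"noisy model R2:   {r2_noisy:.3f}")
```

A model can land reasonably close on many individual games and still post a modest R2 when the week-to-week noise is large relative to the spread of the target.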

Because of this inherent variability, the models were evaluated on RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error). Instead of measuring explained variability, these metrics show how far off, on average, the predicted fantasy points were from the actual points a player scored. This is a much more intuitive and interpretable measure for this application. I tested a linear regression, random forest regressor, support vector regression, AdaBoost regressor, gradient boosting regressor, principal component regression, and a neural network to make my predictions. Each position had a slightly different level of accuracy, with tight ends being the most accurate and quarterbacks the least. In general, the models were able to get within 4 to 8 points of a player's actual performance, depending on the metric being used. Ultimately, there is still a great deal of room for improvement, which I think starts with finding and/or manufacturing more data. In addition, there should be more features to consider, whether through feature engineering or by seeking out other datasets, especially predictions from "experts". A great deal of time was spent on gathering, cleaning, and understanding the disparate datasets that were combined for this analysis. Moving forward, I intend to spend much more time developing new features and fine-tuning the model's predictive ability.
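The two evaluation metrics can be sketched in a few lines of plain Python. The point totals below are invented for illustration; they are not outputs of the models described here.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: the average size of the miss, in fantasy points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but large misses are weighted more."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Invented actual and predicted fantasy points for four player-weeks.
actual = [12.0, 20.0, 8.0, 15.0]
predicted = [10.0, 14.0, 9.0, 16.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)

print(f"MAE:  {mae_value:.2f} points")
print(f"RMSE: {rmse_value:.2f} points")
```

RMSE is never smaller than MAE on the same predictions, because squaring gives extra weight to the largest misses. That is one reason a reported error range can vary depending on which metric is used.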

charley-dixon/fanduel-predictions

Folders and files

Latest commit

History

Repository files navigation

General Assembly DSI - Denver 2018

Capstone Project - DFS Model

Problem Statement

Executive Summary

About

Resources

Stars

Watchers

Forks

Languages