Added calculate_team_stats() function in R/aggregate_team_stats.R #465

Open
wants to merge 1 commit into
base: master

@mscoop16 mscoop16 commented Apr 24, 2024

Overview

Implemented the calculate_team_stats() function as requested in issue #342. The new function is an adapted version of the existing calculate_player_stats() function in R/aggregate_game_stats.R. Each column is built up from the passed-in pbp dataframe so that the data is consistent with the rest of nflfastR.

Changes Made

  • Created file aggregate_team_stats.R
  • Added function calculate_team_stats() to R/aggregate_team_stats.R
  • Added Roxygen2 comment documentation to aggregate_team_stats.R

Context

This feature addresses the request in issue #342. As requested in that thread, the values are built from the ground up from nflfastR pbp data. One of the main points in the thread is to incorporate drive-specific statistics. This PR implements them, and they are covered in the Function Details section.

Function Details

The new calculate_team_stats() function was built as an adaptation of the existing calculate_player_stats() function, which uses the "dplyr" package to manipulate the passed-in pbp dataframe and aggregate player-specific statistics. In the new function, the aggregation is done at the team level. Most of the columns built by calculate_team_stats() are the same as those built by calculate_player_stats(), with a few additions and exceptions, which are listed below:

Column Changes

  • No columns are built for certain EPA statistics that are more relevant to players than to teams (receiving EPA, dakota, ...)
  • No columns are built for statistics that represent a player's proportion of a team's total (target share, air yards share, WOPR)
  • Columns are built for team drive-specific statistics (including but not limited to plays per drive, yards per drive, red zone percentage, and percentage of drives ending with a score)
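To make the drive-level columns concrete, here is a minimal sketch of one way per-drive statistics could be derived with dplyr: first collapse plays into one row per drive, then collapse drives into one row per team. It is not the code in this PR. The toy dataframe, the team abbreviations "AAA"/"BBB", and the choice of `fixed_drive` and `fixed_drive_result` as the drive identifiers are assumptions made for illustration.

```r
library(dplyr)

# Toy play-by-play rows. Column names mirror nflfastR pbp columns, but the
# values and the team abbreviations are made up for this example.
pbp <- data.frame(
  game_id = rep("2023_01_AAA_BBB", 10),
  posteam = c("AAA", "AAA", "AAA", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB", "BBB"),
  fixed_drive = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  yards_gained = c(5, 12, 40, 3, -2, 8, 0, 10, 15, 25),
  fixed_drive_result = c(
    rep("Touchdown", 3), rep("Punt", 2), rep("Punt", 2), rep("Field goal", 3)
  ),
  stringsAsFactors = FALSE
)

# Step 1: one row per drive (plays, yards, and whether the drive scored).
# Assumption: a "scoring drive" is one ending in a touchdown or field goal.
drives <- pbp %>%
  group_by(game_id, posteam, fixed_drive) %>%
  summarise(
    plays = n(),
    yards = sum(yards_gained),
    scored = first(fixed_drive_result) %in% c("Touchdown", "Field goal"),
    .groups = "drop"
  )

# Step 2: one row per team, averaging over that team's drives.
team_drive_stats <- drives %>%
  group_by(posteam) %>%
  summarise(
    n_drives = n(),
    plays_per_drive = mean(plays),
    yards_per_drive = mean(yards),
    scoring_drive_pct = mean(scored),
    .groups = "drop"
  )

print(team_drive_stats)
```

Aggregating in two steps keeps the per-drive denominators explicit, which matters when comparing against another source that may count drives differently (for example, kneel-downs at the end of a half).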

Testing

So far, I have only done manual testing, using Pro Football Reference (PFR) statistics as ground truth. The only notable discrepancy caused by my implementation details is that PFR reports team passing yards as passing_yards - sack_yards, which I did NOT do. Beyond that, there are only a couple of things to note. For 2023 data, all of the numbers I have checked so far appear to be correct except the per-drive statistics (though this may also be due to how I chose to calculate them versus how PFR does). In older seasons (such as 1999), slight discrepancies have appeared. I have reviewed my own code several times and am continuing to search for possible bugs.
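The passing-yards difference can be illustrated with a small sketch. It computes both gross passing yards and a net figure (passing_yards - sack_yards), then compares each against a reference table. All numbers here are made up, including the reference values, which are NOT real PFR data. The sketch also assumes that sack_yards is stored as a positive number of yards lost.

```r
library(dplyr)

# Toy dropback plays with made-up values and hypothetical team abbreviations.
pbp <- data.frame(
  posteam = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  complete_pass = c(1, 0, 1, 0, 1, 1, 0),
  sack = c(0, 1, 0, 0, 0, 0, 1),
  yards_gained = c(15, -7, 22, 0, 9, 30, -4)
)

passing <- pbp %>%
  group_by(posteam) %>%
  summarise(
    passing_yards = sum(yards_gained[complete_pass == 1]),
    # Assumption: sack yards are reported as positive yards lost.
    sack_yards = sum(-yards_gained[sack == 1]),
    .groups = "drop"
  ) %>%
  mutate(net_passing_yards = passing_yards - sack_yards)

# Hypothetical reference numbers that follow the net convention.
reference <- data.frame(
  posteam = c("AAA", "BBB"),
  ref_pass_yds = c(30, 35)
)

check <- passing %>%
  left_join(reference, by = "posteam") %>%
  mutate(
    diff_gross = passing_yards - ref_pass_yds,
    diff_net = net_passing_yards - ref_pass_yds
  )

print(check)
```

In this toy data the gross figure differs from the reference by exactly the sack yards, while the net figure matches. A check like this can help separate definitional differences from actual bugs during manual comparison.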

Notes for Reviewers

  • Any and all feedback is welcome. I am available for discussion in the nflverse Discord, X, or any other preferred communication method.
  • My implementation assumes that each team has the same value for 'posteam' throughout the pbp dataframe (i.e. the team abbreviation is consistent). If inconsistent abbreviations should be supported, code would need to be added to handle that case.
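If handling inconsistent abbreviations turns out to be desired, one possible approach is to normalize 'posteam' before aggregating. The sketch below is only an illustration of that idea and is not part of this PR. The lookup vector `abbr_map` and the abbreviations "OLD", "NEW", and "ZZZ" are hypothetical placeholders, not real team codes.

```r
library(dplyr)

# Hypothetical mapping from an outdated abbreviation to the current one.
abbr_map <- c(OLD = "NEW")

# Toy rows with made-up values. The NA row stands in for plays that have
# no possession team.
pbp <- data.frame(
  posteam = c("OLD", "NEW", "NEW", "ZZZ", NA),
  yards_gained = c(4, 6, 10, 3, 0),
  stringsAsFactors = FALSE
)

# Look each abbreviation up in the map. Values without a mapping come back
# as NA from the lookup, so coalesce() falls back to the original value.
normalized <- pbp %>%
  mutate(posteam = coalesce(unname(abbr_map[posteam]), posteam))

# Aggregate on the normalized abbreviation, skipping rows with no team.
totals <- normalized %>%
  filter(!is.na(posteam)) %>%
  group_by(posteam) %>%
  summarise(yards = sum(yards_gained), .groups = "drop")

print(totals)
```

Here the "OLD" row is folded into "NEW" before summing, so the team ends up with a single row instead of two.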
