Approaches for games with missing box scores/pbp #46

Open
john-b-edwards opened this issue Nov 12, 2021 · 0 comments
So something I noticed: several games that are missing team box scores and player box scores do have PBP data available. For example:

# Check which data products the schedule reports for a single game
hoopR::load_mbb_schedule(2003:2021) |>
    dplyr::filter(game_id == '290722440') |>
    dplyr::select(PBP, team_box, player_box)
#> # A tibble: 1 x 3
#>   PBP   team_box player_box
#>   <lgl> <lgl>    <lgl>     
#> 1 TRUE  FALSE    FALSE
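
To get a sense of how many games fall into this bucket, the same three flags can be filtered across the whole schedule. A minimal sketch, assuming the schedule keeps the logical columns `PBP`, `team_box`, and `player_box` shown above; `find_pbp_only_games()` is a hypothetical helper, not part of hoopR, and the toy schedule is made up so the example runs without downloading anything:

```r
library(dplyr)

# Hypothetical helper: keep games that have PBP but lack both box scores.
# Assumes `schedule` has logical columns PBP, team_box, and player_box.
find_pbp_only_games <- function(schedule) {
  schedule |>
    dplyr::filter(PBP, !team_box, !player_box)
}

# Toy schedule for illustration. Only the first game_id comes from the
# example above; the other two are placeholders.
toy_schedule <- dplyr::tibble(
  game_id    = c("290722440", "000000001", "000000002"),
  PBP        = c(TRUE, TRUE, FALSE),
  team_box   = c(FALSE, TRUE, FALSE),
  player_box = c(FALSE, TRUE, FALSE)
)

pbp_only <- find_pbp_only_games(toy_schedule)
```

On real data the call would presumably be `find_pbp_only_games(hoopR::load_mbb_schedule(2003:2021))`.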

Additionally, these games do have box scores available, just not on ESPN. For the above game, for example, one can find the box score on Louisiana Tech's website here.

Some games are altogether missing PBP data as well, which makes incorporating them into hoopR difficult.

I am curious if there would be a good way to either parse the existing data or hunt down missing play by play data and incorporate it into hoopR. This is obviously both a long-term and ongoing endeavor, so I am raising it as a suggestion for additional work to improve the scope of hoopR.

Three approaches seem wise:

  1. Write a function to parse existing hoopR PBP data into box-score format. Either do this on an ad-hoc basis or incorporate it into the existing data repositories.
  2. Come up with a guide for contributors to add PBP/box score data, and write tests that submissions must pass before they are merged into the hoopR data repository (e.g., must be in .json format, must contain these values, and so on). Contributors can add games to the hoopR repository as they come across them.
  3. Explore alternate routes for acquiring PBP/box score data. StatBroadcast may be a good place to start, as well as team websites, though scraping individual team websites is a fairly daunting task beyond the scope of the sportsdataverse.
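
As a rough illustration of the first approach, here is a sketch that rolls scoring plays up into a team-level box score. The column names `team_id`, `scoring_play`, and `score_value` are assumptions about the PBP layout and should be checked against the actual hoopR output; `pbp_to_team_box()` is a hypothetical helper, and the toy plays and team ids are made up. It only covers made shots, so it is nowhere near a full box score (no attempts, rebounds, assists, and so on):

```r
library(dplyr)

# Toy play-by-play. Column names are ASSUMED, not confirmed against hoopR;
# team ids "A" and "B" and all plays are placeholders.
toy_pbp <- dplyr::tibble(
  game_id      = "290722440",
  team_id      = c("A", "B", "A", "A", "B"),
  scoring_play = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  score_value  = c(2L, 3L, 0L, 1L, 2L)
)

# Hypothetical helper: aggregate scoring plays into one row per game and team.
pbp_to_team_box <- function(pbp) {
  pbp |>
    dplyr::filter(scoring_play) |>
    dplyr::group_by(game_id, team_id) |>
    dplyr::summarise(
      points              = sum(score_value),
      field_goals_made    = sum(score_value >= 2),  # 2s and 3s
      three_pointers_made = sum(score_value == 3),
      free_throws_made    = sum(score_value == 1),
      .groups = "drop"
    )
}

team_box <- pbp_to_team_box(toy_pbp)
```

A player-level version would group by an athlete id column as well, assuming the PBP identifies the shooter on each play.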

I have no strong thoughts on this, simply vomiting thoughts on hoopR out into the void. Additionally, this is by no means a pressing issue or anything that takes a quick fix. Would simply be curious as to how valuable acquiring even more data would be for hoopR, or how much we can squeeze out of leveraging the data we already have access to -- just hoping to spark some discussion on this topic!

@armstjc armstjc added the enhancement New feature or request label Jun 30, 2022