Inconsistencies with POSTEAM #33

Open
awgymer opened this issue Jul 7, 2020 · 2 comments
Labels
NFL data issue (Error in the underlying data, not a bug); wontfix (This will not be worked on)

Comments

awgymer commented Jul 7, 2020

While inspecting the whole dataset after running update_db, I found three plays with play_id = 2323 and game_id = 2000_11_OAK_DEN where the name is listed as T.Davis (Terrell Davis) but the posteam is LV. Davis played for Denver, so the posteam should not be the Raiders.

I'm not sure what the cause of this is, or if it affects other plays/games.

guga31bb added the "NFL data issue" label (Error in the underlying data, not a bug) on Jul 7, 2020
Member

guga31bb commented Jul 7, 2020

This is an example of a duplicated play_id in older seasons, which causes errors because we don't know which stat IDs are supposed to correspond to which play. I wonder if we should just remove this game from nflfastR, like we had to with a couple of others in this season:

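library(nflfastR)
library(dplyr)

# Scrape and clean the affected game
g <- fast_scraper('2000_11_OAK_DEN') %>%
  clean_pbp()

# Show every row that shares the duplicated play_id
g %>%
  filter(play_id == 2323) %>%
  select(posteam, desc, name)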

# A tibble: 4 x 3
  posteam desc                                                                                    name    
  <chr>   <chr>                                                                                   <chr>   
1 LV      (8:49) B.Griese pass to E.McCaffrey to OAK 4 for 2 yards (G.Biekert).                   B.Griese
2 LV      (8:06) T.Davis right guard to OAK 5 for -1 yards (R.Coleman, T.Bryant).                 T.Davis 
3 LV      (7:29) J.Elam 23 yard field goal is GOOD, Center-M.Lepsis, Holder-T.Rouen.              T.Davis 
4 LV      J.Elam kicks 53 yards from DEN 30 to OAK 17. D.Dunn to OAK 29 for 12 yards (G.Coghill). T.Davis 

Author

awgymer commented Jul 8, 2020

I see. Just ran a quick check and it looks like this issue affects 9 plays across 8 games:

> pbp[,.N, by="game_id,play_id"][N>1]
           game_id play_id N
1: 2000_03_PIT_CLE    2768 2
2: 2000_03_PIT_CLE    2767 2
3: 2000_06_WAS_PHI    1825 4
4: 2000_11_OAK_DEN    2323 4
5: 2002_03_DAL_PHI    3635 2
6:  2002_05_KC_NYJ    1020 2
7: 2005_04_SEA_WAS    2861 3
8:  2006_10_TB_CAR     103 3
9: 2007_08_IND_CAR    4382 9
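
For anyone following along in dplyr rather than data.table, the same duplicate check can be sketched roughly as below. The `pbp` tibble here is a tiny illustrative stand-in, not real nflfastR output; only the four rows for play_id 2323 mirror what was reported above, and play_id 2301 is a made-up placeholder.

```r
library(dplyr)

# Toy stand-in for the play-by-play table (illustrative values only).
# play_id 2301 is a hypothetical unique play; 2323 appears four times,
# matching the duplicate count reported for 2000_11_OAK_DEN.
pbp <- tibble(
  game_id = rep("2000_11_OAK_DEN", 5),
  play_id = c(2301, 2323, 2323, 2323, 2323)
)

# Count rows per (game_id, play_id) key and keep keys seen more than once.
dupes <- pbp %>%
  count(game_id, play_id) %>%
  filter(n > 1)

dupes
```

On the real dataset, `pbp` would be whatever table you loaded after update_db.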

Personally, I'd be inclined to keep as much data as possible, using exceptions/workarounds, but I don't know how convoluted that becomes in this case.
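
One possible workaround, sketched on toy data, would be to drop only the rows whose (game_id, play_id) key is duplicated instead of dropping the whole game. This is an assumption about what might be acceptable, not how nflfastR handles it; the game IDs and descriptions below are hypothetical, and removing plays could still affect any columns that depend on neighbouring plays.

```r
library(dplyr)

# Hypothetical miniature play-by-play table: game "g1" has a duplicated play_id.
pbp <- tibble(
  game_id = c("g1", "g1", "g1", "g2"),
  play_id = c(1, 2, 2, 1),
  desc    = c("unique play", "duplicate a", "duplicate b", "unique play")
)

# Keep only rows whose (game_id, play_id) key occurs exactly once,
# so the rest of game "g1" survives.
pbp_unique <- pbp %>%
  group_by(game_id, play_id) %>%
  filter(n() == 1) %>%
  ungroup()

pbp_unique
```

This keeps the unaffected plays from each game, at the cost of losing every row that shares an ambiguous play_id.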
