Missing Plays in NFLFastR #35

Open
jacole3 opened this issue Jul 10, 2020 · 6 comments
Labels
NFL data issue (Error in the underlying data, not a bug); wontfix (This will not be worked on)

Comments

@jacole3

jacole3 commented Jul 10, 2020

Via email with Ben Baldwin, I've learned that one of the plans for the next nflfastR update is to locate and fix as many of the missing plays as possible. Most, but not all, of the ones I've found are from the 1999 season. Here's a non-comprehensive list of what I've found so far; anyone who finds more is welcome to pitch in and help Ben and his colleagues out.

# Two plays are missing here shortly after onside kick recovery
View(pbp %>% filter(game_id == "1999_01_DAL_WAS", qtr == 4))

# One play missing here, after Batch is sacked
View(pbp %>% filter(game_id == "1999_16_DEN_DET", qtr == 4))

# Missing play here, after a McNown completion
View(pbp %>% filter(game_id == "1999_08_CHI_WAS", qtr == 4))

# Missing play here, after completion to Metcalf late in qtr
View(pbp %>% filter(game_id == "1999_13_STL_CAR", qtr == 4))

# Missing play here, after incompletion from Plummer to Moore
View(pbp %>% filter(game_id == "1999_16_ARI_ATL", qtr == 4))

# Missing play here, after incompletion to Bates
View(pbp %>% filter(game_id == "1999_16_CHI_STL", qtr == 4))

# Missing play here, after incompletion to Batten
View(pbp %>% filter(game_id == "1999_16_MIN_NYG", qtr == 4))

# Missing play here, after Walsh completion on 3rd down
View(pbp %>% filter(game_id == "1999_17_IND_BUF", qtr == 4))

# Two missing plays here, after late McNown incompletion
View(pbp %>% filter(game_id == "1999_17_TB_CHI", qtr == 4))

# Missing play after a Frerotte incompletion
View(pbp %>% filter(game_id == "1999_18_DET_WAS", qtr == 4))

# Missing play after a late McNown completion
View(pbp %>% filter(game_id == "2000_06_NO_CHI", qtr == 4))

# Missing play here, after a Kordell Stewart completion
View(pbp %>% filter(game_id == "1999_01_PIT_CLE", qtr == 1))

# Missing play here, after 19-yard Smith completion (8:30)
View(pbp %>% filter(game_id == "1999_04_STL_CIN", qtr == 3))

# Missing play here, shortly before 8:22 FG (right before Dillon run)
View(pbp %>% filter(game_id == "1999_05_CIN_CLE", qtr == 1))

# Missing play here, right after 11:25 DPI
View(pbp %>% filter(game_id == "1999_08_TB_DET", qtr == 4))

# Missing play here, right before Garrett delay of game
View(pbp %>% filter(game_id == "1999_11_DAL_ARI", qtr == 2))

# Missing play here, before the 4:27 DPI on Favre pass
View(pbp %>% filter(game_id == "1999_13_GB_CHI", qtr == 2))

# Missing play here after 23-yard completion to Rice (8:07)
View(pbp %>% filter(game_id == "1999_13_SF_CIN", qtr == 4))

# Missing play here right before final play of 1st qtr
View(pbp %>% filter(game_id == "1999_14_ARI_WAS", qtr == 1))

# Missing play here, right before 2:00 Makovicka completion
View(pbp %>% filter(game_id == "1999_14_ARI_WAS", qtr == 4))

# Missing play after Flutie 4:17 pass to Thomas
View(pbp %>% filter(game_id == "1999_14_NYG_BUF", qtr == 1))

# Missing play right before 11:02 encroachment on Chargers
View(pbp %>% filter(game_id == "1999_14_SD_SEA", qtr == 1))

# Missing play right before 3:02 NE ineligible man downfield
View(pbp %>% filter(game_id == "1999_15_NE_PHI", qtr == 2))

# Missing play right after 11:54 Peyton to Edgerrin completion
View(pbp %>% filter(game_id == "1999_15_WAS_IND", qtr == 3))

# Missing play right after 13:40 Chandler-to-Mathis completion
View(pbp %>% filter(game_id == "1999_17_SF_ATL", qtr == 2))

# Missing play right before 6:22 Warren run
View(pbp %>% filter(game_id == "1999_18_DAL_MIN", qtr == 4))

# Missing play right after 6:20 Chandler-to-Mathis completion
View(pbp %>% filter(game_id == "2000_01_SF_ATL", qtr == 1))

# Missing play right after 9:19 Davis run
View(pbp %>% filter(game_id == "2000_03_DAL_WAS", qtr == 4))

# Missing play (a spike) right before final offensive play of half
View(pbp %>% filter(game_id == "2003_02_PIT_KC", qtr == 2))

# Missing play (a spike) right before Kaeding's miss
View(pbp %>% filter(game_id == "2006_19_NE_SD", qtr == 4))

@guga31bb
Member

Thanks!

I'm wondering how many of these are still present in the new version of nflfastR. I know the first two should be fixed but haven't taken a look at the others.

@jacole3
Author

jacole3 commented Jul 10, 2020

@guga31bb I've gone through my list, and I still see the following seven errors. Keep in mind that my original list of 30 wasn't comprehensive, just the ones that I had come across over the past few days.

# One play missing here, now the 2nd down right BEFORE Batch is sacked, 13:27
View(pbp %>% filter(game_id == "1999_16_DEN_DET", qtr == 4))
# This one is really ironic, because in version 2.1.0, the 4th down right after the sack was missing.

# Missing play here, shortly before 8:22 FG (right before Dillon run)
View(pbp %>% filter(game_id == "1999_05_CIN_CLE", qtr == 1))

# Missing play here, right after 11:25 DPI
View(pbp %>% filter(game_id == "1999_08_TB_DET", qtr == 4))

# Missing play right before 3:02 NE ineligible man downfield
View(pbp %>% filter(game_id == "1999_15_NE_PHI", qtr == 2))

# Missing play right after 13:40 Chandler-to-Mathis completion
View(pbp %>% filter(game_id == "1999_17_SF_ATL", qtr == 2))

# Missing play right before 6:22 Warren run
View(pbp %>% filter(game_id == "1999_18_DAL_MIN", qtr == 4))

# Missing play right after 9:19 Davis run
View(pbp %>% filter(game_id == "2000_03_DAL_WAS", qtr == 4))

@guga31bb
Member

Illustration of problem:

# Scrape one game from the raw feed and view the fields that identify each play
id <- "1999_18_DAL_MIN"
g <- get_pbp_gc(id) %>%
  add_game_data()
g %>%
  select(play_description, game_id, play_id, down, yards_to_go, quarter, time, drive) %>%
  View()

This is really hard to fix: the missing play has the same description, ID, and time as another play in the feed, so it's impossible to distinguish from a duplicate record (and we need to drop dups to fix other games). The good news is that this is only a problem in older seasons, but it is annoying.
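A minimal sketch of why de-duplication swallows these plays. The rows below are invented toy data, not the actual feed for this game, and the `true_down` column is hypothetical ground truth that the feed does not provide. It is written in base R so it runs without extra packages.

```r
# Toy rows (hypothetical values): two separate incompletions that the feed
# reports with the same play ID, clock time, and description.
plays <- data.frame(
  play_id = c(100, 120, 120, 140),
  time = c("6:30", "6:22", "6:22", "6:15"),
  play_description = c(
    "Run up the middle for 3 yards",
    "Pass incomplete to receiver A",
    "Pass incomplete to receiver A",
    "Punt"
  ),
  true_down = c(1, 2, 3, 4),  # ground truth, NOT available in the feed
  stringsAsFactors = FALSE
)

# De-duplicate on the identifying columns, as one would to clean up
# games where the feed repeats the same record.
key <- plays[, c("play_id", "time", "play_description")]
deduped <- plays[!duplicated(key), ]

# The real 3rd-down incompletion is gone along with any true duplicates.
deduped$true_down
```

Nothing in the key columns separates a repeated record from a repeated play, which is why the same cleanup that fixes some games removes real plays from others.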

@jacole3
Author

jacole3 commented Jul 11, 2020

@guga31bb Interesting, that makes sense. I definitely noticed that the majority of omitted plays were incompletions to a receiver who already had an incompletion thrown to him at some other point in the same drive. So I guess that means there's no way to fix it besides going through and adding all the missing ones by hand.
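One way to hunt for candidates before checking them by hand is to look for drives where the down skips ahead. This is only a rough sketch: `flag_down_gaps` is a hypothetical helper, the data is a toy example, and it assumes rows are already in game order with `game_id`, `drive`, and `down` columns. It would not catch every case, for example a missing play that produced a first down.

```r
# Flag plays whose down is more than one higher than the previous play
# in the same drive (e.g. 1st down followed directly by 3rd down).
flag_down_gaps <- function(pbp) {
  # Ignore rows with no down (kickoffs, timeouts, and so on).
  scrimmage <- pbp[!is.na(pbp$down), ]
  n <- nrow(scrimmage)
  if (n < 2) return(scrimmage[0, ])

  prev <- scrimmage[-n, ]  # every play except the last
  curr <- scrimmage[-1, ]  # every play except the first

  same_drive <- prev$game_id == curr$game_id & prev$drive == curr$drive
  skipped <- (curr$down - prev$down) > 1

  curr[which(same_drive & skipped), ]
}

# Toy data: drive 1 jumps from 1st down to 3rd down.
toy <- data.frame(
  game_id = "toy_game",
  drive = c(1, 1, 1, 2, 2),
  down = c(1, 3, 4, 1, 2),
  stringsAsFactors = FALSE
)

flagged <- flag_down_gaps(toy)
flagged
```

Each flagged row marks the play right after a suspected gap, so the missing play would sit just before it.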

If I find any more besides the seven mentioned in my last comment, I'll share them here. Hopefully we can detect most, or all, of them by the time Version 2.1.2 drops.

@CroppedClamp

I'm wondering how these match up with the errata linked here: https://github.com/CroppedClamp/nflscrapR-data/tree/master/errata. There are some cases where plays are listed out of order, and some where the stats are wrong. I think there are few enough that they could be corrected by hand, though.

One way to further verify these is to run the stats that are aggregated from the PBPs here against the official NFL stats, which I have done a couple times. I have noticed similar small issues, maybe 3 or 4 plays off for the whole season, and definitely some that are out of order. In some cases, these are even incorrect in the official NFL GSIS feed, the XML feed, and also the gamebooks (this seems to happen a lot on kickoff return fumbles). I don't see a great way of computationally coalescing these other than to find the error programmatically and then fix it by hand, since it is variable in a number of different sources.
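The cross-check described above could look roughly like this. It is a sketch on made-up numbers: the `posteam` and `pass_attempt` column names are assumptions about the play-by-play layout, and the `official` table stands in for whatever official totals you have on hand.

```r
# Toy play-by-play: one row per play, with a 0/1 pass attempt flag.
pbp <- data.frame(
  posteam = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  pass_attempt = c(1, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical official totals for the same games.
official <- data.frame(
  team = c("AAA", "BBB"),
  official_att = c(3, 2),
  stringsAsFactors = FALSE
)

# Aggregate the play-by-play to team totals.
from_pbp <- aggregate(pass_attempt ~ posteam, data = pbp, FUN = sum)
names(from_pbp) <- c("team", "pbp_att")

# Join against the official numbers and keep only the disagreements.
cmp <- merge(from_pbp, official, by = "team")
cmp$diff <- cmp$official_att - cmp$pbp_att
mismatches <- cmp[cmp$diff != 0, ]
mismatches
```

A positive `diff` means the play-by-play is short of the official total, which narrows down where to look for a missing play. As noted, the official sources can themselves be wrong, so a mismatch is a prompt for a manual check rather than proof of an error.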

Let me know how I can help in this area as well, if I can.

@jacole3
Author

jacole3 commented Jul 11, 2020

@CroppedClamp Good thought to look back at the nflscrapR errors, there's gotta be some overlap with the 2009-and-later seasons in terms of these types of mistakes. At first glance, I found some shared mistakes in the 2011 Lions-Saints game, where some plays are out of order in the late 2nd and early 3rd quarters. So it looks like that link could help us find further nflfastR errors.

Unfortunately, even with that knowledge, it seems like fixing whatever errors we find by hand is the only way to go. Though I'm sure Ben and Sebastian have better insight on that. In any case, we can help them out by posting whatever missing plays we do find in this thread.
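If the fixes do end up being made by hand, one possible shape for them is a small corrections table that gets appended and re-sorted. This is a sketch only, not how nflfastR handles it: the column names, the toy rows, and the made-up `play_id` of 140 are all assumptions, and it presumes each hand-entered play can be given an ID that sorts it into the right spot.

```r
# Toy play-by-play with the 3rd-down play missing.
pbp <- data.frame(
  game_id = "toy_game",
  play_id = c(100, 120, 160),
  down = c(1, 2, 4),
  desc = c("Run for 3 yards", "Pass incomplete", "Punt"),
  stringsAsFactors = FALSE
)

# Hand-maintained table of plays found to be missing (hypothetical entry).
manual_fixes <- data.frame(
  game_id = "toy_game",
  play_id = 140,
  down = 3,
  desc = "Pass incomplete (hand-entered)",
  stringsAsFactors = FALSE
)

# Append the corrections, then restore game order.
patched <- rbind(pbp, manual_fixes)
patched <- patched[order(patched$game_id, patched$play_id), ]
rownames(patched) <- NULL
patched
```

Keeping the corrections in their own table means they can be reapplied after every fresh scrape, and plays reported in this thread can be added as rows.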

3 participants