Multiple Penalty Infractions on the Same Play Obscure the True Penalty Count and True Penalty Yardage #220

Open
JoeMarino2021 opened this issue Mar 12, 2021 · 8 comments


@JoeMarino2021

Consider the following three plays from 2020. They all involve multiple penalty infractions on the same play.

play_id | game_id | desc | penalty_team | penalty_yards | penalty_player_name
--- | --- | --- | --- | --- | ---
2800 | 2020_15_TB_ATL | (1:45) 12-T.Brady pass incomplete short right to 13-M.Evans. PENALTY on ATL-24-A.Terrell, Defensive Pass Interference, 13 yards, enforced at ATL 45 - No Play. PENALTY on ATL-24-A.Terrell, Face Mask (15 Yards), 15 yards, enforced between downs. | ATL | 15 | A.Terrell
1686 | 2020_15_LAC_LV | (14:55) 64-C.Toner reported in as eligible. 10-J.Herbert pass incomplete short right to 15-J.Guyton. PENALTY on LV-27-T.Mullen, Defensive Pass Interference, 13 yards, enforced at LAC 25 - No Play. PENALTY on LV-27-T.Mullen, Face Mask (15 Yards), 15 yards, enforced between downs. | LV | 15 | T.Mullen
2559 | 2020_17_ARI_LA | (5:21) (Shotgun) 15-C.Streveler pass deep middle to 10-D.Hopkins to LA 10 for 40 yards (43-J.Johnson III). PENALTY on ARI-10-D.Hopkins, Offensive Pass Interference, 10 yards, enforced at 50 - No Play. PENALTY on ARI-10-D.Hopkins, Unsportsmanlike Conduct, 15 yards, enforced between downs. | ARI | 15 | D.Hopkins

The first case shows 28 yards of penalties on ATL, but only 15 yards are shown in the "penalty_yards" column. The second shows Mr. Mullen committed 28 yards of penalties on behalf of LV, but only 15 yards are shown in "penalty_yards". The third shows Nuke Hopkins with 25 yards of penalties, but "penalty_yards" only shows 15 yards.

These occurrences are rare, but when they happen, they make counting penalties and penalty yards in a game difficult. I am using the "penalty_yards" field and a filter on "penalty_team" to sum up the penalty yardage assessed to a team. This means my penalty yardage totals for these three games come up short. I use the nrow() function and a filter on "penalty_team" to count the number of penalties on a team. This means my penalty counts come up short as well, since in these cases a single row represents more than one penalty.
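To make the shortfall concrete, here is a minimal sketch in base R that uses only the three rows above. The `described_yards` list is not an nflfastR field; it is typed in by hand from the play descriptions, purely for illustration.

```r
# Three plays from the table above; one row per play, as in the pbp data.
plays <- data.frame(
  play_id       = c(2800, 1686, 2559),
  penalty_team  = c("ATL", "LV", "ARI"),
  penalty_yards = c(15, 15, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper data: yards per infraction, read off each description.
described_yards <- list(c(13, 15), c(13, 15), c(10, 15))

# What the current columns give: one penalty and 15 yards per row.
counted_penalties <- nrow(plays)
counted_yards     <- sum(plays$penalty_yards)

# What the descriptions actually say.
actual_penalties <- sum(lengths(described_yards))
actual_yards     <- sum(unlist(described_yards))

cat("counted:", counted_penalties, "penalties,", counted_yards, "yards\n")
cat("actual: ", actual_penalties, "penalties,", actual_yards, "yards\n")
```

On these three rows alone, the row-based count sees 3 penalties and 45 yards where the descriptions list 6 penalties and 81 yards.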

Might I humbly suggest a set of fields called home_num_penalties, home_penalty_yardage, away_num_penalties, and away_penalty_yardage (or something similar) to make obtaining penalty counts and penalty yardage per team easier? Or if a way to accomplish this task with the current data set is possible, I would love to hear about it.

Thanks so much for your consideration, and for providing the best NFL data collection tool. Have a wonderful day!

@JoeMarino2021
Author

When do you guys usually release a new version to CRAN? I am eager to try out the fixes that have just been rolled out.

Is it also true that some of the same guys have created a thing called nflseedR? When will that make it to CRAN?

Thanks for all of the awesome!

@mrcaseb
Member

mrcaseb commented Mar 15, 2021

> When do you guys usually release a new version to CRAN? I am eager to try out the fixes that have just been rolled out.

There is no typical timeline for CRAN releases, but we plan to bring an update in the next few days.
However, it's always possible to get the latest update by loading it from GitHub with:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("mrcaseb/nflfastR")

> Is it also true that some of the same guys have created a thing called nflseedR? When will that make it to CRAN?

That's true. Lee Sharpe and I developed nflseedR. We wanted to give users the chance to play with it and find bugs prior to a CRAN release. Installation is analogous to the above:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("leesharpe/nflseedR")

@JoeMarino2021
Author

I am going to leave myself a note here and hope that maybe this helps others out while we wait for the nflfastR creators to work their magic. Heck, maybe even the creators will benefit from this a little bit.

So, first things first. Let's run a line of code that will locate all of the plays with double penalties. The second boolean test is to eliminate all of the plays with successful replay reviews that cause the NFL pbp to repeat the play text, which will duplicate the PENALTY notification.
pbp2020$desc[grepl("PENALTY.+enforced.+PENALTY.+enforced", pbp2020$desc) & !grepl("REVERSED", pbp2020$desc)]

[1] "(14:55) 64-C.Toner reported in as eligible.  10-J.Herbert pass incomplete short right to 15-J.Guyton. PENALTY on LV-27-T.Mullen, Defensive Pass Interference, 13 yards, enforced at LAC 25 - No Play. PENALTY on LV-27-T.Mullen, Face Mask (15 Yards), 15 yards, enforced between downs."        
[2] "(1:45) 12-T.Brady pass incomplete short right to 13-M.Evans. PENALTY on ATL-24-A.Terrell, Defensive Pass Interference, 13 yards, enforced at ATL 45 - No Play. PENALTY on ATL-24-A.Terrell, Face Mask (15 Yards), 15 yards, enforced between downs."                                             
[3] "(5:21) (Shotgun) 15-C.Streveler pass deep middle to 10-D.Hopkins to LA 10 for 40 yards (43-J.Johnson III). PENALTY on ARI-10-D.Hopkins, Offensive Pass Interference, 10 yards, enforced at 50 - No Play. PENALTY on ARI-10-D.Hopkins, Unsportsmanlike Conduct, 15 yards, enforced between downs."

So, now let's just call the third one testtext.
testtext = pbp2020$desc[grepl("PENALTY.+enforced.+PENALTY.+enforced", pbp2020$desc) & !grepl("REVERSED", pbp2020$desc)][3]

Then let's extract all of the penalties from this play!
regmatches(testtext, gregexpr("PENALTY.+?enforced", testtext))

[[1]]
[1] "PENALTY on ARI-10-D.Hopkins, Offensive Pass Interference, 10 yards, enforced"
[2] "PENALTY on ARI-10-D.Hopkins, Unsportsmanlike Conduct, 15 yards, enforced"

Now we can see that Hopkins committed two penalties on this play. If he (or someone else) committed more, they would also be extracted. Now we can handle each one like we handle all of the single penalty cases.

As long as I am right that the NFL pbp will always say "PENALTY...enforced" for each accepted/enforced penalty, this should work. I, of course, will have to hope that a successful replay review never coincides with a play with multiple penalties.
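To show what handling each extracted penalty like a single-penalty case could look like, here is a minimal base R sketch that splits every match into team, player, infraction, and yards. The layout "PENALTY on TEAM-player, Infraction, N yards, enforced" is an assumption read off the examples in this thread, and the names `pat` and `parsed` are made up for illustration.

```r
# Play description copied from the third example above.
testtext <- "(5:21) (Shotgun) 15-C.Streveler pass deep middle to 10-D.Hopkins to LA 10 for 40 yards (43-J.Johnson III). PENALTY on ARI-10-D.Hopkins, Offensive Pass Interference, 10 yards, enforced at 50 - No Play. PENALTY on ARI-10-D.Hopkins, Unsportsmanlike Conduct, 15 yards, enforced between downs."

# Same lazy match as above: one string per enforced penalty.
pens <- regmatches(testtext, gregexpr("PENALTY.+?enforced", testtext))[[1]]

# Capture team, the remaining "who" text, infraction, and yards.
# Assumed layout: "PENALTY on TEAM[-who], Infraction, N yards, enforced".
pat <- "^PENALTY on ([A-Z]+)-?([^,]*), (.+), (\\d+) yards, enforced$"
parts <- regmatches(pens, regexec(pat, pens))

# Element 1 of each match is the full string; 2 to 5 are the capture groups.
parsed <- data.frame(
  team   = vapply(parts, function(p) p[2], character(1)),
  player = vapply(parts, function(p) p[3], character(1)),
  type   = vapply(parts, function(p) p[4], character(1)),
  yards  = as.numeric(vapply(parts, function(p) p[5], character(1))),
  stringsAsFactors = FALSE
)
print(parsed)
```

A penalty that does not follow the assumed layout comes back as a row of NA values, which makes unexpected formats easy to spot.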

@JoeMarino2021
Author

> loading it from GitHub

Does RStudio know how to update GitHub-loaded packages when I hit the "Update" button? If you already have the CRAN version of a package installed, will the installation go wrong? Would I have to (or want to) remove the nflfastR that is already installed?

I was waiting for the official release since I was thinking that RStudio would do a better job of updating packages if they are CRAN releases.
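One way to at least see where an installed copy came from: packages installed through remotes normally record `Remote*` fields in their DESCRIPTION. The sketch below reads those fields with `utils::packageDescription()`. `install_source` is a hypothetical helper name, and whether the "Update" button takes this metadata into account is not something the sketch answers.

```r
# Hypothetical helper: report the remote metadata stored with an installed package.
install_source <- function(pkg) {
  fields <- c("Version", "RemoteType", "RemoteUsername", "RemoteRepo")
  desc <- utils::packageDescription(pkg, fields = fields)
  # unclass() leaves a plain named list; fields that are absent come back as NA.
  vapply(unclass(desc)[fields], function(x) as.character(x)[1], character(1))
}

# A base package carries no Remote* fields, so those entries are NA.
src <- install_source("stats")
print(src)

# For a copy installed with remotes::install_github(), RemoteType would be
# expected to read "github" (not run here, since the package may not be installed):
# install_source("nflfastR")
```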

I can't wait to try out nflseedR! It was always something I wanted to write for myself, but it seemed too ambitious. I am glad some professionals got around to writing something. 👍

@JoeMarino2021
Author

So, if we want to patch the pbp database for now while the cure is on the way, we can run this little bit of code. If the database is called dfYearpbp, this code will modify the flag variable penalty to be the penalty count and penalty_yards to accurately reflect the penalty yards assessed. DO NOT run this code on the permanent repository of data you have on your PC. If you bring up a chunk into memory to work with, patch that.

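library(stringr)  # str_which(), str_extract()
library(dplyr)    # %>%

# Rows whose description contains at least two enforced penalties
idxpossdblpen = str_which(dfYearpbp$desc, "PENALTY.+enforced.+PENALTY.+enforced")
# Rows with a reversed call, where the repeated play text duplicates the PENALTY notice
idxrevcalls = str_which(dfYearpbp$desc, "REVERSED")
idxdblpen = idxpossdblpen[!(idxpossdblpen %in% idxrevcalls)]
PenYds = rep(0, length(idxdblpen))
PenNum = rep(0, length(idxdblpen))
ctr = 0
for (idx in idxdblpen){
  ctr = ctr + 1
  # One string per enforced penalty on this play
  rmatches = regmatches(dfYearpbp$desc[idx], gregexpr("PENALTY.+?enforced", dfYearpbp$desc[idx]))[[1]]
  for (rmatch in rmatches){
    # Yardage of this penalty, taken from ", N yards"
    penyards = str_extract(rmatch, ", \\d+ yards") %>% str_extract("\\d+") %>% as.numeric()
    PenYds[ctr] = PenYds[ctr] + penyards
    PenNum[ctr] = PenNum[ctr] + 1
  }
}
# Overwrite the flag with the penalty count and the yards with the summed yardage
dfYearpbp$penalty[idxdblpen] = PenNum
dfYearpbp$penalty_yards[idxdblpen] = PenYds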

This is a patch and not a fix, since it only works when a single team committed the accepted penalties on a play. That will do for now, because in most cases where both teams commit penalties, they offset and no penalty yards are assessed. I have only checked 2020 so far to validate this code.

If you run the patch and then you check for penalty totals in 2020:

dfYearpbp %>% filter(week < 18, !is.na(penalty)) %>% pull(penalty) %>% sum()
2876

We can count the yards, too.

dfYearpbp %>% filter(week < 18, !is.na(penalty_yards)) %>% pull(penalty_yards) %>% sum()
24914

This matches data from ESPN. Funny enough, no two sources seem to agree on penalties, so I picked ESPN to match.

If anyone can improve on this, please do. It could use some work.
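One possible tidy-up, offered as a sketch rather than a validated replacement: the same counting can be written in base R without the explicit loop. `count_penalties` is a hypothetical helper name, and the small `toy` data frame stands in for dfYearpbp, with descriptions taken from plays quoted in this thread (the first one shortened to its penalty text).

```r
# Hypothetical helper: per-description penalty count and total yards,
# using the same "PENALTY ... enforced" pattern as the loop above.
count_penalties <- function(desc) {
  hits <- regmatches(desc, gregexpr("PENALTY.+?enforced", desc))
  yards <- lapply(hits, function(h) {
    # Pull the number out of ", N yards" in each matched penalty.
    as.numeric(sub(".*, (\\d+) yards.*", "\\1", h))
  })
  data.frame(n = lengths(hits), yards = vapply(yards, sum, numeric(1)))
}

# Toy stand-in for dfYearpbp: one double-penalty play, one single-penalty play.
toy <- data.frame(
  desc = c(
    "PENALTY on ATL-24-A.Terrell, Defensive Pass Interference, 13 yards, enforced at ATL 45 - No Play. PENALTY on ATL-24-A.Terrell, Face Mask (15 Yards), 15 yards, enforced between downs.",
    "(2:39) (Shotgun) PENALTY on IND-73-J.Davenport, False Start, 5 yards, enforced at IND 17 - No Play."
  ),
  penalty       = c(1, 1),
  penalty_yards = c(15, 5),
  stringsAsFactors = FALSE
)

# Same guard as the loop: only touch multi-penalty plays without a reversal.
res <- count_penalties(toy$desc)
dbl <- res$n > 1 & !grepl("REVERSED", toy$desc)
toy$penalty[dbl]       <- res$n[dbl]
toy$penalty_yards[dbl] <- res$yards[dbl]
print(toy[, c("penalty", "penalty_yards")])
```

Like the loop, this still assigns every penalty on a play to the single penalty_team of that row, so it shares the one-team limitation described above.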

@JoeMarino2021
Author

Uh, oh! We have a play with penalties on both teams that do not offset. I was hoping this wouldn't happen before nflfastR got around to implementing separate columns for home and away team penalties.

dfYearpbp %>% filter(game_id == "2021_02_LA_IND", penalty > 0) %>% select(play_id, qtr, down, ydstogo, penalty_team, penalty, penalty_yards, desc)
# A tibble: 4 x 8
  play_id   qtr  down ydstogo penalty_team penalty penalty_yards desc                                                                                                                                                                                      
    <dbl> <dbl> <dbl>   <dbl> <chr>          <dbl>         <dbl> <chr>                                                                                                                                                                                     
1     541     1     2      18 IND                1             5 (2:39) (Shotgun) PENALTY on IND-73-J.Davenport, False Start, 5 yards, enforced at IND 17 - No Play.                                                                                       
2    2220     3     3       1 LA                 1             5 (5:20) (No Huddle) PENALTY on LA-69-S.Joseph, Neutral Zone Infraction, 5 yards, enforced at LA 8 - No Play.                                                                               
3    2243     3     1       3 LA                 2            25 (5:05) 28-J.Taylor left guard to LA 4 for -1 yards (99-A.Donald, 54-L.Floyd). PENALTY on IND-14-Z.Pascal, Taunting, 15 yards, enforced at LA 4. PENALTY on LA-41-K.Young, Disqualificatio~
4    2974     4     1      10 IND                1             5 (9:28) PENALTY on IND, Offensive Too Many Men on Field, 5 yards, enforced at IND 49 - No Play.

Because LA-41-K.Young contacted an official and got himself DQ'd, both penalties were assessed on play 2243. Thanks a lot, LA-41-K.Young. You ruined a perfectly good data set.

I sure hope nflfastR can get around to this issue soon.

@JoeMarino2021
Author

We had another play in Week 4 2022. The first play of the fourth quarter between Arizona and Carolina saw both teams with penalties that did not offset.

PID | QTR | SITUATION | PLAY DESCRIPTION
2997 | 4 | ARZ-1-10-CAR 10 | (15:00) J.Conner right guard to CAR 7 for 3 yards (F.Luvu). PENALTY on CAR-F.Luvu, Unnecessary Roughness, 4 yards, enforced at CAR 7. PENALTY on ARZ-W.Hernandez, Disqualification, 15 yards, enforced between downs.
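For plays like this one, the team has to be read from each penalty rather than from the row. Here is a minimal base R sketch of that idea, assuming the team code always follows "PENALTY on". The variable names are made up for illustration, and the team codes in the text may differ from those used in the data (this description says ARZ, while the table at the top of the thread uses ARI).

```r
# Play description copied from the Week 4 2022 example above.
desc <- "(15:00) J.Conner right guard to CAR 7 for 3 yards (F.Luvu). PENALTY on CAR-F.Luvu, Unnecessary Roughness, 4 yards, enforced at CAR 7. PENALTY on ARZ-W.Hernandez, Disqualification, 15 yards, enforced between downs."

# One string per enforced penalty.
pens <- regmatches(desc, gregexpr("PENALTY.+?enforced", desc))[[1]]

# Team code follows "PENALTY on"; yards precede " yards, enforced".
team  <- sub("^PENALTY on ([A-Z]+).*", "\\1", pens)
yards <- as.numeric(sub(".*, (\\d+) yards, enforced$", "\\1", pens))

# Penalty count and total yards per team on this play.
n_by_team     <- table(team)
yards_by_team <- tapply(yards, team, sum)
print(n_by_team)
print(yards_by_team)
```

Mapping each team code to home or away would then give per-team counts and yardage along the lines of the columns proposed at the top of this issue.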

Hopefully we can address this soon in a new release of nflfastR.

@guga31bb
Member

> Hopefully we can address this soon in a new release of nflfastR.

We always welcome pull requests, but if this would require adding fields, it would not be implemented until the offseason.


3 participants