MLS 2023 load_fb_match_summary() returns more results than expected #364

Open
tonyelhabr opened this issue Feb 16, 2024 · 0 comments

The pre-saved MLS match summary results for 2023 seem to contain extraneous match URLs. This also has the downstream effect of duplicating the shots returned by load_fb_match_shooting() with the same inputs. It looks like some teams are being mapped to multiple team names/URLs, e.g. Montreal -> CF Montreal and Montreal Impact.

library(worldfootballR)
library(dplyr)
library(testthat) ## for expect_equal()

raw_2022_mls_match_summaries <- load_fb_match_summary(
  country = 'USA',
  gender = 'M',
  tier = '1st',
  season_end_year = 2022
)
#> → Data last updated 2023-12-10 18:17:05 UTC

## good 
expect_equal(
  raw_2022_mls_match_summaries |> 
    filter(Matchweek == 'Major League Soccer (Regular Season)') |> 
    distinct(MatchURL) |> 
    nrow(),
  ## 28 teams x 34 games each
  28 * 34 / 2
)

raw_2023_mls_match_summaries <- load_fb_match_summary(
  country = 'USA',
  gender = 'M',
  tier = '1st',
  season_end_year = 2023
)
#> → Data last updated 2023-12-10 18:17:05 UTC

## more distinct match URLs than expected
expect_equal(
  raw_2023_mls_match_summaries |> 
    filter(Matchweek == 'Major League Soccer (Regular Season)') |> 
    distinct(MatchURL) |> 
    nrow(),
  ## 29 teams x 34 games each
  29 * 34 / 2
)
#> Error: nrow(...) not equal to 29 * 34/2.
#> 1/1 mismatches
#> [1] 621 - 493 == 128

## each duplicated match ID appears under both a current and a former team name
## CF-Montreal <-> Montreal-Impact
## Sporting-KC <-> Kansas-City-Wiz
## etc.
raw_2023_mls_match_summaries |> 
  filter(Matchweek == 'Major League Soccer (Regular Season)') |> 
  distinct(MatchURL) |> 
  mutate(match_id = basename(dirname(MatchURL)), .before = 1) |> 
  group_by(match_id) |> 
  filter(n() > 1L) |> 
  ungroup() |> 
  arrange(match_id)
#> # A tibble: 256 × 2
#>    match_id MatchURL                                                            
#>    <chr>    <chr>                                                               
#>  1 1791335f https://fbref.com/en/matches/1791335f/CF-Montreal-DC-United-April-1…
#>  2 1791335f https://fbref.com/en/matches/1791335f/Montreal-Impact-DC-United-Apr…
#>  3 18f0f8a3 https://fbref.com/en/matches/18f0f8a3/LA-Galaxy-Sporting-KC-June-21…
#>  4 18f0f8a3 https://fbref.com/en/matches/18f0f8a3/LA-Galaxy-Kansas-City-Wiz-Jun…
#>  5 1a5d42d5 https://fbref.com/en/matches/1a5d42d5/New-York-Red-Bulls-Charlotte-…
#>  6 1a5d42d5 https://fbref.com/en/matches/1a5d42d5/NYNJ-MetroStars-Charlotte-FC-…
#>  7 1b9f247a https://fbref.com/en/matches/1b9f247a/New-York-Red-Bulls-Columbus-C…
#>  8 1b9f247a https://fbref.com/en/matches/1b9f247a/NYNJ-MetroStars-Columbus-Crew…
#>  9 1c7e40b3 https://fbref.com/en/matches/1c7e40b3/New-York-Red-Bulls-Inter-Miam…
#> 10 1c7e40b3 https://fbref.com/en/matches/1c7e40b3/NYNJ-MetroStars-Inter-Miami-A…
#> # ℹ 246 more rows
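
Until the cached data is fixed, a possible stopgap is to keep only one URL variant per match ID. The sketch below is a rough illustration, not a confirmed fix: it runs on toy data with hypothetical match IDs, team slugs, and a hypothetical `event` column, and it assumes that the two URL variants of a match carry identical rows. It keeps whichever variant appears first, which is an arbitrary choice.

```r
library(dplyr)

## Toy stand-in for the loaded match summaries.
## The match IDs, slugs, and the `event` column are hypothetical.
toy_summaries <- tibble(
  MatchURL = c(
    "https://fbref.com/en/matches/aaaa1111/New-Name-FC-Team-B",
    "https://fbref.com/en/matches/aaaa1111/New-Name-FC-Team-B",
    "https://fbref.com/en/matches/aaaa1111/Old-Name-FC-Team-B",
    "https://fbref.com/en/matches/aaaa1111/Old-Name-FC-Team-B",
    "https://fbref.com/en/matches/bbbb2222/Team-C-Team-D"
  ),
  event = c("Goal", "Yellow Card", "Goal", "Yellow Card", "Goal")
)

deduped_summaries <- toy_summaries |>
  ## same match ID extraction as in the reprex above
  mutate(match_id = basename(dirname(MatchURL)), .before = 1) |>
  group_by(match_id) |>
  ## keep every row belonging to the first URL variant seen for each match
  filter(MatchURL == first(MatchURL)) |>
  ungroup()

## after deduplication, each match ID maps to exactly one URL
n_distinct(deduped_summaries$match_id) == n_distinct(deduped_summaries$MatchURL)
```

The same filter could in principle be applied to the output of load_fb_match_shooting(), assuming it also exposes a MatchURL column.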
@tonyelhabr tonyelhabr self-assigned this Mar 8, 2024