[FBRef] read_player_season_stats includes Women's World Cup by default (season 2023) #576

Closed
mvantschip opened this issue May 12, 2024 · 2 comments · Fixed by #595
Labels
documentation Improvements or additions to documentation


@mvantschip

I am fetching player data for the 2023 season. According to the docs, this should by default return data from the top 5 leagues only. However, I noticed that stats from the Women's World Cup are included as well.
I can reproduce this issue with the following code:

import soccerdata as sd
import pandas as pd

fbref = sd.FBref(seasons=2023)
stats = fbref.read_player_season_stats(stat_type='standard')
print(stats.index.unique(level='league'))

Output:

Index(['ENG-Premier League', 'ESP-La Liga', 'FRA-Ligue 1', 'GER-Bundesliga',
       'INT-Women's World Cup', 'ITA-Serie A'],
      dtype='object', name='league')

In addition, each row in the resulting dataframe occurs twice, though I am not sure whether that problem is related.
Here is the output of stats.head() from the same setup:

import soccerdata as sd
import pandas as pd

fbref = sd.FBref(seasons=2023)
stats = fbref.read_player_season_stats(stat_type='standard')
print(stats.head())

Output:

                                                 nation pos     age  born Playing Time                    Performance                                 Expected                      Progression           Per 90 Minutes
                                                                                    MP Starts   Min   90s         Gls Ast G+A G-PK PK PKatt CrdY CrdR       xG  npxG   xAG npxG+xAG        PrgC PrgP PrgR            Gls   Ast   G+A  G-PK G+A-PK    xG   xAG xG+xAG  npxG npxG+xAG
league             season team    player
ENG-Premier League 2324   Arsenal Aaron Ramsdale    ENG  GK  25-364  1998            6      6   540   6.0           0   0   0    0  0     0    0    0      0.0   0.0   0.0      0.0           0    2    0            0.0   0.0   0.0   0.0    0.0   0.0   0.0    0.0   0.0      0.0
                                  Aaron Ramsdale    ENG  GK  25-364  1998            6      6   540   6.0           0   0   0    0  0     0    0    0      0.0   0.0   0.0      0.0           0    2    0            0.0   0.0   0.0   0.0    0.0   0.0   0.0    0.0   0.0      0.0
                                  Ben White         ENG  DF  26-217  1997           35     33  2830  31.4           4   4   8    4  0     0    8    0      1.1   1.1   3.5      4.6          41  175  153           0.13  0.13  0.25  0.13   0.25  0.04  0.11   0.15  0.04     0.15
                                  Ben White         ENG  DF  26-217  1997           35     33  2830  31.4           4   4   8    4  0     0    8    0      1.1   1.1   3.5      4.6          41  175  153           0.13  0.13  0.25  0.13   0.25  0.04  0.11   0.15  0.04     0.15
                                  Bukayo Saka       ENG  FW  22-250  2001           34     34  2838  31.5          16   9  25   10  6     6    3    0     15.1  10.4  10.2     20.6         153  122  502           0.51  0.29  0.79  0.32    0.6  0.48  0.32    0.8  0.33     0.65

Thanks for the wonderful work!

@probberechts
Owner

The docs are outdated. When no leagues are given, it returns the data for all the supported leagues. Previously, only the Big 5 leagues were supported but I've added support for the World Cups and Euros since.
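
To get only the Big 5 leagues despite the new default, the leagues can be passed explicitly. This is a minimal sketch: the league IDs are copied from the output earlier in this thread, and `read_big5_player_stats` is a hypothetical helper name, not part of soccerdata. The call needs network access, so the import is kept inside the function.

```python
# League IDs as they appear in the index output posted above.
BIG5_LEAGUES = [
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
]


def read_big5_player_stats(seasons, stat_type="standard"):
    """Hypothetical helper: fetch player season stats for the Big 5 leagues only."""
    # Imported lazily so the constant above can be used without soccerdata installed.
    import soccerdata as sd

    # Passing `leagues` explicitly avoids the "all supported leagues" default.
    fbref = sd.FBref(leagues=BIG5_LEAGUES, seasons=seasons)
    return fbref.read_player_season_stats(stat_type=stat_type)
```

Usage would be `stats = read_big5_player_stats(2023)`. Whether this also avoids the duplicate rows on an affected version is not confirmed here.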

@probberechts added the documentation label on May 13, 2024
@mvantschip
Author

I see! Thanks. Any idea about the duplicate rows? Or should I make a separate issue for that?
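
As a possible client-side workaround for the duplicate rows, repeated index entries can be dropped with pandas after loading. The sketch below uses a small toy frame built from a few values in the `stats.head()` output above (only the `MP` and `Gls` columns), and it assumes that rows sharing the same (league, season, team, player) index are exact copies of each other.

```python
import pandas as pd

# Toy frame mimicking the duplicated rows shown in the output above.
index = pd.MultiIndex.from_tuples(
    [
        ("ENG-Premier League", "2324", "Arsenal", "Aaron Ramsdale"),
        ("ENG-Premier League", "2324", "Arsenal", "Aaron Ramsdale"),
        ("ENG-Premier League", "2324", "Arsenal", "Ben White"),
        ("ENG-Premier League", "2324", "Arsenal", "Ben White"),
        ("ENG-Premier League", "2324", "Arsenal", "Bukayo Saka"),
    ],
    names=["league", "season", "team", "player"],
)
stats = pd.DataFrame({"MP": [6, 6, 35, 35, 34], "Gls": [0, 0, 4, 4, 16]}, index=index)

# Keep the first row for each index entry and drop later repeats.
deduped = stats[~stats.index.duplicated(keep="first")]
print(deduped)
```

If the repeated rows might differ in their values, `stats.reset_index().drop_duplicates()` is the more conservative choice, since it only drops rows that match in every column.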

probberechts added a commit that referenced this issue May 27, 2024
When the FBref reader was initialized with leagues=["Big 5 European Leagues
Combined", ...] where ... contains other leagues from the Big 5 (e.g.,
"ENG-Premier League") it would scrape the league twice.

Fixes #576
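
The problem described in the commit message can be illustrated with a small sketch. This is not the code from the referenced commit; `drop_redundant_big5` is a hypothetical function that shows one way to avoid scraping a league twice, namely dropping the individual Big 5 entries whenever the combined entry is also requested. The league names are taken from this thread.

```python
BIG5_COMBINED = "Big 5 European Leagues Combined"
BIG5_LEAGUES = {
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
}


def drop_redundant_big5(leagues):
    """Hypothetical: remove individual Big 5 leagues if the combined entry is present."""
    if BIG5_COMBINED not in leagues:
        # Nothing overlaps, so the selection is returned unchanged.
        return list(leagues)
    # The combined entry already covers these leagues, so keep everything else.
    return [league for league in leagues if league not in BIG5_LEAGUES]


selected = [BIG5_COMBINED, "ENG-Premier League", "INT-Women's World Cup"]
print(drop_redundant_big5(selected))
```

With the selection above, "ENG-Premier League" is removed because the combined entry already covers it, while the World Cup entry is kept.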