HTTPError: HTTP Error 429: Too Many Requests #756

Open
seanofthedead86 opened this issue Nov 4, 2022 · 21 comments

@seanofthedead86

seanofthedead86 commented Nov 4, 2022

I'm getting an "HTTPError: HTTP Error 429: Too Many Requests" error when running anything NFL-related through the API. I assume this is caused by sports-reference blocking my requests to their site, but I wanted to see whether anyone has had this issue before and whether there is a way to resolve it.

Here is the error that's thrown:

HTTPError Traceback (most recent call last)
in
1 Team_1 = 'ATL'
2 Team_2 = 'SDG'
----> 3 Total_1 = model.predict(model_input(Team_1, Team_2))
4 # Total_2 = model.predict(model_input(Team_2, Team_1))
5 # Total_1[0], Total_2[0], Total_1[0] - Total_2[0]

6 frames
in model_input(home, away)
2 home_2018_schedule = team_schedule(home, 2018)
3 home_2019_schedule = team_schedule(home, 2019)
----> 4 home_2020_schedule = team_schedule(home, 2020)
5 home_2021_schedule = team_schedule(home, 2021)
6 home_2022_schedule = team_schedule(home, 2022)

in team_schedule(team, year)
1 def team_schedule (team, year):
----> 2 schedule = Schedule(team, year)
3 return schedule.dataframe.dropna()
4
5 def team_info (team):

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in __init__(self, abbreviation, year)
578 def __init__(self, abbreviation, year=None):
579 self._games = []
--> 580 self._pull_schedule(abbreviation, year)
581
582 def __getitem__(self, index):

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in _pull_schedule(self, abbreviation, year)
704 str(int(year) - 1))):
705 year = str(int(year) - 1)
--> 706 doc = pq(SCHEDULE_URL % (abbreviation.lower(), year))
707 schedule = utils._get_stats_table(doc, 'table#gamelog%s' % year)
708 if not schedule:

/usr/local/lib/python3.7/dist-packages/pyquery/pyquery.py in __init__(self, *args, **kwargs)
183 html = opener(url, **kwargs)
184 else:
--> 185 html = url_opener(url, kwargs)
186 if not self.parser:
187 self.parser = 'html'

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in url_opener(url, kwargs)
74 def url_opener(url, kwargs):
75 if HAS_REQUEST:
---> 76 return _requests(url, kwargs)
77 return _urllib(url, kwargs)

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in _requests(url, kwargs)
59 if not (200 <= resp.status_code < 300):
60 raise HTTPError(resp.url, resp.status_code,
---> 61 resp.reason, resp.headers, None)
62 if encoding:
63 resp.encoding = encoding

HTTPError: HTTP Error 429: Too Many Requests
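
The last frame of the traceback shows pyquery raising `HTTPError` for any non-2xx response, so one option is to catch that exception in the calling code and retry after a pause. Here is a minimal sketch, assuming the exception is `urllib.error.HTTPError` (which matches the message format above). The helper name `call_with_retry`, the retry count, and the 60-second wait are all hypothetical choices, not part of sportsipy or pyquery.

```python
import time
from urllib.error import HTTPError


def call_with_retry(fetch, *args, retries=3, wait_seconds=60, sleep=time.sleep):
    """Call fetch(*args); on an HTTP 429, pause and try again.

    `retries` and `wait_seconds` are illustrative defaults (assumptions),
    not values documented by Sports Reference.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(*args)
        except HTTPError as err:
            # Re-raise anything that is not a rate-limit response,
            # and give up once the retries are exhausted.
            if err.code != 429 or attempt == retries:
                raise
            sleep(wait_seconds)
```

With the code from the traceback, this could be used as `call_with_retry(team_schedule, 'ATL', 2020)`. Retrying only helps if the wait is long enough for the block to lift, which this thread does not establish.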

@mattpfreer

I'm having the same issue with NHL and NFL, but NBA was working for me

@seanofthedead86
Author

> I'm having the same issue with NHL and NFL, but NBA was working for me

Yeah, I'm using NFL. I tried NBA and it seemed to work fine.

@mattpfreer

It seems Sports Reference recently introduced a limit on requests to their site, if I understand correctly. This may be the issue, although I had been using the same code for the past few days, including after October 26th, and it was working fine.

https://www.sports-reference.com/bot-traffic.html

@seanofthedead86
Author

> It seems Sports Reference recently introduced a limit on requests to their site, if I understand correctly. This may be the issue, although I had been using the same code for the past few days, including after October 26th, and it was working fine.
>
> https://www.sports-reference.com/bot-traffic.html

Same here, but that's probably what's happening. It pretty much renders sportsipy useless.

@mattpfreer

I'm going to try adding time.sleep(60) between requests for roster/boxscore data. Let me know if you have any success with workarounds as well.
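
For anyone trying the same thing, a sleep between requests could look roughly like this. It is only a sketch: `fetch_all` is a hypothetical helper, and `fetch` stands in for whatever sportsipy call is being made.

```python
import time


def fetch_all(items, fetch, delay_seconds=60, sleep=time.sleep):
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(item))
    return results
```

The 60-second default mirrors the value mentioned above. Whether a shorter delay is enough depends on the site's limit, which this sketch does not assume.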

@mattpfreer

@roclark let us know if you have any ideas as well

@seanofthedead86
Author

@mattpfreer adding the timer worked!

@mattpfreer

Awesome to hear. Did you add it before each individual roster request, or in the sportsipy .py files themselves? I'm still having trouble on my end: I've used a counter to fetch 10 at a time before sleeping, but it seems I may need to add the sleep before each individual roster request.
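
For reference, the counter approach described here might be sketched as follows. `fetch_in_batches` is a hypothetical helper, and the batch size and delay are just the values mentioned in this thread. Note that the requests inside a batch still go out back to back, which may be why this variant keeps hitting the limit.

```python
import time


def fetch_in_batches(items, fetch, batch_size=10, delay_seconds=60, sleep=time.sleep):
    """Call fetch(item) for each item, sleeping once after every full batch."""
    items = list(items)
    results = []
    for count, item in enumerate(items, start=1):
        results.append(fetch(item))
        # Pause after every full batch, but not after the final item.
        if count % batch_size == 0 and count < len(items):
            sleep(delay_seconds)
    return results
```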

@seanofthedead86
Author

> Awesome to hear. Did you add it before each individual roster request, or in the sportsipy .py files themselves? I'm still having trouble on my end: I've used a counter to fetch 10 at a time before sleeping, but it seems I may need to add the sleep before each individual roster request.

I added it within a function I defined that calls Schedule and Team info several times, for previous years and the current year, for two separate teams. Going forward, I think I'll just save them as CSV files and update only the current year as needed.
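
The CSV idea could be sketched like this, assuming past seasons never change and only the current season needs refreshing. `cached_csv`, the file naming scheme, and the `fetch_csv_text` callback are all hypothetical names introduced for illustration.

```python
from pathlib import Path


def cached_csv(cache_dir, team, year, fetch_csv_text, current_year):
    """Return CSV text for (team, year), hitting the network only when needed.

    Past seasons are read from disk when a cached file exists; the current
    season is always re-fetched and the cached copy overwritten.
    """
    path = Path(cache_dir) / f"{team}_{year}.csv"
    if year != current_year and path.exists():
        return path.read_text()
    text = fetch_csv_text(team, year)  # the only place a request is made
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return text
```

With sportsipy, `fetch_csv_text` could be something like `lambda team, year: Schedule(team, year).dataframe.dropna().to_csv()`, and the returned text can be loaded back with `pandas.read_csv(io.StringIO(text))`.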

@mattpfreer

Got it, thank you! That makes sense. Hopefully there will be a resolution in the future.

@jrclegg2

jrclegg2 commented Nov 7, 2022

Yeah, this problem renders the API useless for my NCAAB use case. I can't even run a Teams() call.

Any recommendations? I see the time.sleep option, but wow, that will take a long time to run for 300+ teams.

@seanofthedead86
Author

Everything seemed to be working well the past few days, especially after adding in the timer in a few spots, but today I called Teams() and got the error. I hadn't run anything else.

@wittwg

wittwg commented Nov 11, 2022

The Team function was working for me in previous weeks, but it stopped yesterday. I tried running it from different IP addresses and different online notebooks, and it still doesn't work.

I am a novice, but could it be that someone is spamming NFL requests through the API which is causing the error?

@seanofthedead86
Author

> The Team function was working for me in previous weeks, but it stopped yesterday. I tried running it from different IP addresses and different online notebooks, and it still doesn't work.
>
> I am a novice, but could it be that someone is spamming NFL requests through the API which is causing the error?

Not sure. I experimented with NHL and CBB a little bit and they worked, but at this point, if the Teams function for NFL isn't going to work, the API is pretty much useless for me.

@mattpfreer is it working for you?

@mattpfreer

I was able to get NCAAB to work by editing the teams.py file and commenting out the code that uses Conferences data. That seems to be the only problematic piece.

@CorgPredicts

> I was able to get NCAAB to work by editing the teams.py file and commenting out the code that uses Conferences data. That seems to be the only problematic piece.

Would you be able to share which lines you commented out to get it to work? Thanks.

@bveber

bveber commented Nov 17, 2022

The problem with Teams is that it quickly fires off as many requests as there are teams, which almost immediately violates the new request limit at Sports Reference. I fixed it on my fork. You can see the change here. I just created a new utility function that adds a time.sleep after each URL request via pyquery.

As long as you don't run multiple sportsipy commands in parallel, this fix should guarantee you won't exceed the new limit. Obviously it's not ideal for NCAAB, since fetching 300 teams will take ~15 minutes, but at least it works.
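
The general idea of a utility that sleeps after every URL request can be sketched as below. This is not the code from the fork: `make_rate_limited` is a hypothetical name, and the 3-second default is only inferred from the "300 teams in ~15 minutes" estimate.

```python
import time


def make_rate_limited(opener, delay_seconds=3, sleep=time.sleep):
    """Wrap `opener` so that every call is followed by a fixed pause."""
    def rate_limited(*args, **kwargs):
        try:
            return opener(*args, **kwargs)
        finally:
            # Pause even when the request raised, so a failed request
            # still counts against the rate budget.
            sleep(delay_seconds)
    return rate_limited
```

Assuming pyquery is installed, `rate_limited_pq = make_rate_limited(PyQuery)` followed by `rate_limited_pq(url=some_url)` would throttle page loads in a single process. As noted above, it offers no protection when several processes run in parallel.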

@jrclegg2

I'm also having an issue with boxscores, which is more concerning.

My teams fix is a time.sleep(13) in _retrieve_all_teams, around line 1130.

I'm trying to train a model with boxscores back to '07. Querying each game individually with a 13-second sleep would take ~9 days...

Does anyone here happen to have cached boxscore data that goes back from last season to some earlier year? Any help is much appreciated!!
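
As a rough check on the time estimates quoted in this thread, the arithmetic can be written out. This is a back-of-the-envelope sketch that counts only the sleep time and ignores the duration of the requests themselves. The implied delay and game count are derived from the figures above, not stated by anyone.

```python
def total_seconds(request_count, delay_seconds):
    """Time spent sleeping if each request is followed by a fixed delay."""
    return request_count * delay_seconds


# 300 teams in ~15 minutes implies a delay of about 3 seconds per request.
implied_delay = 15 * 60 / 300

# A 13-second delay adding up to ~9 days corresponds to roughly this many boxscores.
implied_games = int(9 * 24 * 3600 / 13)

print(implied_delay)                # 3.0
print(implied_games)                # 59815
print(total_seconds(300, 3) / 60)   # 15.0
```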

@jrclegg2

Does anybody have a good solution for boxscore.py?

@bveber

bveber commented Nov 21, 2022

> Does anybody have a good solution for boxscore.py?

The best way I've found to fix it is to intentionally limit the rate of all calls to the website, as I mentioned in my post above (check out this PR for specific details). I don't think your use case of bulk downloading data for several years at once is really an option anymore, unless you're willing to wait a really long time for the results. You'll be better off scheduling a long-running job and caching all the historical data you intend to reuse.
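
A long-running job of this kind is easier to live with if it can be stopped and resumed. Here is a minimal sketch, assuming each item can be fetched independently and saved as JSON. `backfill`, the one-file-per-key layout, and the 3-second delay are hypothetical choices for illustration.

```python
import json
import time
from pathlib import Path


def backfill(keys, fetch, cache_dir, delay_seconds=3, sleep=time.sleep):
    """Fetch every key not yet cached on disk; return how many were fetched.

    Each result is written as soon as it arrives, so an interrupted run
    can be restarted without repeating earlier requests. Keys are assumed
    to be safe to use as file names.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for key in keys:
        target = cache / f"{key}.json"
        if target.exists():
            continue  # already downloaded on an earlier run
        if fetched:
            sleep(delay_seconds)  # pause between actual requests only
        target.write_text(json.dumps(fetch(key)))
        fetched += 1
    return fetched
```

Here `fetch(key)` must return something JSON-serializable, for example a dict built from the fields of interest, and cached keys cost no requests on later runs.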

@kankshat

kankshat commented Dec 9, 2022

> I'm going to try adding time.sleep(60) between requests for roster/boxscore data. Let me know if you have any success with workarounds as well.

Just ran into this problem as well. I created a model to predict the winning teams each week in the NFL, and hit this issue when I tried to update my master CSV. The time.sleep worked perfectly for me. I wish Sports Reference would just create an API directly.
