
[FEATURE] Rate limit option for free API users #945

Open
ymgenesis opened this issue Feb 19, 2024 · 5 comments
ymgenesis commented Feb 19, 2024

  • I am requesting a feature.
  • I am running the development branch.
  • I have read the Opening an issue guide.

Description

An option that automatically limits bdfr's API requests to a set number per minute, to stay within Reddit's newer free API rate limits of (I think) fewer than 100 per minute, or that averages over 10 minutes for burst usage (the most common figures I've seen on reddit are 60 req/min and/or no more than 600 requests per 10 minutes).

Currently I'm sleeping a few minutes between bdfr commands. However, it would be nice to have functionality in bdfr that maximizes its usage while staying within the free API limits by tempering its own requests. I understand PRAW handles this? Instead of an execution failing with prawcore.exceptions.TooManyRequests: received 429 HTTP response, could bdfr inspect the x-ratelimit-remaining, x-ratelimit-reset, and x-ratelimit-used response headers and wait until the program can proceed without failing?

As far as I can tell the logs don't list the API requests, so it's hard to tell how many are used during execution. I'm also not sure how to check the response headers (x-ratelimit-remaining, x-ratelimit-reset, x-ratelimit-used) when running bdfr, either for testing or to set my sleep to the time remaining until the rate limit resets.
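For illustration, here is a minimal sketch of the kind of self-pacing being requested. The header names come from the request above; treating x-ratelimit-reset as "seconds until the window resets" is an assumption about Reddit's API, and the function name and threshold are made up for this example:

```python
def pause_for_rate_limit(headers, threshold=5):
    """Return how long to sleep (seconds) before the next request,
    based on hypothetical rate-limit headers from the last response."""
    remaining = float(headers.get("x-ratelimit-remaining", 0))
    reset_in = float(headers.get("x-ratelimit-reset", 0))
    if remaining <= threshold:
        # Almost out of requests: wait out the rest of the window.
        return reset_in
    # Otherwise spread the remaining requests evenly over the window.
    return reset_in / remaining

# 30 requests left, window resets in 600 s -> pace one request per 20 s.
print(pause_for_rate_limit({"x-ratelimit-remaining": "30",
                            "x-ratelimit-reset": "600"}))  # 20.0
```

Spreading requests evenly avoids the burst-then-fail pattern; sleeping the whole window when nearly exhausted avoids hitting the hard 429.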

@Serene-Arc
Collaborator

This is already done by the package we use to interface with the Reddit API, praw, at least according to its documentation. We might need to bump the installed version, though.

@ymgenesis
Author

ymgenesis commented Feb 20, 2024

@Serene-Arc I thought it was, as well. I'm consistently getting 429 responses and the execution fails with an exception. Granted, I am making a lot of calls, but I expected praw to pace itself instead of failing. I'll try updating praw. I even set my sleep to 10 minutes between bdfr executions, each of which downloads 20 submissions. That's better, but it still fails often, and it's too hard to guess the time until the rate limit resets manually.

EDIT: I updated praw with "praw>=7.7.1" in pyproject.toml, but it still refuses to download after too many requests (obviously enough):

[2024-02-20 09:48:46,662 - bdfr.connector - ERROR] - User god failed to be retrieved due to a PRAW exception: received 429 HTTP response
[2024-02-20 09:48:46,666 - bdfr.connector - DEBUG] - Waiting 60 seconds to continue

Additionally, I still get the exception crash with the updated praw instead of a "clean" 429 like the one above (not sure if this is related to my using an older approach to the progress bar). Unlike the other 429, the user is retrieved, but I guess fetching the posts hits a 429?

[2024-02-20 10:02:35,014 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2024-02-20 10:02:35,020 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2024-02-20 10:02:35,827 - bdfr.connector - DEBUG] - Retrieving submitted posts of user god
[2024-02-20 10:02:39,599 - bdfr.downloader - INFO] - Calculating hashes for 38 files
Traceback (most recent call last):
  File "/Users/me/Documents/GitHub/bulk-downloader-for-reddit/bdfr/downloader.py", line 51, in download
    for submission in tqdm(list(generator), desc=desc, unit="post", leave=False):
                           ^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/models/listing/generator.py", line 63, in __next__
    self._next_batch()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/models/listing/generator.py", line 89, in _next_batch
    self._listing = self._reddit.get(self.url, params=self.params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/reddit.py", line 712, in get
    return self._objectify_request(method="GET", params=params, path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/reddit.py", line 517, in _objectify_request
    self.request(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/reddit.py", line 941, in request
    return self._core.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/prawcore/sessions.py", line 330, in request
    return self._request_with_retries(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/prawcore/sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.TooManyRequests: received 429 HTTP response

My thought was: if a 429 is received, parse the x-ratelimit-reset header (or one of the other rate-limit headers) and sleep until the reset is reached. The current "waiting 60 seconds to continue" doesn't retry the previous attempt.
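The retry-until-reset idea above could be sketched roughly like this. FakeTooManyRequests is a stand-in for prawcore.exceptions.TooManyRequests, and whether that exception exposes the response headers this way would need checking, so the sketch takes the rate-limit check and reset lookup as callables:

```python
import time

class FakeTooManyRequests(Exception):
    """Stand-in for prawcore.exceptions.TooManyRequests in this sketch."""
    def __init__(self, headers):
        super().__init__("received 429 HTTP response")
        self.headers = headers

def retry_on_429(func, *, is_rate_limited, reset_seconds, max_retries=3):
    """Call func(); on a rate-limit error, sleep until reset and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries:
                raise
            time.sleep(reset_seconds(exc))

# Usage: a call that fails once with a 429, then succeeds on retry.
calls = []
def flaky_fetch():
    calls.append(1)
    if len(calls) < 2:
        raise FakeTooManyRequests({"x-ratelimit-reset": "0"})
    return "ok"

result = retry_on_429(
    flaky_fetch,
    is_rate_limited=lambda e: isinstance(e, FakeTooManyRequests),
    reset_seconds=lambda e: float(e.headers["x-ratelimit-reset"]),
)
print(result)  # ok
```

Unlike the fixed 60-second wait in the log above, this retries the failed call itself after sleeping, rather than moving on.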

@ymgenesis
Author

ymgenesis commented Feb 20, 2024

On second thought, it may have been the way I was using it.

I had authenticated with a different user than the account the app client/secret was created on, because I didn't want to verify my email on one account. I've now created the app on the account I'm authenticating with, and it seems to be going fine so far. praw is still updated to 7.7.1.

@Serene-Arc
Collaborator

I might still force an update on the package requirements since it's been a while, but I'm glad it's working. Reddit has been very difficult to work with since the API changes.

@ymgenesis
Author

ymgenesis commented Feb 21, 2024

Makes sense. I did a rather long run yesterday of a few thousand files over several hours and didn't get any 429 responses, so that's a plus!

I also did some simple rate tests using the latest praw. From what I can see, it tries to keep the x-ratelimit-remaining response value on par with the x-ratelimit-reset value (converted from Unix epoch time to seconds). Theoretically the remaining requests should never drop to 0 before the reset timer fires, except that it sometimes does, since the matching isn't perfect.
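For reference, the conversion used in those tests can be sketched as follows, assuming (as observed above, though I haven't seen it documented) that x-ratelimit-reset is a Unix epoch timestamp:

```python
import time

def seconds_until_reset(reset_epoch, now=None):
    """Seconds left until the rate-limit window resets (never negative)."""
    if now is None:
        now = time.time()  # default to the current time
    return max(0.0, float(reset_epoch) - now)

# With a fixed "now" for reproducibility: the window resets 600 s from now.
print(seconds_until_reset(1_708_500_600, now=1_708_500_000))  # 600.0
```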
