
Support search rate limit #553

Closed
nwalsh1995 opened this issue Apr 9, 2017 · 13 comments

@nwalsh1995

It seems the get_rate_limit function returns what GitHub considers the 'core' rate limit. However, there are separate rate limits for searching code. See here.

Right now there isn't a way to get the search code rate limits as far as I can tell.
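For reference, the raw /rate_limit endpoint already reports the buckets separately. A minimal sketch using only the standard library (unauthenticated here, so the actual limits you see will be the low anonymous ones; the values in the comments are illustrative):

import json
from urllib.request import urlopen

# Note: /rate_limit itself does not count against any rate limit.
data = json.loads(urlopen("https://api.github.com/rate_limit").read().decode())
print(data["resources"]["core"])    # e.g. {'limit': 60, 'remaining': 60, 'reset': ...}
print(data["resources"]["search"])  # e.g. {'limit': 10, 'remaining': 10, 'reset': ...}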

@justfortherec

I see the same issue. Here is a small script that exemplifies the problem.

import os
from datetime import datetime
from github import Github

# Login
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
github = Github(TOKEN)

# Get initial rate limit and reset time
rl1 = github.get_rate_limit().rate
print("RL1 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl1.limit, rl1.remaining, rl1.reset))
# RL1 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# Perform a search
results = github.search_code("Hello World")

# Rate limit of Github instance is unchanged after a search
rl2 = github.get_rate_limit().rate
print("RL2 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl2.limit, rl2.remaining, rl2.reset))
# RL2 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# The PaginatedList instance has a Requester with the same info
# (note: rate_limiting is a (remaining, limit) tuple)
rl3 = results._PaginatedList__requester.rate_limiting
rl3_reset = datetime.utcfromtimestamp(int(
        results._PaginatedList__requester.rate_limiting_resettime))
print("RL3 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl3[1], rl3[0], rl3_reset))
# RL3 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# However, the actual ContentFile results show a different limit
# The Requester of each individual result ...
result = results[0]
rl4 = result._requester.rate_limiting
rl4_reset = datetime.utcfromtimestamp(int(
        result._requester.rate_limiting_resettime))
print("RL4 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl4[1], rl4[0], rl4_reset))
# RL4 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# ... and headers stored in the content file directly show a different rate limit.
rl5_limit = result._headers['x-ratelimit-limit']
rl5_remaining = result._headers['x-ratelimit-remaining']
rl5_reset = datetime.utcfromtimestamp(int(
        result._headers['x-ratelimit-reset']))
print("RL5 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl5_limit, rl5_remaining, rl5_reset))
# RL5 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# In the end, the main Github instance still shows the original full rate limit
rl6 = github.get_rate_limit().rate
print("RL6 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl6.limit, rl6.remaining, rl6.reset))
# RL6 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

@brentshermana

+1 This feature is necessary for an application I'm trying to build

@justfortherec

@brentshermana for your application, consider inspecting the rate limit headers of the last response (see my example above) or polling the /rate_limit endpoint yourself. It contains information about all kinds of rate limits and does not count towards any rate limit.

Eventually, it would be nice if PyGithub would not only parse rate but also resources from what /rate_limit returns. The information is there; it is just not made available to consumers of the library, unfortunately.

Also, the paginated list should return the rate limit for code search if it returns results of such a search, i.e. whatever is stored in _headers['x-ratelimit-*'].

@justfortherec

By the way: I just noticed that the rate field in the JSON returned by /rate_limit is deprecated, and the information in resources is the recommended alternative: https://developer.github.com/v3/rate_limit/#deprecation-notice
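For reference, the response roughly has this shape (example values; the top-level rate block mirrors resources["core"]):

# Illustrative shape of the /rate_limit JSON (example values):
example_response = {
    "rate": {"limit": 5000, "remaining": 4999, "reset": 1506093055},  # deprecated
    "resources": {
        "core":   {"limit": 5000, "remaining": 4999, "reset": 1506093055},
        "search": {"limit": 30, "remaining": 29, "reset": 1506089256},
    },
}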

@brentshermana

brentshermana commented Oct 8, 2017

I'm doing exactly that. If anyone wants to adapt this and try to make a pull request, you have my blessing:

import json
import time
from urllib.request import urlopen

def wait(seconds):
    print("Waiting for {} seconds ...".format(seconds))
    time.sleep(seconds)
    print("Done waiting - resume!")

def api_wait():
    url = 'https://api.github.com/rate_limit'
    response = urlopen(url).read()
    data = json.loads(response.decode())
    if data['resources']['core']['remaining'] <= 10:  # extra margin of safety
        reset_time = data['resources']['core']['reset']
        wait(reset_time - time.time() + 10)
    elif data['resources']['search']['remaining'] <= 2:
        reset_time = data['resources']['search']['reset']
        wait(reset_time - time.time() + 10)
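Call api_wait() before each search or batch of core requests; a trivial usage sketch, reusing the github instance from the example above:

api_wait()
results = github.search_code("Hello World")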

@BBI-YggyKing

BBI-YggyKing commented Jun 20, 2018

I'm experiencing a problem where my iteration over the results from search_issues stops after 1020 results when there should be 1869 results. My script stops at the same point every time. Could this be a rate-limiting issue?

I do not get an error, the results just run out. If I put my query string directly into the GitHub web interface then I see all 1869 results, as expected. 1020 is a multiple of 30, which makes me wonder if it's a pagination problem?

Code is as follows:

querystring = "type:pr is:closed repo:xxxx closed:2017-07-01..2018-06-30"
issues = git.search_issues(query=querystring, sort="updated", order="asc")
for issue in issues:
    pull = issue.as_pull_request()
    print "%s: %s" % (pull.number, pull.title)

Many thanks for any tips you can share as to what might be going wrong here.

@BBI-YggyKing

I also tried iterating through issues.reversed to see if it would start at the end of my expected 1869 results. However, in this case I only get 30 issues, from the first page of results.

@BBI-YggyKing

On further investigation, it appears that I'm running into the Search API's limit of 1000 results per query.
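For anyone else who hits this: one workaround is to split the query into smaller date windows so that each window stays under 1000 results. A rough sketch based on my query above (the windows are just an example):

# Split the closed-date range so each search stays under the
# Search API's 1000-result cap (repo name is a placeholder).
windows = ["2017-07-01..2017-12-31", "2018-01-01..2018-06-30"]
issues = []
for window in windows:
    q = "type:pr is:closed repo:xxxx closed:%s" % window
    issues.extend(git.search_issues(query=q, sort="updated", order="asc"))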

@sfdye

sfdye commented Jun 22, 2018

How about we provide one more method, get_search_rate_limit(), for the search rate limit, while the existing get_rate_limit() parses the latest "core" rate limit as suggested by GitHub: https://developer.github.com/v3/rate_limit/

@sfdye changed the title from "Search ratelimiting" to "Support search rate limit" on Jun 22, 2018
@sfdye closed this as completed in fd8a036 on Sep 5, 2018
@sfdye

sfdye commented Sep 5, 2018

The Search API rate limit and the GraphQL rate limit are available now. One method for all.

By default it will show you the "core" rate limit. You can also get the search/graphql rate limits by accessing the respective attributes.

>>> r = g.get_rate_limit()
>>> r
RateLimit(core=Rate(remaining=4923, limit=5000))
>>> r.search
Rate(remaining=30, limit=30)
>>> r.graphql
Rate(remaining=5000, limit=5000)

@BBI-YggyKing

Looks great, thanks @sfdye!

To emulate @brentshermana's waiting function to avoid problems with search rate limiting, you can now do something like this:

import time
from datetime import datetime

def api_wait_search(git):
    limits = git.get_rate_limit()
    if limits.search.remaining <= 2:
        # limits.search.reset is a naive UTC datetime, so compare against utcnow()
        seconds = (limits.search.reset - datetime.utcnow()).total_seconds()
        print("Waiting for %d seconds ..." % seconds)
        time.sleep(seconds)
        print("Done waiting - resume!")

Note that calling get_rate_limit() will introduce a small delay, so you may want to minimize how often you call this.
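One way to avoid the extra request is PyGithub's cached rate_limiting attribute, which is populated from the headers of the most recent response (caveat: it reflects whichever bucket, core or search, the last request happened to hit):

# Cached from the most recent response; no additional API call is made.
remaining, limit = git.rate_limiting
reset_timestamp = git.rate_limiting_resettime  # Unix timestamp of the reset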

@pokey

pokey commented Oct 25, 2019

For people who land here from a search engine: I modified @BBI-YggyKing's function a bit:

import time
from datetime import datetime, timezone

from github.GithubException import RateLimitExceededException

def rate_limited_retry(github):
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = github.get_rate_limit()
                    reset = limits.search.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds()
                    print(f"Rate limit exceeded")
                    print(f"Reset is in {seconds:.3g} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds:.3g} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator

This function can be used as follows:

@rate_limited_retry(github)
def run_query(import_string):
    query_string = f"language:Python \"{import_string}\""
    return list(github.search_code(query_string))

results = run_query(import_string)  # import_string: the import you are searching for

candrikos pushed a commit to candrikos/PyGithub that referenced this issue Sep 25, 2020
@mendhak

mendhak commented Apr 18, 2021

Modified version of pokey's decorator above that takes core/search/graphql into account.
Also added a 30-second delay because GitHub doesn't reset the rate limit exactly at the time it says.

import time
from datetime import datetime, timezone

from github.GithubException import RateLimitExceededException

def rate_limited_retry():
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = gh.get_rate_limit()
                    print(f"Rate limit exceeded")
                    print("Search:", limits.search, "Core:", limits.core, "GraphQl:", limits.graphql)

                    if limits.search.remaining == 0:
                        limited = limits.search
                    elif limits.graphql.remaining == 0:
                        limited = limits.graphql
                    else:
                        limited = limits.core
                    reset = limited.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds() + 30
                    print(f"Reset is in {seconds} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
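It can be used the same way as @pokey's version above, for example (assuming a global Github instance named gh, as in the decorator body):

@rate_limited_retry()
def run_query(query_string):
    return list(gh.search_code(query_string))

results = run_query('language:Python "import requests"')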
