
Support search rate limit #553

Closed
nwalsh1995 opened this issue Apr 9, 2017 · 13 comments

@nwalsh1995

It seems the get_rate_limit function returns what GitHub considers the 'core' rate limit. However, there are separate rate limits for searching code. See here.

Right now there isn't a way to get the search code rate limits as far as I can tell.
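For reference, the raw /rate_limit endpoint already reports the buckets separately. A minimal sketch using only the standard library (unauthenticated here, so the actual limits you see will be the low anonymous ones; the values in the comments are illustrative):

import json
from urllib.request import urlopen

# Note: /rate_limit itself does not count against any rate limit.
data = json.loads(urlopen("https://api.github.com/rate_limit").read().decode())
print(data["resources"]["core"])    # e.g. {'limit': 60, 'remaining': 60, 'reset': ...}
print(data["resources"]["search"])  # e.g. {'limit': 10, 'remaining': 10, 'reset': ...}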

@justfortherec

I see the same issue. Here is a small script that exemplifies the problem.

import os
from datetime import datetime
from github import Github

# Login
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
github = Github(TOKEN)

# Get initial rate limit and reset time
rl1 = github.get_rate_limit().rate
print("RL1 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl1.limit, rl1.remaining, rl1.reset))
# RL1 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# Perform a search
results = github.search_code("Hello World")

# Rate limit of Github instance is unchanged after a search
rl2 = github.get_rate_limit().rate
print("RL2 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl2.limit, rl2.remaining, rl2.reset))
# RL2 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# The PaginatedList instance has a Requester with the same info
# (note: rate_limiting is a (remaining, limit) tuple)
rl3 = results._PaginatedList__requester.rate_limiting
rl3_reset = datetime.utcfromtimestamp(int(
        results._PaginatedList__requester.rate_limiting_resettime))
print("RL3 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl3[1], rl3[0], rl3_reset))
# RL3 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# However, the actual ContentFile results show a different limit
# The Requester of each individual result ...
result = results[0]
rl4 = result._requester.rate_limiting
rl4_reset = datetime.utcfromtimestamp(int(
        result._requester.rate_limiting_resettime))
print("RL4 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl4[1], rl4[0], rl4_reset))
# RL4 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# ... and headers stored in the content file directly show a different rate limit.
rl5_limit = result._headers['x-ratelimit-limit']
rl5_remaining = result._headers['x-ratelimit-remaining']
rl5_reset = datetime.utcfromtimestamp(int(
        result._headers['x-ratelimit-reset']))
print("RL5 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl5_limit, rl5_remaining, rl5_reset))
# RL5 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# In the end, the main Github instance still shows the original full rate limit
rl6 = github.get_rate_limit().rate
print("RL6 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl6.limit, rl6.remaining, rl6.reset))
# RL6 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

@brentshermana

+1 This feature is necessary for an application I'm trying to build

@justfortherec

@brentshermana for your application, consider inspecting the rate limit headers of the last response (see my example above) or polling the /rate_limit endpoint yourself. It contains information about all kinds of rate limits and does not count towards any rate limit.

Eventually, it would be nice if PyGithub would not only parse rate but also resources from what /rate_limit returns. The information is there; it is just not made available to consumers of the library, unfortunately.

Also, the paginated list should return the rate limit for code search if it returns results of such a search, i.e. whatever is stored in _headers['x-ratelimit-*'].

@justfortherec

By the way: I just noticed that the rate field in the JSON returned by /rate_limit is deprecated, and the information in resources is the recommended alternative: https://developer.github.com/v3/rate_limit/#deprecation-notice
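For reference, the response roughly has this shape (example values; the top-level rate block mirrors resources["core"]):

# Illustrative shape of the /rate_limit JSON (example values):
example_response = {
    "rate": {"limit": 5000, "remaining": 4999, "reset": 1506093055},  # deprecated
    "resources": {
        "core":   {"limit": 5000, "remaining": 4999, "reset": 1506093055},
        "search": {"limit": 30, "remaining": 29, "reset": 1506089256},
    },
}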

@brentshermana

brentshermana commented Oct 8, 2017

I'm doing exactly that. If anyone wants to adapt this and try to make a pull request, you have my blessing:

import json
import time
from urllib.request import urlopen

def wait(seconds):
    print("Waiting for {} seconds ...".format(seconds))
    time.sleep(seconds)
    print("Done waiting - resume!")

def api_wait():
    url = 'https://api.github.com/rate_limit'
    response = urlopen(url).read()
    data = json.loads(response.decode())
    if data['resources']['core']['remaining'] <= 10:  # extra margin of safety
        reset_time = data['resources']['core']['reset']
        wait(reset_time - time.time() + 10)
    elif data['resources']['search']['remaining'] <= 2:
        reset_time = data['resources']['search']['reset']
        wait(reset_time - time.time() + 10)
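Call api_wait() before each search or batch of core requests; a trivial usage sketch, reusing the github instance from the example above:

api_wait()
results = github.search_code("Hello World")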

@BBI-YggyKing

BBI-YggyKing commented Jun 20, 2018

I'm experiencing a problem where my iteration over the results from search_issues stops after 1020 results when there should be 1869 results. My script stops at the same point every time. Could this be a rate-limiting issue?

I do not get an error, the results just run out. If I put my query string directly into the GitHub web interface then I see all 1869 results, as expected. 1020 is a multiple of 30, which makes me wonder if it's a pagination problem?

Code is as follows:

querystring = "type:pr is:closed repo:xxxx closed:2017-07-01..2018-06-30"
issues = git.search_issues(query=querystring, sort="updated", order="asc")
for issue in issues:
    pull = issue.as_pull_request()
    print "%s: %s" % (pull.number, pull.title)

Many thanks for any tips you can share as to what might be going wrong here.

@BBI-YggyKing

I also tried iterating through issues.reversed to see if it would start at the end of my expected 1869 results. However, in this case I only get 30 issues, from the first page of results.

@BBI-YggyKing

On further investigation, it appears that I'm running into the Search API's limit of 1000 results per query.
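For anyone else who hits this: one workaround is to split the query into smaller date windows so that each window stays under 1000 results. A rough sketch based on my query above (the windows are just an example):

# Split the closed-date range so each search stays under the
# Search API's 1000-result cap (repo name is a placeholder).
windows = ["2017-07-01..2017-12-31", "2018-01-01..2018-06-30"]
issues = []
for window in windows:
    q = "type:pr is:closed repo:xxxx closed:%s" % window
    issues.extend(git.search_issues(query=q, sort="updated", order="asc"))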

@sfdye

sfdye commented Jun 22, 2018

How about we provide one more method, get_search_rate_limit(), for the search rate limit, while the existing get_rate_limit() parses the latest "core" rate limit as suggested by GitHub: https://developer.github.com/v3/rate_limit/

@sfdye changed the title from "Search ratelimiting" to "Support search rate limit" on Jun 22, 2018
@sfdye closed this as completed in fd8a036 on Sep 5, 2018
@sfdye

sfdye commented Sep 5, 2018

The Search API rate limit and the GraphQL rate limit are available now. One method for all.

By default it will show you the "core" rate limit. You can also get the search/graphql rate limits by accessing the respective attributes.

>>> r = g.get_rate_limit()
>>> r
RateLimit(core=Rate(remaining=4923, limit=5000))
>>> r.search
Rate(remaining=30, limit=30)
>>> r.graphql
Rate(remaining=5000, limit=5000)

@BBI-YggyKing

Looks great, thanks @sfdye!

To emulate @brentshermana's waiting function to avoid problems with search rate limiting, you can now do something like this:

import time
from datetime import datetime

def api_wait_search(git):
    limits = git.get_rate_limit()
    if limits.search.remaining <= 2:
        # limits.search.reset is a naive UTC datetime, so compare against utcnow()
        seconds = (limits.search.reset - datetime.utcnow()).total_seconds()
        print("Waiting for %d seconds ..." % seconds)
        time.sleep(seconds)
        print("Done waiting - resume!")

Note that calling get_rate_limit() will introduce a small delay, so you may want to minimize how often you call this.
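One way to avoid the extra request is PyGithub's cached rate_limiting attribute, which is populated from the headers of the most recent response (caveat: it reflects whichever bucket, core or search, the last request happened to hit):

# Cached from the most recent response; no additional API call is made.
remaining, limit = git.rate_limiting
reset_timestamp = git.rate_limiting_resettime  # Unix timestamp of the reset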

@pokey

pokey commented Oct 25, 2019

For people who land here from a search engine: I modified @BBI-YggyKing's function a bit:

import time
from datetime import datetime, timezone

from github.GithubException import RateLimitExceededException

def rate_limited_retry(github):
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = github.get_rate_limit()
                    reset = limits.search.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds()
                    print(f"Rate limit exceeded")
                    print(f"Reset is in {seconds:.3g} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds:.3g} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator

This function can be used as follows:

@rate_limited_retry(github)
def run_query(import_string):
    query_string = f"language:Python \"{import_string}\""
    return list(github.search_code(query_string))

results = run_query(import_string)  # import_string: the import you are searching for

candrikos pushed a commit to candrikos/PyGithub that referenced this issue Sep 25, 2020
@mendhak

mendhak commented Apr 18, 2021

Modified version of pokey's decorator above that takes core/search/graphql into account.
Also added a 30-second delay because GitHub doesn't reset the rate limit exactly at the time it says.

import time
from datetime import datetime, timezone

from github.GithubException import RateLimitExceededException

def rate_limited_retry():
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = gh.get_rate_limit()
                    print(f"Rate limit exceeded")
                    print("Search:", limits.search, "Core:", limits.core, "GraphQl:", limits.graphql)

                    if limits.search.remaining == 0:
                        limited = limits.search
                    elif limits.graphql.remaining == 0:
                        limited = limits.graphql
                    else:
                        limited = limits.core
                    reset = limited.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds() + 30
                    print(f"Reset is in {seconds} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
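It can be used the same way as @pokey's version above, for example (assuming a global Github instance named gh, as in the decorator body):

@rate_limited_retry()
def run_query(query_string):
    return list(gh.search_code(query_string))

results = run_query('language:Python "import requests"')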
