Support search rate limit #553
I see the same issue. Here is a small script that exemplifies the problem.

```python
import os
from datetime import datetime

from github import Github

# Login
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
github = Github(TOKEN)

# Get initial rate limit and reset time
rl1 = github.get_rate_limit().rate
print("RL1 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl1.limit, rl1.remaining, rl1.reset))
# RL1 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# Perform a search
results = github.search_code("Hello World")

# Rate limit of the Github instance is unchanged after a search
rl2 = github.get_rate_limit().rate
print("RL2 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl2.limit, rl2.remaining, rl2.reset))
# RL2 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# The PaginatedList instance has a Requester with the same info.
# Note: rate_limiting is a (remaining, limit) tuple.
rl3 = results._PaginatedList__requester.rate_limiting
rl3_reset = datetime.utcfromtimestamp(int(
    results._PaginatedList__requester.rate_limiting_resettime))
print("RL3 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl3[1], rl3[0], rl3_reset))
# RL3 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# However, the actual ContentFile results show a different limit.
# The Requester of each individual result ...
result = results[0]
rl4 = result._requester.rate_limiting
rl4_reset = datetime.utcfromtimestamp(int(
    result._requester.rate_limiting_resettime))
print("RL4 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl4[1], rl4[0], rl4_reset))
# RL4 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# ... and the headers stored in the content file directly show a
# different rate limit as well.
rl5_limit = result._headers['x-ratelimit-limit']
rl5_remaining = result._headers['x-ratelimit-remaining']
rl5_reset = datetime.utcfromtimestamp(int(
    result._headers['x-ratelimit-reset']))
print("RL5 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl5_limit, rl5_remaining, rl5_reset))
# RL5 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# In the end, the main Github instance still shows the original full rate limit
rl6 = github.get_rate_limit().rate
print("RL6 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl6.limit, rl6.remaining, rl6.reset))
# RL6 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.
```
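As the RL4/RL5 output shows, the search bucket is only visible in the raw response headers. For reference, here is a small helper that pulls the three `x-ratelimit-*` headers out of any response's header dict (the helper name is my own, not part of PyGithub):

```python
from datetime import datetime, timezone

def parse_rate_limit_headers(headers):
    """Return (limit, remaining, reset) from GitHub's x-ratelimit-* headers.

    The raw reset header is a Unix timestamp in seconds; it is returned
    here as an aware UTC datetime.
    """
    return (
        int(headers["x-ratelimit-limit"]),
        int(headers["x-ratelimit-remaining"]),
        datetime.fromtimestamp(int(headers["x-ratelimit-reset"]), tz=timezone.utc),
    )

# Example with the header values from the RL5 output above:
limit, remaining, reset = parse_rate_limit_headers({
    "x-ratelimit-limit": "30",
    "x-ratelimit-remaining": "29",
    "x-ratelimit-reset": "1506097656",
})
```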
+1. This feature is necessary for an application I'm trying to build.
@brentshermana for your application, consider inspecting the rate limit headers of the last response (see my example above) or polling the rate-limit API. Eventually, it would be nice if PyGithub would not only parse the core rate limit but also the search rate limit. Also, the paginated list should return the rate limit for code search if it returns results of such a search, i.e. whatever is stored in the response headers.
btw: I just noticed, the field …
I'm doing exactly that. If anyone wants to adapt this and try and make a pull request, you have my blessing:
I'm experiencing a problem where my iteration over the results from search_issues stops after 1020 results when there should be 1869. My script stops at the same point every time. Could this be a rate-limiting issue? I don't get an error; the results just run out. If I put my query string directly into the GitHub web interface, I see all 1869 results, as expected. 1020 is a multiple of 30, which makes me wonder if it's a pagination problem? Code is as follows:

Many thanks for any tips you can share as to what might be going wrong here.
I also tried iterating through …
On further investigation, it appears that I'm running into the 1,000-results-per-search limit.
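A common workaround for the 1,000-result cap is to partition the query into smaller windows, for example with the `created:` qualifier, and run one search per window. A sketch of that idea (the function name, the 30-day default, and the `repo:foo/bar` query are my own illustrative choices):

```python
from datetime import date, timedelta

def split_by_created(query, start, end, step_days=30):
    """Yield copies of `query` restricted to consecutive created-date
    windows, so each sub-search can stay under GitHub's 1,000-result cap."""
    cur = start
    while cur <= end:
        window_end = min(cur + timedelta(days=step_days - 1), end)
        yield f"{query} created:{cur.isoformat()}..{window_end.isoformat()}"
        cur = window_end + timedelta(days=1)

queries = list(split_by_created("repo:foo/bar is:issue",
                                date(2017, 1, 1), date(2017, 3, 1)))
```

Each yielded string can then be passed to `search_issues` in turn, accumulating the results.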
What if we provided one more method for the search rate limit?
The Search API rate limit and the GraphQL rate limit are available now. One method for all. By default it will show you the "core" rate limit; you can also get the search/GraphQL rate limit by accessing the respective attributes.

```python
>>> r = g.get_rate_limit()
>>> r
RateLimit(core=Rate(remaining=4923, limit=5000))
>>> r.search
Rate(remaining=30, limit=30)
>>> r.graphql
Rate(remaining=5000, limit=5000)
```
Looks great, thanks @sfdye! To emulate @brentshermana's waiting function to avoid problems with search rate limiting, you can now do something like this:

Note that calling …
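The snippet this comment refers to was lost in extraction; below is a minimal sketch of the kind of waiting helper described, under the assumption that the reset value is a naive UTC datetime (as PyGithub's `Rate.reset` was at the time). The function name and the safety margin are my own:

```python
import time
from datetime import datetime, timezone

def wait_for_reset(reset_utc, margin=1.0):
    """Sleep until `reset_utc` (a naive UTC datetime) plus a safety margin.

    Returns the number of seconds actually waited (0.0 if the reset
    time has already passed).
    """
    reset = reset_utc.replace(tzinfo=timezone.utc)
    seconds = (reset - datetime.now(timezone.utc)).total_seconds() + margin
    if seconds > 0:
        time.sleep(seconds)
        return seconds
    return 0.0

# e.g. wait_for_reset(github.get_rate_limit().search.reset)
```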
For people who land here from a search engine, I modified @bbi-yggy's function a bit:

```python
import time
from datetime import datetime, timezone

from github import RateLimitExceededException

def rate_limited_retry(github):
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = github.get_rate_limit()
                    reset = limits.search.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds()
                    print("Rate limit exceeded")
                    print(f"Reset is in {seconds:.3g} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds:.3g} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
```

This function can be used as follows:

```python
@rate_limited_retry(github)
def run_query(import_string):
    query_string = f"language:Python \"{import_string}\""
    return list(github.search_code(query_string))

results = run_query(import_string)
```
Modified version of pokey's decorator above to take the core/search/GraphQL limits into account:

```python
import time
from datetime import datetime, timezone

from github import RateLimitExceededException

def rate_limited_retry():
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = gh.get_rate_limit()
                    print("Rate limit exceeded")
                    print("Search:", limits.search, "Core:", limits.core,
                          "GraphQL:", limits.graphql)
                    if limits.search.remaining == 0:
                        limited = limits.search
                    elif limits.graphql.remaining == 0:
                        limited = limits.graphql
                    else:
                        limited = limits.core
                    reset = limited.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds() + 30
                    print(f"Reset is in {seconds} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
```

(`gh` is assumed to be a module-level `Github` instance.)
It seems the get_rate_limit function returns what GitHub considers the "core" rate limit. However, there is a separate, much lower rate limit for searching code; see GitHub's REST API documentation on rate limits.
Right now there isn't a way to get the code search rate limits, as far as I can tell.
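For context, the REST endpoint `GET /rate_limit` does report the search bucket separately under `resources.search`; PyGithub just wasn't exposing it at the time. The numeric values below are illustrative only:

```python
# Illustrative shape of a GET /rate_limit response body:
payload = {
    "resources": {
        "core":   {"limit": 5000, "remaining": 4999, "reset": 1506100000},
        "search": {"limit": 30,   "remaining": 18,   "reset": 1506096456},
    },
    # The top-level "rate" object mirrors resources.core and is deprecated.
    "rate": {"limit": 5000, "remaining": 4999, "reset": 1506100000},
}

search = payload["resources"]["search"]
```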