
feat: allow setting retry delay #1167

Open
Guust-Franssens opened this issue Feb 20, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Guust-Franssens

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

Currently, the retry delay used by the _base client is not easily exposed. End users have straightforward control over the maximum number of retries, but not over the maximum retry delay or the initial retry delay:

    MAX_RETRY_DELAY,
    DEFAULT_MAX_RETRIES,
    INITIAL_RETRY_DELAY,

Preferably, these would be exposed and easily settable, similar to max_retries in the OpenAI class:

    class OpenAI(SyncAPIClient):
        completions: resources.Completions
        chat: resources.Chat
        embeddings: resources.Embeddings
        files: resources.Files
        images: resources.Images
        audio: resources.Audio
        moderations: resources.Moderations
        models: resources.Models
        fine_tuning: resources.FineTuning
        beta: resources.Beta
        with_raw_response: OpenAIWithRawResponse
        with_streaming_response: OpenAIWithStreamedResponse

        # client options
        api_key: str
        organization: str | None

        def __init__(
            self,
            *,
            api_key: str | None = None,
            organization: str | None = None,
            base_url: str | httpx.URL | None = None,
            timeout: Union[float, Timeout, None, NotGiven] = NOT_GIVEN,
            max_retries: int = DEFAULT_MAX_RETRIES,
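
In the meantime, one possible workaround (an untested sketch; it patches private internals that may change between versions) is to rebind the constants in openai._base_client, which the retry logic reads at call time:

    import openai
    import openai._base_client as base_client  # private module, not a stable API

    # Hypothetical workaround: rebind the retry constants in the module that
    # actually reads them. Patching openai._constants instead would not take
    # effect, since _base_client imports the names into its own namespace.
    base_client.INITIAL_RETRY_DELAY = 2.0  # seconds before the first retry
    base_client.MAX_RETRY_DELAY = 30.0     # cap on the exponential backoff

    client = openai.OpenAI(max_retries=5)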

Additional context

No response

@dackerman

Thanks for the report! Out of curiosity, what's your use case for wanting to adjust those values?

@rattrayalex rattrayalex added the enhancement New feature or request label Feb 20, 2024
@charlyjazz-sprockets

I also need this. I'm currently implementing it with Celery scheduling, since the openai library doesn't support it. My use case: when the OpenAI API starts failing, retries tend to fail over and over, so we don't want to keep hitting the API while it's in that state. At a minimum, we'd like an exponential (or Fibonacci) retry schedule so we back off when the OpenAI APIs are slow or buggy, which happens quite often.
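
For reference, a hand-rolled exponential backoff with jitter around a client call might look like the following (a sketch only; call_with_backoff and the delay values are illustrative, not part of the library):

    import random
    import time

    import openai

    client = openai.OpenAI(max_retries=0)  # let the wrapper own the retry policy

    def call_with_backoff(fn, *, max_attempts=5, base_delay=1.0, max_delay=60.0):
        # Retry only on rate-limit errors, doubling the delay each attempt.
        for attempt in range(max_attempts):
            try:
                return fn()
            except openai.RateLimitError:
                if attempt == max_attempts - 1:
                    raise
                delay = min(base_delay * 2 ** attempt, max_delay)
                time.sleep(delay + random.uniform(0, delay / 2))  # add jitter

    response = call_with_backoff(
        lambda: client.chat.completions.create(
            model="gpt-4", messages=[{"role": "user", "content": "ping"}]
        )
    )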

@rattrayalex
Collaborator

rattrayalex commented Mar 6, 2024

Does it fail with 429 repeatedly? Or another error message? What kind of rate limit are you hitting?

(I ask because the client should be waiting long enough that you wouldn't hit a second 429, so if that's not happening, we need to adjust something).

@jflam

jflam commented Apr 28, 2024

There are rate limits on Azure OpenAI that are based on the service tier that you are using. My specific scenario is using AOAI APIs in offline evaluation.

Here's the error:

    openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-02-15-preview have exceeded call rate limit of your current AIServices S0 pricing tier. Please retry after 1 second. Please contact Azure support service if you would like to further increase the default rate limit.

This is where those constants are used:

_base_client.py:672:

        # Apply exponential backoff, but not more than the max.
        sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
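
For illustration, assuming defaults of INITIAL_RETRY_DELAY = 0.5 and MAX_RETRY_DELAY = 8.0 (values worth verifying against your installed version), the schedule that formula produces works out to:

    INITIAL_RETRY_DELAY = 0.5  # assumed default; check openai._constants in your version
    MAX_RETRY_DELAY = 8.0      # assumed default

    for nb_retries in range(5):
        sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
        print(f"retry {nb_retries + 1}: ~{sleep_seconds}s")
    # retry 1: ~0.5s, retry 2: ~1.0s, retry 3: ~2.0s, retry 4: ~4.0s, retry 5: ~8.0s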

@rattrayalex
Collaborator

@kristapratico in this scenario, can you confirm whether AOAI will respond with retry-after and/or retry-after-ms headers?

@kristapratico
Contributor

@jflam is this using a PTU deployment? Upon 429, I would expect AOAI to return retry-after (and maybe retry-after-ms if this is PTU) which this library will honor. Are you not seeing those headers in the response?
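
For anyone trying to verify this, one way to inspect the 429 response headers directly (a sketch; RateLimitError.response is the underlying httpx.Response):

    import openai

    client = openai.OpenAI(max_retries=0)  # disable built-in retries to surface the raw 429

    try:
        client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "ping"}],
        )
    except openai.RateLimitError as e:
        # Check which retry hints the service actually returned.
        print(e.response.headers.get("retry-after"))
        print(e.response.headers.get("retry-after-ms"))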

@jflam

jflam commented May 13, 2024

No, it isn't. I eventually figured out that I needed to use the with_retry() method on the LLM to configure this behavior correctly. Since doing that, I don't think this is a problem anymore. I blame the docs for not making this clear. :)
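
For context, with_retry() here appears to refer to LangChain's Runnable helper rather than anything in openai-python; assuming that, usage looks roughly like:

    from langchain_openai import ChatOpenAI

    # Assumed interpretation: LangChain's Runnable.with_retry helper.
    llm = ChatOpenAI(model="gpt-4", max_retries=0)
    retrying_llm = llm.with_retry(
        wait_exponential_jitter=True,  # exponential backoff with jitter
        stop_after_attempt=4,
    )
    retrying_llm.invoke("ping")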
