How do you calculate your retry budget? #946

Open
rafatbiin opened this issue Mar 27, 2023 · 1 comment

Comments

@rafatbiin

I was reading https://finagle.github.io/blog/2016/02/08/retry-budgets/ and came across the default values of minRetriesPerSec and percentCanRetry. As I understand it, these numbers can vary from service to service. How do you calculate them with the following objectives in mind?

  1. Your retry budget should be relaxed enough that it shouldn't block retries in a normal scenario.
  2. Your retry budget should be strict enough that it will safeguard against a retry storm.
@csaltos
Contributor

csaltos commented Mar 19, 2024

The values depend on your case: the size of your servers, the number of connections, and many other factors. Normally you start with conservative numbers, then test the performance of your system and tune accordingly.
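
One way to pick a conservative starting point is a back-of-the-envelope estimate before load testing. The sketch below is not part of Finagle: `BudgetEstimate` and its methods are hypothetical helpers, and the model (allowed retries per second is roughly minRetriesPerSec plus percentCanRetry times the request rate) is a simplification of the behaviour described in the linked blog post. The values 10 and 0.2 are the defaults as that post describes them.

```scala
object RetryBudgetSketch {

  // Hypothetical helper: a rough steady-state model of a retry budget.
  final case class BudgetEstimate(minRetriesPerSec: Int, percentCanRetry: Double) {

    // Approximate number of retries per second the budget permits
    // at a given request rate.
    def allowedRetriesPerSec(requestsPerSec: Double): Double =
      minRetriesPerSec + percentCanRetry * requestsPerSec

    // Objective 1: retries caused by the normal failure rate
    // should fit inside the budget.
    def coversNormalTraffic(requestsPerSec: Double, normalFailureRate: Double): Boolean =
      normalFailureRate * requestsPerSec <= allowedRetriesPerSec(requestsPerSec)

    // Objective 2: worst-case load multiplier on the backend
    // if every permitted retry is actually sent.
    def maxAmplification(requestsPerSec: Double): Double =
      (requestsPerSec + allowedRetriesPerSec(requestsPerSec)) / requestsPerSec
  }

  // Defaults as described in the blog post (assumed here, check your Finagle version).
  val defaults: BudgetEstimate = BudgetEstimate(minRetriesPerSec = 10, percentCanRetry = 0.2)
}
```

With these helpers you can check that `coversNormalTraffic` holds for the failure rate you see on a healthy day, and that `maxAmplification` stays below what your backend can absorb during an outage. Measured numbers from a load test should then replace the estimate.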

As a reference, this is the configuration we are using at my company:

import com.twitter.conversions.DurationOps._
import com.twitter.finagle.Backoff
import com.twitter.finagle.Http
import com.twitter.finagle.ServiceFactory
import com.twitter.finagle.http
import com.twitter.finagle.service.ReqRep
import com.twitter.finagle.service.ResponseClass
import com.twitter.finagle.service.ResponseClassifier
import com.twitter.finagle.service.RetryBudget
import com.twitter.util.Duration
import com.twitter.util.Future
import com.twitter.util.Return
import com.twitter.util.StorageUnit
import com.twitter.util.Timer

val host = "test.com"
val url = "https://test.com/test1"
val totalRequestTimeout = 5.seconds
val referenceTimeout =
    Duration.fromMilliseconds(
      Math.max(1L, totalRequestTimeout.inMillis / 5L - 100L)
    )
val initialRequestTimeout =
    Duration.fromMilliseconds(referenceTimeout.inMillis * 2L)
val retryRequestTimeout =
    Duration.fromMilliseconds(referenceTimeout.inMillis * 3L)
val maxResponseSizeInBytes = 10000000
val clientFactory = Http.client
      .withRequestTimeout(initialRequestTimeout)
      .withRetryBudget(RetryBudget())
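      // `backoff` (the maximum backoff duration) is not defined in this snippet.
      .withRetryBackoff(Backoff.exponentialJittered(1.second, backoff))
      // `customResponseClassifierOnErrors` is a ResponseClassifier not defined in this snippet.
      .withResponseClassifier(
        customResponseClassifierOnErrors orElse http.service.HttpResponseClassifier.ServerErrorsAsFailures
      )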
      .withMaxResponseSize(
          StorageUnit.fromBytes(maxResponseSizeInBytes)
        )
      .withTls(host)
val client = clientFactory.newClient("test.com:443")
val requestBuilder = http
      .RequestBuilder()
      .url(url)
      .addHeader(
        http.Fields.UserAgent,
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0"
      ).addHeader(http.Fields.Host, host)

val request = requestBuilder.buildGet()
// `httpClient` is not defined in this snippet; presumably it is a wrapper around `client`.
val response = httpClient.toResponse(request)
response
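
The configuration above passes `RetryBudget()`, which uses the library defaults. If you want to tune the two numbers from the question, a minimal sketch of an explicit budget could look like the following. The values are illustrative only and are not a recommendation, and `tunedClient` is a name introduced here for the example.

```scala
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.Http
import com.twitter.finagle.service.RetryBudget

// Illustrative values only; tune them against your own traffic.
val budget = RetryBudget(
  ttl = 10.seconds,       // how long a request counts towards the budget
  minRetriesPerSec = 5,   // floor that keeps retries possible at low traffic
  percentCanRetry = 0.1   // extra retries allowed as a fraction of requests
)

val tunedClient = Http.client.withRetryBudget(budget)
```

A lower percentCanRetry bounds the extra load during an outage more tightly, while minRetriesPerSec mostly matters for low-traffic clients where a percentage alone would allow almost no retries.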
