
Should global timeout override timeout in provided context? #1144

Closed
devchas opened this issue Jul 6, 2020 · 10 comments


devchas commented Jul 6, 2020

When calling ServerStreamingCallable.call(request, context) with a context set like so

ApiCallContext context = GrpcCallContext.createDefault();
context = context.withTimeout(Duration.of(10, ChronoUnit.HOURS));

the global timeout settings override the provided context. As a result, I cannot override global settings. Is this intended behavior, or is this a bug? If intended behavior, how can I override the global settings?
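
For readers less familiar with gax, the global timeout referred to here is typically the totalTimeout configured on the callable's RetrySettings in the client settings; a rough illustration (the one-hour value is only an example, and Duration is org.threeten.bp.Duration):

// Illustrative only: a totalTimeout like this, baked into the callable's settings,
// is the "global" value that ends up winning over the per-call context.
RetrySettings retrySettings =
    RetrySettings.newBuilder()
        .setTotalTimeout(Duration.ofHours(1))
        .build();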


devchas commented Jul 6, 2020

Some more context after digging into this further. I've also tried altering the deadline/timeout by directly updating the call options with

ApiCallContext context = GrpcCallContext.createDefault().withCallOptions(CallOptions.DEFAULT.withDeadline(
    Deadline.after(10, TimeUnit.HOURS)));

It looks like, after overriding the timeout as described in the previous comment, the deadline in the call options is replaced with one derived from the global timeout value whenever the deadline calculated from that global timeout comes before the explicitly provided deadline.

It seems like I could decrease the deadline using withCallOptions. I am wondering, is there any way to increase the deadline?
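
For anyone following along, a small sketch of the "earlier deadline wins" effect being described, in terms of plain gRPC deadlines (illustrative only, not the actual gax code; Deadline and TimeUnit are io.grpc.Deadline and java.util.concurrent.TimeUnit):

// Deadline derived from the global totalTimeout vs. the explicitly provided one.
Deadline fromGlobalTimeout = Deadline.after(1, TimeUnit.HOURS);
Deadline fromCallOptions = Deadline.after(10, TimeUnit.HOURS);

// minimum() returns whichever deadline expires first, so the one-hour deadline
// wins and the explicit ten-hour deadline is effectively ignored.
Deadline effective = fromGlobalTimeout.minimum(fromCallOptions);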

@miraleung miraleung self-assigned this Jul 6, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jul 7, 2020

devchas commented Jul 7, 2020

I shared some additional context offline, but for the benefit of anyone following this thread, this problem only seems to occur for a UnaryCallable of type RetryingCallable. When the UnaryCallable is of type ServerStreamingCallable, the timeout settings are not overridden by the global settings.

@miraleung miraleung added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels Jul 7, 2020
@noahdietz

@igorbernstein2, per our discussion, if you could weigh in on the code in question, we'd appreciate it.

@noahdietz

@nwbirnie so we can continue the discussion about this behavior as we debug the code.

@noahdietz noahdietz added the status: investigating The issue is under investigation, which is determined to be non-trivial. label Jul 15, 2020
@noahdietz

Ok so there are two parts to this issue, if I'm understanding things:

  1. A timeout set in an ApiCallContext provided by a user to the call(request, context) API is overwritten when a ServerStreamingAttemptCallable is in use (which is the case when a server streaming RPC has retries enabled). The code in question for that issue is in start() here.

  2. When using CallOptions to supply a deadline (presumably with a context that will be supplied to the ServerStreamingAttemptCallable.call(request, context) API), the timeout on the ApiCallContext (which I'm guessing from (1) has already been overwritten to the totalTimeout value in the RetrySettings) is compared to the deadline provided via the CallOptions, and the shorter of the two wins (in the specific use case with this API, the totalTimeout was 1H and the CallOptions contained a timeout of 10H). The code in question is here (and also in GrpcCallContext.withTimeout).

(1) can be resolved by changing the condition to respect whatever timeout is set on the context:

if (totalTimeout != null && context != null && context.getTimeout() == null) {
    context = context.withTimeout(totalTimeout);
}

However, allowing this then raises the question of (2). We'd need to add additional logic to check whether the provided context timeout would expire sooner or later than the totalTimeout from the RetrySettings. Or maybe we wouldn't need to.

(2) is essentially "should we allow users to increase a timeout via the ApiCallContext"

I think it is reasonable to take a more restrictive approach from the ApiCallContext perspective, given that there is retry/timeout machinery that mutates the context higher in the stack (as is evident from (1)). Also, the ApiCallContext is (I believe) representative of a single RPC. In the context of a non-retryable RPC, only one RPC is made, so respecting configuration in a given ApiCallContext makes sense. In the context of a retryable RPC, though, multiple RPCs will be made, which are controlled by machinery around the ApiCallContext, and respecting user-provided values could complicate things. However, given that there is currently no per-logical-call API for customizing CallSettings on any method in a Java GAPIC (server streaming or otherwise), do we want to add more flexibility to ApiCallContext?
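
If that more restrictive route were taken, the additional check might look roughly like this (purely illustrative, reusing the names from the snippet above):

// Illustrative only: apply the settings' totalTimeout unless the caller asked for
// something shorter, i.e. callers may shorten the timeout but not lengthen it.
if (totalTimeout != null && context != null
    && (context.getTimeout() == null || context.getTimeout().compareTo(totalTimeout) > 0)) {
    context = context.withTimeout(totalTimeout);
}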

Context for discussion: we are designing the addition of overloads to set per-logical call CallSettings at the GAPIC level.

@igorbernstein2 can you weigh in on the questions above?

@devchas @nwbirnie please correct me if my summary was off.

@igorbernstein2

This is a bit messy and I apologize for my contributions to the messiness.

So if I understand correctly there are a few timeouts at play here:

  • CallOptions.withDeadline
  • GrpcApiCallContext.withDeadline
  • GrpcApiCallContext.withStreamWaitTimeout
  • RetrySettings.getTotalTimeout
  • RetrySettings.get*RetryTimeout

and they overlap and play different roles depending on the context.

As we discussed, I think this should be cleaned up so that the user only controls a single overall timeout and the RPC deadlines are derived from it. I also think users should be able to override the default deadline per call.

So I agree with solution 1, where the user's override is respected. Though for the second problem, I think totalTimeout should be adjusted (including expanded) to match the user's wishes.

Ignoring backwards compatibility, I think the correct state would be:

  • ApiCallContext should add a new method #withTimeout() which should just proxy to CallOptions#withDeadline.
  • {Unary,ServerStreaming}CallableSettings should define a deadline configuration knob that will be stamped onto the ApiCallContext via withTimeout if the user hasn't provided their own value.
  • RetrySettings should only define if retries are enabled and what the exponential delay should be
  • RetryAlgorithm should be modified to use the incoming ApiCallContext/CallOptions deadline to know when it should stop scheduling retry attempts.
  • RetrySettings should drop all other fields
  • Add streamWaitTimeout to ServerStreamingCallableSettings

With backwards compatibility I think the transition can be:

  1. Add ApiCallContext#withOverallTimeout which proxies to CallOptions#withDeadline
  2. Use ApiCallContext#overallTimeout/CallOptions#deadline to override RetrySettings#totalTimeout in RetryAlgorithm
  3. Introduce *CallableSettings#timeout and have both RetrySettings and *CallableSettings write to that field. Deprecate RetrySettings#totalTimeout
  4. Make RetrySettings#rpcTimeouts no-ops
  5. Add ServerStreamingCallableSettings#streamWaitTimeout
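
A rough sketch of what step 1 could look like on GrpcCallContext (hypothetical; withOverallTimeout does not exist today and the exact shape is up for discussion; Deadline and TimeUnit are the io.grpc and java.util.concurrent types):

// Hypothetical: map an overall timeout straight onto the underlying gRPC deadline.
public GrpcCallContext withOverallTimeout(org.threeten.bp.Duration timeout) {
    Deadline deadline = Deadline.after(timeout.toMillis(), TimeUnit.MILLISECONDS);
    return withCallOptions(getCallOptions().withDeadline(deadline));
}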


noahdietz commented Jul 20, 2020

Ok, so exposing a new per-call Settings API is in the near-to-medium-term future. The ideal end states you've described here are valuable. There is generator work to do and a design to write up for this.

Immediately, can I make the change for (1) - respecting a user-provided timeout on an ApiCallContext in ServerStreamingAttemptCallable (and potentially elsewhere if discovered)? I'm trying to differentiate between what might be non-ideal behavior now that we can fix, and what we want to do to simplify/align things overall. If that is not possible, or not advisable, please let me know.

@noahdietz noahdietz added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. status: investigating The issue is under investigation, which is determined to be non-trivial. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. status: investigating The issue is under investigation, which is determined to be non-trivial. labels Jul 24, 2020
@noahdietz

Ok, so #1155 fixed (1) - an ApiCallContext with a caller-provided timeout (i.e. context.withTimeout()) given to a ServerStreamingAttemptCallable (the concrete class used when calling a Server Streaming RPC that is configured to be retryable) will be retained, rather than overwritten.

I discussed with @igorbernstein2 offline if the same change should be made to AttemptCallable (used for retryable Unary RPCs). As mentioned in #1155 (comment), we are punting on that for now because the context timeout is handled slightly differently in that case. We want to address it with other work we are doing to simplify timeouts in gax-java, as well as the future plan of exposing GAPIC-level CallSettings method overloads.

As for (2) - with ServerStreamingAttemptCallable, a caller-provided context should be able to lengthen the timeout if provided via context.withTimeout, as the caller-provided timeout is respected without comparison to the settings.totalTimeout. I did not address gRPC CallOption resolution with the aforementioned PR. That is a more general change we will need to make separately if agreed upon. But the specific use case here is addressed by fixing (1).
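
Concretely, after #1155 something like the following should keep the longer, caller-provided timeout on a retryable server streaming RPC (the callable, request, and response types below are placeholders):

// Placeholders for any retryable server streaming RPC and its request/response types.
ApiCallContext context =
    GrpcCallContext.createDefault().withTimeout(Duration.of(10, ChronoUnit.HOURS));
ServerStream<SomeResponse> stream = someServerStreamingCallable.call(someRequest, context);
// The 10 hour timeout is retained instead of being overwritten by totalTimeout.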

I downgraded the ticket because we were able to work around it for the specific use case; I probably should've done that before.

I kept this open with status: investigating because I think there are some questions that came out of investigating the specific issue, but I'm not opposed to closing this in favor of a new issue that pertains to exactly what we want to address going forward. Thoughts?

@miraleung

Moving this to P3 until ndietz@ is back in Q1.

@miraleung miraleung added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Nov 3, 2020
@noahdietz noahdietz removed the status: investigating The issue is under investigation, which is determined to be non-trivial. label Mar 9, 2021
@noahdietz

#1324 will fix this specific issue by respecting a timeout set via withTimeout. I am still going to be looking at how we can improve the configurability of timeouts at the method level (i.e. logical timeout). Also, #1238 will improve the accessibility of the RetrySettings by making them configurable from the ApiCallContext. That said, the timeouts in RetrySettings are not trivial to use, especially when compared to withTimeout.
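
For a sense of what that could enable, the anticipated usage is roughly along these lines (hypothetical as of this comment; the exact method names depend on how #1238 lands):

// Hypothetical per-call override of retry/timeout behavior via the context.
RetrySettings perCallRetrySettings =
    RetrySettings.newBuilder()
        .setTotalTimeout(Duration.ofHours(10))
        .build();
ApiCallContext context = GrpcCallContext.createDefault().withRetrySettings(perCallRetrySettings);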
