This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

feat: make stream wait timeout a first class citizen [WIP] #1409

Closed
wants to merge 4 commits into from


igorbernstein2
Contributor

For a server-streaming API, there are 4 conceptual timeouts:

  1. overall operation timeout - the maximum amount of time that may pass from the user invoking a method until that method exits
  2. attempt/rpc timeout - if retries are enabled, the maximum amount of time that may pass for each attempt in an operation
  3. message wait timeout - the maximum amount of time to wait for the next message from the server
  4. idle timeout - how long to wait before considering the stream orphaned by the user and closing it
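
One way to picture the four timeouts above is as four independent limits, each measured from a different starting point. The following is a minimal sketch in plain Java, not gax code: the class `StreamTimeouts`, its fields, and the method `firstExpired` are hypothetical names chosen for illustration, and the order in which the limits are checked is an assumption.

```java
import java.time.Duration;

/** Hypothetical model of the four conceptual timeouts; not part of gax. */
final class StreamTimeouts {
  final Duration operation; // measured from the user's method invocation
  final Duration attempt;   // measured from the start of the current attempt
  final Duration wait;      // measured from the last message received from the server
  final Duration idle;      // measured from the last time the user consumed the stream

  StreamTimeouts(Duration operation, Duration attempt, Duration wait, Duration idle) {
    this.operation = operation;
    this.attempt = attempt;
    this.wait = wait;
    this.idle = idle;
  }

  /** A limit counts as expired once the elapsed time reaches it. */
  static boolean expired(Duration elapsed, Duration limit) {
    return elapsed.compareTo(limit) >= 0;
  }

  /**
   * Returns the name of the first expired timeout, checked from the widest
   * scope to the narrowest (an assumed order), or "none" if all are within limits.
   */
  String firstExpired(
      Duration sinceOperationStart,
      Duration sinceAttemptStart,
      Duration sinceLastMessage,
      Duration sinceLastUserActivity) {
    if (expired(sinceOperationStart, operation)) return "operation";
    if (expired(sinceAttemptStart, attempt)) return "attempt";
    if (expired(sinceLastMessage, wait)) return "wait";
    if (expired(sinceLastUserActivity, idle)) return "idle";
    return "none";
  }
}
```

The point of the sketch is only that each timeout has its own clock: a stream can be well within its operation and attempt limits and still trip the wait limit because the server went quiet.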

Each has a use case:

  1. operation timeouts are useful for users to fulfill their own SLO guarantees
  2. attempt timeouts are useful when a client developer knows the absolute limit of an RPC but that limit happens to be shorter than the required SLO for a customer. For example, a point read of a Bigtable key shouldn't ever take more than 100ms. If it does, then we can assume something is wrong (e.g. the GFE died without sending a FIN packet), abort the request, and retry.
  3. message wait timeouts serve a similar purpose to attempt timeouts, with tighter guarantees
  4. idle timeouts are useful to reduce buffer bloat on the server

Currently all of them are implemented in gax, but I muddied the delineation a while back. This PR tries to fix the situation. In the current world, the operation timeout is defined by RetrySettings#totalTimeout and the idle timeout is defined by ServerStreamingCallSettings#idleTimeout. However, RetrySettings#rpcTimeout is mapped to the message wait timeout, and the attempt timeout is only configurable per call using ApiCallContext#withTimeout.

This PR cleans up the situation by mapping RetrySettings#rpcTimeout to the attempt timeout and exposing a new setting for the wait timeout on ServerStreamingCallSettings.
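
The difference between the current and proposed mappings can be sketched as follows. This is an illustrative model in plain Java, not gax code: `TimeoutMapping`, `Effective`, and the parameter names are hypothetical, and `waitTimeoutSetting` stands in for the new ServerStreamingCallSettings setting, whose actual name is not given in this description. A `null` value here means "not enforced by this setting".

```java
import java.time.Duration;

/** Hypothetical illustration of the two mappings described above; not gax code. */
final class TimeoutMapping {

  /** Effective per-attempt and per-message limits; null means "not enforced". */
  static final class Effective {
    final Duration attempt;
    final Duration wait;

    Effective(Duration attempt, Duration wait) {
      this.attempt = attempt;
      this.wait = wait;
    }
  }

  /**
   * Current behavior as described: rpcTimeout bounds the wait for each message,
   * and the attempt is bounded only by a per-call timeout (may be null).
   */
  static Effective current(Duration rpcTimeout, Duration perCallTimeout) {
    return new Effective(perCallTimeout, rpcTimeout);
  }

  /**
   * Proposed behavior as described: rpcTimeout bounds the attempt, and a
   * separate (hypothetically named) setting bounds the wait for each message.
   */
  static Effective proposed(Duration rpcTimeout, Duration waitTimeoutSetting) {
    return new Effective(rpcTimeout, waitTimeoutSetting);
  }
}
```

Under this model, a 100ms rpcTimeout with no per-call timeout leaves the attempt itself unbounded today (apart from the overall operation timeout), whereas under the proposal the same value would bound the whole attempt.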

@igorbernstein2 igorbernstein2 added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jun 17, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 17, 2021
@igorbernstein2 igorbernstein2 changed the title feat: make stream wait timeout a first class citizen feat: make stream wait timeout a first class citizen [WIP] Jun 17, 2021

if (totalTimeout != null && context != null && context.getTimeout() == null) {
  context = context.withTimeout(totalTimeout);
}
Contributor

After removing this block, if someone sets totalTimeout but not an RPC timeout, won't the stream be left without a deadline?

Contributor Author

I think this is already covered by #1191?

@igorbernstein2
Contributor Author

I think there is too much risk in remapping rpcTimeout from response timeout to attempt timeout. I'm going to close this for now. If gax ever does a major version bump, we can reconsider this.
