
Options for safely handling slow clients #604

Open
jhump opened this issue Oct 10, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


jhump commented Oct 10, 2023

Currently, slow clients can tie up server resources, as the server code is blocked on reading request data or writing response data. For server-to-server workloads, this is not usually a big concern. But when services are exposed to external, untrusted clients, it can pose an abuse vector. Malicious clients could exploit this to cause a DoS; non-malicious, slow clients (like mobile clients with very poor signal) could lead to operational issues, like excessive resource utilization.

The http.Server provides numerous options that Connect servers should use to help with abuse prevention -- in particular the ReadTimeout, ReadHeaderTimeout, and WriteTimeout fields.

However, these are often inappropriate for RPC traffic, which can be a mix of fast and slow endpoints. In particular, some classes of streaming endpoints are intended to be long-lived, often with long delays between messages (subscription/notification endpoints). Since the above configuration is server-wide, one must choose one of:

  1. Low timeouts. Long-lived stream clients are forced to constantly re-connect.
  2. Long timeouts. Stream clients are properly accommodated, but now a slow client can abuse unary endpoints that are intended to be fast.
  3. Medium timeouts. A "middle of the road" timeout is a compromise that works tolerably with both fast and long-lived endpoints, but it is ideal for neither: it is higher than ideal for fast endpoints and lower than ideal for long-lived ones.

An ideal solution is one where the timeout can be configured per endpoint. That way, long-lived streaming endpoints can be configured with a very long (or even no) timeout. Other endpoints can have a reasonably low timeout, which will allow a slow client to be detected and the request to fail faster.

Implementing this, however, requires http.ResponseController, new as of Go 1.20, which allows setting read and write deadlines on a per-request basis. Ideally, the Connect runtime would surface some configuration for per-endpoint timeouts and then apply them on a per-request basis using a ResponseController.

jhump added the enhancement label Oct 10, 2023

jhump commented Nov 29, 2023

To address this issue, we will likely need to add support for per-method timeouts that are specified via an option (which can apply different values to different methods via conditional options).

This new option can only work with http.ResponseController, which was introduced in Go 1.20. However, the connect-go runtime currently supports Go 1.19. So we would either add this new functionality behind a build tag that is enabled with Go 1.20+, or wait until we have dropped support for 1.19.
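For illustration, the build-tag approach would gate the ResponseController-using code behind a constraint like this (file and package names here are hypothetical):

```go
//go:build go1.20

package connect

// Code that calls http.ResponseController lives only in files carrying this
// build constraint, so the module still compiles under Go 1.19.
```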

There is a recently fixed panic bug, triggered by calling http.ResponseController.SetReadDeadline on HTTP/2 requests that have no body (like a GET), whose fix will not ship until Go 1.22. So we should be cautious about actually calling SetReadDeadline: it should only be called when the option has been used, so there's no risk of triggering the bug before Go 1.22 for users that don't use the new option. Alternatively, we may set the build tag mentioned above so we only expose this functionality in Go 1.22+ (since it may not be safe to expose in Go 1.20 or 1.21). If we did allow the use of this feature with Go 1.20 or 1.21, we'd need to clearly document how to use it safely: by using golang.org/x/net/http2 v0.16.0+ in the server, instead of the bundled implementation in net/http -- which I think is as simple as calling http2.ConfigureServer.

There are two possible uses for timeouts:

  1. A total time limit for reading all of the request data and/or writing all of the response data.
  2. A time limit that applies per read or write operation. For RPC, an "operation" means reading or writing a message. Each attempt to read or write a message on the stream would get a fresh deadline: once the operation completes, the deadline resets for the next one. This lets messages on the stream act like keep-alives, allowing an unbounded amount of time for the RPC as long as it is not idle (i.e. some data is being transferred on it).

For the former case, having separate read and write timeouts is not as useful, since a single timeout effectively establishes a bound on the entire RPC duration. But for the latter case, separate read and write timeouts are likely very useful: for a long-lived stream where logic is client-initiated, the write timeout (to send a message) might be much shorter than the read timeout (which must include the idle wait time between requests).

Initially, we think a new option satisfying the first case above is adequate, leaving the door open for future options to accommodate the second case.

These options (or analogous ones) might be interesting on the client side, too: it would be nice to declaratively define timeouts that are automatically applied to outbound RPCs. For the second case above, this would effectively provide per-receive and per-send timeouts, which would also be nice to have in a streaming client.
