
Options for safely handling slow clients #604

Open
jhump opened this issue Oct 10, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


jhump commented Oct 10, 2023

Currently, slow clients can tie up server resources, as the server code is blocked on reading request data or writing response data. For server-to-server workloads, this is not usually a big concern. But when services are exposed to external, untrusted clients, it can pose an abuse vector. Malicious clients could exploit this to cause a DoS; non-malicious, slow clients (like mobile clients with very poor signal) could lead to operational issues, like excessive resource utilization.

The http.Server provides numerous options that Connect servers should use to help with abuse prevention -- in particular the ReadTimeout, ReadHeaderTimeout, and WriteTimeout fields.

However, these are often inappropriate for RPC traffic, which can be a mix of fast and slow endpoints. In particular, some classes of streaming endpoints are intended to be long-lived, often with long delays between messages (subscription/notification endpoints). Since the above configuration is server-wide, one must choose one of:

  1. Low timeouts. Long-lived stream clients are forced to constantly re-connect.
  2. Long timeouts. Stream clients are properly accommodated, but now a slow client can abuse unary endpoints that are intended to be fast.
  3. Medium timeouts. A "middle of the road" timeout is a compromise that works tolerably with both fast and long-lived endpoints, but it is ideal for neither: it is higher than ideal for fast endpoints and lower than ideal for long-lived ones.

An ideal solution is one where the timeout can be configured per endpoint. That way, long-lived streaming endpoints can be configured with a very long (or even no) timeout. Other endpoints can have a reasonably low timeout, which will allow a slow client to be detected and the request to fail faster.

Implementing this, however, requires http.ResponseController, new as of Go 1.20, which allows setting read and write deadlines on a per-request basis. Ideally, the Connect runtime would surface some configuration for per-endpoint timeouts and then apply them on a per-request basis using a ResponseController.

jhump added the enhancement label Oct 10, 2023

jhump commented Nov 29, 2023

To address this issue, we will likely need to add support for per-method timeouts that are specified via an option (which can apply different values to different methods via conditional options).

This new option can only work with http.ResponseController, which was introduced in Go 1.20. However, the connect-go runtime currently supports Go 1.19. So we would either add this new functionality behind a build tag that is enabled with Go 1.20+, or wait until we have dropped support for 1.19.
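For illustration, the build-tag approach would gate the ResponseController-using code behind a constraint like this (file and package names here are hypothetical):

```go
//go:build go1.20

package connect

// Code that calls http.ResponseController lives only in files carrying this
// build constraint, so the module still compiles under Go 1.19.
```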

There is a recently fixed panic bug, triggered by calling http.ResponseController.SetReadDeadline on HTTP/2 requests that have no body (like a GET), whose fix will not ship until Go 1.22. So we should be cautious about actually calling SetReadDeadline: it should only be called when the option has been used, so there's no risk of triggering the bug before Go 1.22 for users that don't use the new option. Alternatively, we may set the build tag mentioned above so we only expose this functionality in Go 1.22+ (since it may not be safe to expose in Go 1.20 or 1.21). If we did allow the use of this feature with Go 1.20 or 1.21, we'd need to clearly document how to use it safely: by using golang.org/x/net/http2 v0.16.0+ in the server, instead of the bundled implementation in net/http -- which I think is as simple as calling http2.ConfigureServer.

There are two possible uses for timeouts:

  1. A total time limit for reading all of the request data and/or writing all of the response data.
  2. A time limit that applies per read or write operation. For RPC, an "operation" means reading or writing a message. Each attempt to read or write a message on the stream would get a fresh deadline: once the operation completes, the deadline resets for the next one. This lets messages on the stream act like keep-alives, allowing an unbounded amount of time for the RPC as long as it is not idle (i.e. some data is being transferred on it).

For the former case, having separate read and write timeouts is not as useful, since a single timeout effectively establishes a bound on the entire RPC duration. But for the latter case, separate read and write timeouts are likely very useful: for a long-lived stream where logic is client-initiated, the write timeout (to send a message) might be much shorter than the read timeout (which must include the idle wait time between requests).

Initially, we think a new option satisfying the first case above is adequate, leaving the door open for future options to accommodate the second case.

These options (or analogous ones) might be interesting on the client side, too: it would be nice to declaratively define timeouts that are automatically applied to outbound RPCs. For the second case above, this would effectively provide per-receive and per-send timeouts, which would also be nice to have in a streaming client.
