
Better Control over Optimizing Transfer Rate #135

Open
djspiewak opened this issue Jun 26, 2020 · 3 comments

@djspiewak

This is closely related to #134

At present, multipart chunks within the HTTP response stream are set to 8 KiB. Additionally, maximum page sizes (after which an additional request must be made) are limited to 100 rows, which corresponds to about 2.5 MiB. All of this is much, much too small, and it would be very helpful to raise the former limit by around 2 orders of magnitude, and the latter by around 1.

My home internet connection tests around 250 Mbps downstream. Ignoring HTTP overhead (which is somewhat marginal), for me to saturate my bandwidth, I would need to receive 12.5 pages per second, which works out to one page every 80 milliseconds. A quick glance at traceroute suggests that ICMP propagation alone gives me a floor of one page every 30 milliseconds from your server, and an HTTPS connection and handshake process adds considerably more latency than that, even if we pretend that there is no per-page processing overhead on your server's side.

I think that a reasonable page limit would be around 25 MiB, which would ballpark to a limit=1000 or perhaps a little higher. These are still manageable page sizes from a server resource standpoint, but they would relieve some of the pressure on connection latency as the bounding factor in transfer rates.
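
To show my work a bit, here's the back-of-the-envelope arithmetic behind those numbers (just a scratch calculation, nothing rigorous; the bandwidth and page-size figures are the ones quoted above):

```scala
// Scratch arithmetic for the figures above (not a benchmark).
object PageRateMath extends App {
  val downstreamMbps = 250.0                    // quoted home connection speed, megabits/s
  val bytesPerSec    = downstreamMbps * 1e6 / 8 // ≈ 31.25 MB/s

  // milliseconds between page arrivals needed to saturate the link
  def pagePeriodMs(pageBytes: Double): Double =
    pageBytes / bytesPerSec * 1000

  val pageAt100  = 2.5 * 1024 * 1024   // ≈ 2.5 MiB per page at limit=100
  val pageAt1000 = 25.0 * 1024 * 1024  // ≈ 25 MiB per page at limit=1000

  println(f"limit=100:  one page every ${pagePeriodMs(pageAt100)}%.0f ms")  // ≈ 84 ms
  println(f"limit=1000: one page every ${pagePeriodMs(pageAt1000)}%.0f ms") // ≈ 839 ms
}
```

In other words, at limit=1000 the per-page budget is closer to 800 ms, which is comfortably above the connection latency floor.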

Which brings us to chunk sizes… It would be considerably easier if the multipart chunk sizes were larger than just 8 KiB, simply because there's a meaningful amount of processing overhead that must surround each chunk (e.g. parsing the HTTP framing that sets up the next chunk), which again cuts into transfer rates. Using our application as a benchmark, we've found that the ideal chunk size for local filesystem access is usually around 1 MiB. That's fairly large for an HTTP response, but it's probably closer to the overall optimum. A multipart chunk size around 256 KiB would be easily manageable on any device, and it would relieve some of the processing penalties.

@dkrylovsb (Collaborator)

Thank you for bringing this to our attention. The page limit has been increased to 1000 as requested.

We are working with the API gateway team (the gateway sits in front of our API and appears to be doing the chunking), so this one might take a little longer. I will post back once I know more.

dkrylovsb self-assigned this Jun 30, 2020

@GUI commented Aug 5, 2020

@djspiewak: I work on the api.data.gov platform, which the OpenFDA platform utilizes, and which is the layer this chunk size is coming from. Thanks for this suggestion, and sorry for the delay in responding.

We've been looking into this, and the chunk size in our platform appears to stem from nginx's proxy_buffer_size setting. We don't fully buffer responses at our proxying layer (so we're streaming them from the FDA's API backend as quickly as we receive the pieces of the body), but this setting also appears to affect the chunk size sent along when streaming responses.

In terms of adjusting this setting, we want to be careful, since it would affect a variety of other APIs too. Do you have any additional details about the performance benefits you've seen from different chunk sizes and what type of API client you're using? We've been trying to find other information about tuning this setting in nginx, but we haven't found much about it being a common tuning knob or about its performance implications. From what I've found, chunked-encoding responses aren't typically a bottleneck for clients parsing bodies, but if you are seeing this as an issue, we'd definitely like to know more about that.

Since this setting is oriented around memory page sizes and also seems to affect the minimum amount of memory necessary to serve a request, we also want to be careful about the potential impact on our systems. For example, I believe increasing this from 8KB to 256KB would mean that, at 1,000 concurrent requests, the RAM required to hold the memory buffers for all the connections would grow from 8MB to 256MB (although I haven't fully tested this, so happy to be corrected).

We'd be glad to talk about this further, since we'd like for you all to have an optimal API experience, but again, we just want to be careful about any changes that might impact things across our platform.
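
(For anyone following along, the nginx directive in question looks roughly like the snippet below; the values and upstream name are purely illustrative, not our actual configuration.)

```nginx
# Illustrative sketch only -- not the real api.data.gov config.
location / {
    proxy_pass http://backend_api;   # hypothetical upstream name

    # Streaming mode: pieces of the body are passed through as they arrive.
    proxy_buffering off;

    # Even with buffering off, nginx reads from the upstream into a buffer of
    # this size, so it effectively caps the chunk size clients observe.
    # The default is one memory page (4k or 8k, platform-dependent).
    proxy_buffer_size 8k;            # the change being discussed is e.g. 256k
}
```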

Thanks!

@djspiewak (Author)

> Do you have any additional details about the performance benefits you've seen from different chunk sizes and what type of API client you're using?

Chunk size tuning is complicated! I don't blame you for being cautious.

So to lend a little more color to this… We're using https://github.com/http4s/http4s with the https://github.com/AsyncHttpClient/async-http-client backend. Which is a long-winded way of saying that it's a purely functional streaming system that ultimately uses Netty to handle the http/s legwork. The data processing is being performed by our own parsing layer (https://github.com/precog/tectonic).

In systems such as this, all of the overhead is at the chunk boundaries. This is for a number of reasons, but ultimately it gives a very bright delineation between the hot path and the "not hot path", which makes it possible to employ high-level abstractions and composability without impacting performance. The only assumption which has to be made is that the chunks are sufficiently large as to amortize the (substantial!) cost penalties being paid between each chunk. Usually this is true. For example, when we process data stored on-disk, we set our chunk sizes to 1 MiB, which appears to hit the sweet spot with modern L3 caches.
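
To make the chunk-boundary point concrete, here's a rough fs2-flavored sketch (not our actual code) of regrouping a response body into larger chunks before parsing; it amortizes the per-chunk processing cost on the consumer side, though it obviously can't recover the overhead already paid on the wire:

```scala
import cats.effect.IO
import fs2.Stream

// Sketch: regroup an 8 KiB-chunked response body into ~256 KiB chunks before
// handing it to the parser, so the fixed per-chunk cost is paid ~32x less often.
def rechunk(body: Stream[IO, Byte], target: Int = 256 * 1024): Stream[IO, Byte] =
  body.chunkN(target, allowFewer = true).flatMap(Stream.chunk)
```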

For reference, with absolutely optimal chunk sizes and without using any compression at all, on a local filesystem, we can see throughput approaching 1.2 Gbps with our system (which is right around bus saturation, since we also have to output the data). I haven't actually tested setting the chunk size to 1 byte, but the data rates would probably be in the Kbps at the most, given what we see in other benchmarks. That's the kind of impact it can have.

Obviously you wouldn't want to set the chunk size quite that high on an HTTP server, but this kind of thing is worth taking into account. When you set the chunk size to something relatively small, the processing side also wraps the data in relatively small ByteBuffers, which in turn means the overhead of managing each buffer is amortized over fewer bytes, dragging down the overall throughput dramatically.
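
As a toy model (made-up constants, purely to illustrate the shape of the curve): if each chunk carries a fixed setup cost on top of the per-byte work, effective throughput is roughly chunkSize / (fixedCost + chunkSize × perByteCost), so small chunks live in the overhead-dominated regime:

```scala
// Toy model with made-up constants -- illustrative only, not a measurement.
object ChunkAmortization extends App {
  val perChunkOverheadUs = 50.0   // fixed cost paid at every chunk boundary (µs), invented
  val perByteCostUs      = 0.001  // cost of actually processing one byte (µs), invented

  // bytes per microsecond == (decimal) MB per second
  def throughputMBps(chunkBytes: Double): Double =
    chunkBytes / (perChunkOverheadUs + chunkBytes * perByteCostUs)

  println(f"8 KiB:   ${throughputMBps(8 * 1024)}%.0f MB/s")    // ≈ 141 MB/s, overhead-dominated
  println(f"256 KiB: ${throughputMBps(256 * 1024)}%.0f MB/s")  // ≈ 840 MB/s
  println(f"1 MiB:   ${throughputMBps(1024 * 1024)}%.0f MB/s") // ≈ 954 MB/s, near the per-byte ceiling
}
```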

I'm honestly not sure what the standard is here from an HTTP server standpoint. And obviously it makes your resident set much, much larger, so you certainly need to be careful. I wish I could give you more useful advice. :-)
