
Kernel fine-tuning to sustain load #40

Open
RaJiska opened this issue Oct 29, 2023 · 8 comments

Labels
enhancement New feature or request

Comments

@RaJiska
Contributor

RaJiska commented Oct 29, 2023

Hi,

I'd like to open a discussion about using fck-nat under production-grade load. The way it's currently configured might not be enough for such a load, as I could not see any kernel tuning in the scripts. Unfortunately I am no expert in kernel tweaking and am not aware of all the settings that might be necessary, but here are a few that I can think of:

  • The kernel keeps track of active connections via conntrack; once the conntrack table is full, new connections may be dropped:
    • nf_conntrack_max, which governs the maximum number of tracked connections (and optionally nf_conntrack_buckets for performance)
    • nf_conntrack_tcp_timeout_*, perhaps set to a lower value than the default?
  • Networking stack:
    • tcp_wmem, tcp_rmem, udp_wmem, udp_rmem, which should probably be increased so the stack can support a higher load
    • tcp_max_syn_backlog
  • Maximum number of file descriptors via fs.file-max, whose limit could be exceeded if there are too many connections

Perhaps more could be added, but it would be interesting to have different profiles available, to be chosen depending on the intended usage of fck-nat.
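For illustration, here is a minimal sketch of what such a profile could look like, applied as a boot-time script. The file layout and every value are assumptions to be sized against the instance's memory, not recommendations from the fck-nat project:

```sh
#!/bin/sh
# Hypothetical "high connection count" profile; all values are illustrative only.

# Allow more tracked connections; each conntrack entry costs a few hundred bytes of kernel memory.
sysctl -w net.netfilter.nf_conntrack_max=262144

# Expire established-but-idle TCP flows sooner than the multi-day kernel default.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# Larger TCP socket buffers (min / default / max, in bytes) for higher per-flow throughput.
sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"

# Allow more half-open connections before SYNs start being dropped.
sysctl -w net.ipv4.tcp_max_syn_backlog=8192

# Raise the system-wide file descriptor ceiling.
sysctl -w fs.file-max=2097152
```

A profile-based approach could ship a few such scripts (e.g. default and high-connection-count) and let the user pick one at launch.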

@AndrewGuenther
Owner

If you're seeing the kind of volume which would require these kernel tweaks, you're likely at a point where fck-nat cannot sustain you or NAT Gateway would be more reasonable. Here's my logic on that:

Instances with fewer than 32 vCPUs are limited to 5Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit these kernel limits in any environment pushing less than 5Gbps of egress.

Instances with more than 32 vCPUs get 50% of the advertised bandwidth for internet egress[1]. The cheapest network-optimized instance with 32 vCPUs is a c6gn.8xlarge, which maxes out at 25Gbps and costs roughly $980 more per month to operate than NAT Gateway. You'd need around 21TB of egress per month for the data transfer savings to break even. So this optimization is really for people in that boat, and if you're in that boat you're likely to want the availability and bandwidth (up to 100Gbps) guarantees that NAT Gateway provides.
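For reference, the 21TB figure follows from a back-of-envelope calculation, assuming a NAT Gateway data processing charge of roughly $0.045/GB (an assumption here, and it varies by region):

```sh
# Break-even volume: extra instance cost divided by the assumed per-GB NAT Gateway processing fee.
echo "980 / 0.045" | bc   # ≈ 21777 GB, i.e. roughly 21 TB of processed traffic per month
```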

I'm not saying I wouldn't accept contributions for this, just wanted to add some color as to why I haven't pursued this already.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html

@patrickdk77

The two issues are unrelated. Tuning those variables and how much bandwidth you can push are not related in any way.

I could have a million idle TCP connections, or one connection that is maxing out my bandwidth.

Tuning these numbers, adjusting the TCP timeout from 12 hours to something more reasonable, and increasing the default number of connections the kernel will track are all constrained by memory, not network speed.
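For anyone checking whether they are near that ceiling, both the current usage and the configured maximum are exposed under procfs on modern kernels:

```sh
# Current number of tracked connections vs. the configured maximum.
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
```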

@AndrewGuenther
Owner

I understand they're unrelated, but I'm talking about likely use cases and how I've prioritized the work. If you're handling a high number of connections, you're likely also pushing higher bandwidth. Again, I'm not saying that I wouldn't accept contributions or tackle this work myself; I'm just giving my reasoning as to why it hasn't been done already, and a disclaimer that if you're worried about a large number of connections, you should consider this bandwidth information as well.

@RaJiska
Contributor Author

RaJiska commented Nov 2, 2023

Instances with fewer than 32 vCPUs are limited to 5Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit these kernel limits in any environment pushing less than 5Gbps of egress.

Thank you for this additional context; I was actually not aware of this 5Gbps per-instance limitation for internet-gateway-bound traffic. It's really sneaky of them.

That said, I have encountered a case where a single one of my instances (in a public subnet) had its conntrack table entirely filled and was dropping new connections, while being nowhere near the 5Gbps limitation. In this scenario, a fck-nat instance without kernel tuning would not have been able to sustain the load, and even less so if other instances had been behind it.
In this case, kernel tuning would really help, but it would also require more resources, especially memory, which would probably call for at least a t4g.medium, or even an r7g.medium. That has a similar hourly rate to NAT Gateway (excluding savings plans), but without the extra per-GB processing fee, which in this case might be the bulk of the bill.
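As a side note on diagnosing this, the overflow shows up in the kernel log; the exact wording varies by kernel version, but it is typically along these lines:

```sh
# Look for the conntrack overflow message when connections start getting dropped.
dmesg | grep -i 'nf_conntrack: table full, dropping packet'
```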

The intention behind this issue is to open a discussion on the matter and perhaps establish a comprehensive list of settings that would cover the case where fck-nat needs to handle a large number of connections without necessarily reaching its bandwidth limit.

@philipg

philipg commented Nov 9, 2023

You can avoid the 5Gbps limit by sharding the public internet IP prefixes via CIDR deaggregation, i.e. running multiple fck-nat instances for a single VPC via route table manipulation.

@RaJiska
Contributor Author

RaJiska commented Nov 10, 2023

@philipg To put it simply, creating smaller private subnets, each with its own NAT instance? This would work, but it unfortunately requires changes to the networking layer just to accommodate this technical constraint, which is not ideal.

@philipg

philipg commented Nov 11, 2023

@RaJiska The other way around: sharding the public internet with multiple routes. So instead of 0.0.0.0/0 you split the internet address space up.
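To make that concrete, here is a minimal sketch using the AWS CLI; the route table and ENI IDs are placeholders. Splitting the default route into the two /1 halves covers the entire IPv4 space while letting each half egress through a different fck-nat instance:

```sh
# Hypothetical IDs: each /1 route points at a different fck-nat instance's ENI.
aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 0.0.0.0/1 --network-interface-id eni-aaaaaaaa
aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 128.0.0.0/1 --network-interface-id eni-bbbbbbbb
```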

@RaJiska
Contributor Author

RaJiska commented Nov 11, 2023

This is a clever trick. Thanks for sharing this idea.
