
Kernel fine-tuning to sustain load #40

Open
RaJiska opened this issue Oct 29, 2023 · 8 comments

Labels
enhancement New feature or request

Comments

@RaJiska
Contributor

RaJiska commented Oct 29, 2023

Hi,

I'd like to open a discussion about using fck-nat under production-grade load. The way it's currently configured might not be enough for such a load, as I could not see any kernel tuning in the scripts. Unfortunately I am no expert in kernel tweaking and am not aware of all the settings that might be necessary, but here are a few that I can think of:

  • The kernel keeps track of active connections via conntrack; once the conntrack table is full, new connections may be dropped:
    • nf_conntrack_max, which governs the maximum number of tracked connections (and optionally nf_conntrack_buckets for performance)
    • nf_conntrack_tcp_timeout_*, perhaps set to a lower value than the default?
  • Networking stack:
    • tcp_wmem, tcp_rmem, udp_wmem, udp_rmem, which should probably be increased so the stack can support a higher load
    • tcp_max_syn_backlog
  • Maximum number of file descriptors via fs.file-max, whose limit could be exceeded if there are too many connections

Perhaps more could be added, but it would be interesting to have different profiles available, to be chosen depending on the intended usage of fck-nat.
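For illustration, here is a minimal sketch of what such a profile could look like, applied as a boot-time script. The file layout and every value are assumptions to be sized against the instance's memory, not recommendations from the fck-nat project:

```sh
#!/bin/sh
# Hypothetical "high connection count" profile; all values are illustrative only.

# Allow more tracked connections; each conntrack entry costs a few hundred bytes of kernel memory.
sysctl -w net.netfilter.nf_conntrack_max=262144

# Expire established-but-idle TCP flows sooner than the multi-day kernel default.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# Larger TCP socket buffers (min / default / max, in bytes) for higher per-flow throughput.
sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"

# Allow more half-open connections before SYNs start being dropped.
sysctl -w net.ipv4.tcp_max_syn_backlog=8192

# Raise the system-wide file descriptor ceiling.
sysctl -w fs.file-max=2097152
```

A profile-based approach could ship a few such scripts (e.g. default and high-connection-count) and let the user pick one at launch.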

@AndrewGuenther
Owner

If you're seeing the kind of volume which would require these kernel tweaks, you're likely at a point where fck-nat cannot sustain you or NAT Gateway would be more reasonable. Here's my logic on that:

Instances with fewer than 32 vCPUs are limited to 5Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit these kernel limits in any environment pushing less than 5Gbps of egress.

Instances with more than 32 vCPUs get 50% of the advertised bandwidth for internet egress[1]. The cheapest network-optimized instance with 32 vCPUs is a c6gn.8xlarge, which maxes out at 25Gbps and costs roughly $980 more per month to operate than NAT Gateway. You'd need around 21TB of egress per month for the data transfer savings to break even. So this optimization is really for people in that boat, and if you're in that boat you're likely to want the availability and bandwidth (up to 100Gbps) guarantees that NAT Gateway provides.
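For reference, the 21TB figure follows from a back-of-envelope calculation, assuming a NAT Gateway data processing charge of roughly $0.045/GB (an assumption here, and it varies by region):

```sh
# Break-even volume: extra instance cost divided by the assumed per-GB NAT Gateway processing fee.
echo "980 / 0.045" | bc   # ≈ 21777 GB, i.e. roughly 21 TB of processed traffic per month
```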

I'm not saying I wouldn't accept contributions for this, just wanted to add some color as to why I haven't pursued this already.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html

@patrickdk77

The two issues are unrelated. Tuning those variables and how much bandwidth you can push are not related in any way.

I could have a million idle TCP connections, or one connection that is maxing out my bandwidth.

Tuning these numbers, adjusting the TCP timeout from 12 hours to something more reasonable, and increasing the default number of connections the kernel will track are all constrained by memory, not network speed.
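For anyone checking whether they are near that ceiling, both the current usage and the configured maximum are exposed under procfs on modern kernels:

```sh
# Current number of tracked connections vs. the configured maximum.
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
```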

@AndrewGuenther
Owner

I understand they're unrelated, but I'm talking about likely use cases and how I've prioritized the work. If you're handling a high number of connections, you're likely also pushing higher bandwidth. Again, I'm not saying that I wouldn't accept contributions or tackle this work myself; I'm just giving my reasoning as to why it hasn't been done already, and a disclaimer that if you're worried about a large number of connections, you should consider this bandwidth information as well.

@RaJiska
Contributor Author

RaJiska commented Nov 2, 2023

Instances with fewer than 32 vCPUs are limited to 5Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit these kernel limits in any environment pushing less than 5Gbps of egress.

Thank you for this additional context; I was actually not aware of this 5Gbps per-instance limitation for internet-gateway-bound traffic. It's really sneaky of them.

That said, I have encountered a case where a single one of my instances (in a public subnet) had its conntrack table entirely filled and was dropping new connections, while being nowhere near the 5Gbps limitation. In this scenario, a fck-nat instance without kernel tuning would not have been able to sustain the load, and even less so if other instances had been behind it.
In this case, kernel tuning would really help, but it would also require more resources, especially memory, which would probably call for at least a t4g.medium, or even an r7g.medium. That has a similar hourly rate to NAT Gateway (excluding savings plans), but without the extra per-GB processing fee, which in this case might be the bulk of the bill.
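As a side note on diagnosing this, the overflow shows up in the kernel log; the exact wording varies by kernel version, but it is typically along these lines:

```sh
# Look for the conntrack overflow message when connections start getting dropped.
dmesg | grep -i 'nf_conntrack: table full, dropping packet'
```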

The intention behind this issue is to open a discussion on the matter and perhaps establish a comprehensive list of settings that would cover the case where fck-nat needs to handle a large number of connections without necessarily reaching its bandwidth limit.

@philipg

philipg commented Nov 9, 2023

You can avoid the 5Gbps limit by sharding the public internet IP prefixes via CIDR deaggregation, i.e. running multiple fck-nat instances for a single VPC via route table manipulation.

@RaJiska
Contributor Author

RaJiska commented Nov 10, 2023

@philipg To put it simply, creating smaller private subnets, each with its own NAT instance? This would work, but it unfortunately requires changes to the networking layer just to accommodate this technical constraint, which is not ideal.

@philipg

philipg commented Nov 11, 2023

@RaJiska The other way around: sharding the public internet with multiple routes. So instead of 0.0.0.0/0 you split the internet address space up.
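To make that concrete, here is a minimal sketch using the AWS CLI; the route table and ENI IDs are placeholders. Splitting the default route into the two /1 halves covers the entire IPv4 space while letting each half egress through a different fck-nat instance:

```sh
# Hypothetical IDs: each /1 route points at a different fck-nat instance's ENI.
aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 0.0.0.0/1 --network-interface-id eni-aaaaaaaa
aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 128.0.0.0/1 --network-interface-id eni-bbbbbbbb
```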

@RaJiska
Contributor Author

RaJiska commented Nov 11, 2023

This is a clever trick. Thanks for sharing this idea.
