performance regression related to new TCP connection tracking feature with a high number of connections #3759

Open
henningw opened this issue Feb 19, 2024 · 4 comments
@henningw
Contributor

henningw commented Feb 19, 2024

Description

On systems with a high number of TCP sessions, a significant performance regression can be observed, probably related to the newly added TCP connection tracking feature.

Troubleshooting

Reproduction

No special configuration is necessary: just install the latest 5.7.x release, e.g. 5.7.3, on a production system with a lot of clients connected over TCP or TLS. You need a large number of connected clients to be able to observe the regression. With a high number of connections (e.g. more than 20,000 and up to 30,000), the Kamailio server uses about 30% to 50% more CPU than with the old version.

Debugging Data

Two graphs were attached to this issue. The first shows the CPU load before (less load) and after the upgrade (increased load). The second is a flamegraph that shows that over 80% of the CPU time is spent in the newly added function tcp_connection_limit_srcip().

cpu-load-before-after [graph](https://github.com/kamailio/kamailio/files/14336952/graph.pdf)

Most of the CPU time is spent in the TCP main process, as expected.

Log Messages

No special log messages could be observed.

SIP Traffic

Possible Solutions

The TCP limit feature should probably be optimized so that it does not cause such a large performance regression. It should also be possible to deactivate it completely, which would give performance comparable to that before the feature was added.
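One possible direction for such an optimization, sketched here only as an illustration: keep a per-source-IP counter in a small hash table, so the check on accept costs roughly one lookup instead of depending on the total number of open connections. This is an assumption about what an optimization could look like, not a description of how tcp_connection_limit_srcip() works. All names below (`ip_conn_open`, `ip_conn_close`, `ip_conn_count`, `SLOTS`) are hypothetical, the sketch handles IPv4 only, and it shows no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 1024u /* hypothetical table size, must be a power of two */

/* One counter per source IP seen so far (hypothetical layout). */
struct ip_slot {
    uint32_t ip;    /* IPv4 source address */
    uint32_t count; /* currently open connections from this address */
    uint8_t used;   /* 1 once the slot has been claimed */
};

static struct ip_slot table[SLOTS];

/* Find the slot for ip using linear probing; optionally claim a free one.
 * Slots are never released in this sketch, so the table can fill up. */
static struct ip_slot *ip_slot_find(uint32_t ip, int create)
{
    uint32_t h = (ip * 2654435761u) & (SLOTS - 1);
    uint32_t i;

    for (i = 0; i < SLOTS; i++) {
        struct ip_slot *s = &table[(h + i) & (SLOTS - 1)];
        if (!s->used) {
            if (!create)
                return NULL;
            s->used = 1;
            s->ip = ip;
            s->count = 0;
            return s;
        }
        if (s->ip == ip)
            return s;
    }
    return NULL; /* table full */
}

/* Called on accept: returns 1 if the connection is allowed, 0 if rejected. */
static int ip_conn_open(uint32_t ip, int limit)
{
    struct ip_slot *s;

    assert(limit > 0);
    s = ip_slot_find(ip, 1);
    if (s == NULL)
        return 1; /* table full: accept untracked (a choice made for this sketch) */
    if (s->count >= (uint32_t)limit)
        return 0;
    s->count++;
    return 1;
}

/* Called when a connection is closed. */
static void ip_conn_close(uint32_t ip)
{
    struct ip_slot *s = ip_slot_find(ip, 0);
    if (s != NULL && s->count > 0)
        s->count--;
}

/* Current number of tracked connections from ip. */
static uint32_t ip_conn_count(uint32_t ip)
{
    struct ip_slot *s = ip_slot_find(ip, 0);
    return s != NULL ? s->count : 0;
}
```

The trade-off in this sketch is extra bookkeeping on every open and close in exchange for a check whose cost does not grow with the number of established connections.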

Additional Information

Kamailio 5.7.3 and probably also git master version.

  • Operating System:

Debian 11, Debian 12

@henningw henningw added the bug label Feb 19, 2024
@henningw
Contributor Author

PDF version of flamegraph: https://github.com/kamailio/kamailio/files/14336952/graph.pdf

@miconda
Member

miconda commented Feb 20, 2024

Is this performance penalty observed only while new connections are being established at a high rate, or also after the large number of connections is established? The limit should be enforced only when accepting a new connection; if it is applied in other cases, then that is a mistake.

Disabling this limit check is indeed good to have. I pushed a commit so that when the parameter is set to 0 or a negative value, the limiting is no longer done (e3e6fd7). The commit can be backported if you have the chance to test it and it works as expected.
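For readers following the thread, the behaviour described above (a limit of 0 or less turns the check off) could look roughly like the guard below. This is a minimal sketch under stated assumptions, not the code from the commit: the `struct conn` layout, the `accept_allowed` function, and the `scans_done` counter are hypothetical, and the linear scan is only a stand-in for whatever the real per-source-IP check does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection record; not Kamailio's real structure. */
struct conn {
    uint32_t src_ip; /* IPv4 source address */
};

/* Number of scans performed, kept only to make the cost visible here. */
static size_t scans_done = 0;

/* Returns true if a new connection from src_ip may be accepted. */
static bool accept_allowed(const struct conn *conns, size_t n,
                           uint32_t src_ip, int limit)
{
    size_t i, same = 0;

    /* Limit disabled: return before doing any per-connection work. */
    if (limit <= 0)
        return true;

    assert(conns != NULL || n == 0);
    scans_done++;

    /* Stand-in check: count existing connections from the same source. */
    for (i = 0; i < n; i++) {
        if (conns[i].src_ip == src_ip)
            same++;
    }
    return same < (size_t)limit;
}
```

The point of placing the guard first is that a disabled limit costs a single comparison per accepted connection, regardless of how many connections are already open.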

@henningw
Contributor Author

Thank you for the commit, we will have it tested soon. Regarding the performance regression question: on one system where we observed it, there are about 35,000 current TCP connections and a new connection rate of about 85 connections per second.
