
HTTP: Automatic disconnection after handling N requests #4466

Open · NAJ10 opened this issue Sep 16, 2023 · 5 comments



NAJ10 commented Sep 16, 2023

We have a load test in which requests are sent to a deployment of pods in Kubernetes, and the deployment is restarted during the test. The load test uses shared connections. After the pods are restarted, the load is not evenly distributed amongst them: the pods that have been alive the longest receive much more load (more than twice as much) than the youngest ones.

When we tried the same load test without shared connections, too many sockets in the pod running the Gatling load test ended up in the TIME_WAIT state, and Gatling became unable to issue requests to the pods running in Kubernetes.
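For context, assuming Linux defaults this is consistent with the numbers: the ephemeral port range 32768–60999 gives roughly 28,000 ports, and with the default 60 s TIME_WAIT each port stays unusable for a minute after its connection closes, so a single client can sustain at most about 28000 / 60 ≈ 470 new connections per second to one destination before exhausting ports.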

A feature to control shared connections so that they are disconnected after N requests, or after a configurable period of time (e.g. 1 minute), would allow load to be distributed more evenly by forcing a load-balancer decision more frequently, rather than letting a persistent connection keep sending requests to an overloaded pod. There is some prior art in this area: see line/armeria#203.
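For concreteness, a sketch of what such an option might look like in the Gatling Scala DSL. The `connectionMaxRequests` / `connectionMaxLifetime` names and the target URL are hypothetical; these options do not exist today, which is the point of this request:

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Existing option: one connection pool shared by all virtual users
val httpProtocol = http
  .baseUrl("https://service.example.com") // hypothetical target
  .shareConnections
  // Hypothetical options this issue proposes:
  .connectionMaxRequests(100)      // recycle a connection after 100 requests
  .connectionMaxLifetime(1.minute) // or after it has been open for 1 minute
```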

slandelle (Member) commented

Hello,

Several questions:

  1. How do your actual user agents behave? Are you sure you're not tweaking the load test configuration to get the best numbers while actually creating unrealistic conditions that only exist in the load test? Do your user agents really implement the mechanism you're suggesting? You might want to check https://www.youtube.com/watch?v=RiM1GsVSbzM
  2. Responsibility for connection duration should lie with the server, not the client: servers can't trust clients to behave as expected. IMO, your server should implement the standard keep-alive directives (example after this list).
  3. If you really want this feature, is this something your company would be willing to contribute or sponsor?
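For reference, the standard keep-alive directives mentioned in point 2 are server response headers such as:

```http
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
```

which ask a compliant client to close the connection after 5 idle seconds or after 100 requests, putting the server in control of connection lifetime.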

Regards

slandelle changed the title from "Feature: Automatic disconnection after handling N requests" to "HTTP: Automatic disconnection after handling N requests" on Sep 16, 2023

NAJ10 commented Sep 16, 2023

Our actual user agents open a connection, send a single HTTP request, and close the connection. So the more realistic way of running the simulation is not to use shared connections, but when we do this the pod running the Gatling load test runs out of ephemeral ports, with several tens of thousands of connections in the TIME_WAIT state, even though it is only sending a few hundred requests per second to the deployed pods. A few hundred requests per second is quite high, but is in keeping with the load we might receive if there is high take-up from end users.

When I raised the issue I presented a simplified version of what we are actually doing in the load test. It is of course more complicated than originally presented, in that we have two sets of HTTP requests: one set goes to the deployed pods mentioned previously, and a second set goes to a second HTTP service. The second set of requests assists in the load test of the deployed pods, and the load reaches the deployed pods via the second HTTP service.

Ideally we would need an HTTP connection pool with shared connections for the second HTTP service, while requests going to the original set of pods would send a single request per connection. But as that causes too many sockets in the TIME_WAIT state, a useful workaround for our tests would be to obtain a new connection after N requests (see the sketch below).
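To make the shape of the test concrete, with the current Gatling Scala DSL the two services would be configured roughly like this (URLs hypothetical); the missing piece is anything between these two extremes:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// 2nd HTTP service: a shared connection pool is fine here
val secondServiceProtocol = http
  .baseUrl("https://second-service.example.com") // hypothetical URL
  .shareConnections

// Deployed pods: matching the real user agents means one request per
// connection, which is what exhausts ephemeral ports via TIME_WAIT
val podsProtocol = http
  .baseUrl("https://pods.example.com") // hypothetical URL
  .header("Connection", "close")       // force a close after each request
```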


slandelle commented Sep 18, 2023

The original feature request of having a connection max lifetime or max request count on a per-scenario/protocol basis could maybe make sense for multiple Gatling users (traction?) and could be eligible for merging.

But what you're now describing is very unlikely to be considered:

  • it seems very specific
  • no offense, but it's a workaround to save on the resources necessary for your test, at the cost of making the test less realistic.

Also, please note that even if you were to have a maxLifetime and maxRequests on the connection pool, you'd still have to make sure that the connections don't all expire at the same time, which would result in a SYN flood (sketched below).
Again, it looks to me like this mechanism would be best implemented server-side.
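For illustration only, a client implementing such limits would want to jitter each connection's deadline. A minimal sketch (hypothetical helper, not Gatling API):

```scala
import scala.concurrent.duration._
import scala.util.Random

// Spread connection expiries by +/- 20% around the configured lifetime so
// that connections opened together don't all close and reconnect together.
def jitteredLifetime(base: FiniteDuration, jitter: Double = 0.2): FiniteDuration = {
  val factor = 1.0 + (Random.nextDouble() * 2 - 1) * jitter // in [0.8, 1.2]
  (base.toMillis * factor).toLong.millis
}

// e.g. jitteredLifetime(1.minute) => somewhere between 48 and 72 seconds
```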


NAJ10 commented Sep 20, 2023

We have since realised that we can control the maximum connection lifetime in our services, when they make calls to other services, using the maxConnectionLifetime setting described in https://www.playframework.com/documentation/2.8.x/JavaWS#Configuring-AsyncClientConfig, and we are going to change our services to use this setting. It would therefore be good to have a similar setting in the Gatling HTTP client, so our performance tests behave the same way (with respect to connection lifetime) as the services calling each other.
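For reference, per the Play 2.8 documentation linked above, that setting lives in application.conf (the 1-minute value is just our example):

```hocon
# application.conf
play.ws.ahc.maxConnectionLifetime = 1 minute
```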

slandelle (Member) commented

Technically possible.

That only leaves my last question unanswered:

  3. If you really want this feature, is this something your company would be willing to contribute or sponsor?
