Update Citrine to Ubuntu 22.04 performance issues #8038
Comments
@vietj too (the same applies, although with somewhat smaller effects for Netty): I haven't verified with other frameworks yet, TBH.
I'm seeing pretty good consistency between new runs on the new environment for the top performers, and I've been spot-checking some others. I'm not yet convinced it's the environment that's unstable. I believe Redkale was the framework I saw changing snapshots between runs so that they didn't have to alter the version number in our repo, so frankly, I don't trust the variance we're seeing here from them.
And what about Vert.x and Netty, or Helidon Nima? I'm not sure that observing only the very top performer is the right strategy TBH, because it depends on how much they were constrained by the NIC vs CPU resources, meaning that a CPU or memory penalty would still leave them enough room to max out the NIC and appear as if they didn't receive any hit.
Netty and Helidon Nima are both consistent across runs in the new env. So, I don't believe the new environment is "unstable." However, it could be that some part of the update has affected the performance of some frameworks that were uniquely tuned to the previous environment. Do you have any insight into what that might be?
Sorry, about Netty I was wrong, but Helidon Nima has taken a hit and Vert.x the same. They don't have any specific optimization at the socket/env or OS level - no affinity (I was involved in the Vert.x one, given that I'm responsible for it along with Julien Viet). The one thing that makes them similar is that they are both CPU-bottlenecked (which can be observed in the dstat metrics), meaning that even small noise there can cause perf variations.
But you mean from the old environment to the new environment? Because in two runs on the new environment, Helidon Nima is at 3,584,007 and 3,556,068, which is normal variance as far as I'm concerned.
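As a side note, that run-to-run variance is easy to quantify. A minimal Java sketch using the two Nima plaintext numbers quoted above (the class and method names are made up for illustration):

```java
// Minimal sketch: relative difference between two benchmark runs.
// The two rps values are the Helidon Nima plaintext numbers quoted above.
public class RunVariance {
    static double relativeDiff(double a, double b) {
        return Math.abs(a - b) / Math.max(a, b);
    }

    public static void main(String[] args) {
        double pct = 100.0 * relativeDiff(3_584_007, 3_556_068);
        // Roughly 0.78%: well within normal run-to-run noise.
        System.out.printf("run-to-run variance: %.2f%%%n", pct);
    }
}
```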
Yep, exactly, and for Vert.x the same - the old env vs new env numbers.
Ok, yes. I agree with this. It seems like there's been a small hit (and some large ones) across the board for a lot of frameworks, fairly consistently. I'm happy to relay some information from the environments to help people track down what that might be.
Many thanks! The only thing that comes to mind relating both Vert.x and Nima (built on two different stacks) is that they are both CPU-limited and overcommitting - the former by using twice as many event loops as available cores, the latter by using a correctly sized ForkJoin thread pool but with some additional competing and busy garbage-collection threads.
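To make the oversubscription point concrete, here is a small, purely illustrative Java sketch (no Vert.x dependency; the GC-thread count is an assumption for illustration, not a real JVM sizing formula):

```java
// Hypothetical sketch of CPU oversubscription: with N cores, 2*N event
// loops (the 2x sizing mentioned above) plus competing GC threads means
// more runnable threads than cores, so any extra CPU noise in the
// environment translates directly into lost throughput.
public class Oversubscription {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int eventLoops = 2 * cores;              // 2x-cores sizing from above
        int gcThreads = Math.max(1, cores / 4);  // illustrative guess only
        int runnable = eventLoops + gcThreads;
        System.out.printf("cores=%d runnable=%d oversubscription=%.2fx%n",
                cores, runnable, (double) runnable / cores);
    }
}
```

With this sizing the ratio of runnable threads to cores is always above 2x, which is the point: the schedulable load exceeds the hardware even before any "noisy neighbor" shows up.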
Let me think whether adding some profiling could help to see what's going on, or even checking how the CPU usage of the top performers has changed, if they weren't running at 100% CPU before - looking at https://ajdust.github.io/tfbvis/?testrun=Citrine_started2022-06-30_edd8ab2e-018b-4041-92ce-03e5317d35ea&testtype=plaintext&round=false I see that they were not maxing out the CPU. If their CPU usage didn't change, it could be some "noisy neighbor"; if instead they show increased CPU usage (which should include irq processing, afaik) to deliver the same performance, then profiling could help, because that's something visible at the application level. I'm open to ideas here :)
@franz1981 The reason for the poor performance of the redkale framework this time is that HTTP header decoding was enabled, not the change of environment.
Thanks @redkale for the comment. I see instead that the GraalVM version is performing better, so maybe the lazy loading is still happening there?
@nbrady-techempower
There is a chance that Vert.x has regressed like the others by the mentioned percentage, but Netty hasn't, because it has balanced its regression with the improvement provided by io_uring (hence both would have regressed if they used epoll).
For a completed continuous run you can check the details, where there is a link to the result containing the log.
Many thanks @fakeshadow, I didn't know that...
I'm seeing strange things happening for fortunes as well. drogon and asp.net core consistently had about 30% more rps before the upgrade. |
Thanks @NinoFloris. I am working with my team to provide some regression analysis across the latest nightlies for all frameworks (will work on it next week) so we can help @nbrady-techempower detect how many have regressed and by how much (and whether there are relevant changes between the nightlies).
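A sketch of what such a regression pass could look like, assuming per-framework rps pairs (before, after) and an assumed 2% noise threshold; the framework names and numbers below are placeholders, not real results:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegressionCheck {
    // Assumed noise floor; a real pass would calibrate this from repeated runs.
    static final double NOISE_THRESHOLD = 0.02;

    // Returns the relative drop for every framework that regressed beyond
    // the noise threshold; value[0] = rps before, value[1] = rps after.
    static Map<String, Double> regressions(Map<String, double[]> beforeAfter) {
        Map<String, Double> flagged = new LinkedHashMap<>();
        beforeAfter.forEach((fw, v) -> {
            double drop = (v[0] - v[1]) / v[0];
            if (drop > NOISE_THRESHOLD) flagged.put(fw, drop);
        });
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, double[]> data = new LinkedHashMap<>();
        data.put("frameworkA", new double[]{1_000_000, 930_000}); // 7% drop
        data.put("frameworkB", new double[]{1_000_000, 992_000}); // within noise
        System.out.println(regressions(data)); // only frameworkA is flagged
    }
}
```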
If you check the top performers, it is apparent that there is a performance regression in all tests except cached queries and plaintext - the latter two still being bottlenecked by the 10 Gb/s network speed. The regression is 5-7% for the JSON serialization, multiple queries, and database updates tests, and 14-17% for the single query and fortunes ones; overall, a 9% reduction in composite score. |
@redkale Looking at the latest results, it appears you are once again not performing HTTP request header decoding.
The last time HTTP header parsing was enabled, it was for comparison. For simple HTTP, it is disabled by default.
@redkale As far as I can understand TFB requirement number 2, the HTTP handling should be minimal, but realistic. For instance, any minimal HTTP server should properly react to a
As commented by @nbrady-techempower in #7321 (comment) here we are!
I don't have a clue why/what's going on, but let's take a look at both vertx and redkale before/after the upgrade, with Plaintext:
before:
and @redkale, last run:
and both vert.x and redkale in another nightly with the new env:
It appears clear to me that the new env is quite unstable for some top performers, and given the discussion at #7984 (reply in thread) - where RedKale isn't performing any HTTP request header decoding and hasn't changed version across the different runs - something wrong is going on here...