
No longer accepting plaintext only frameworks / Limited number of tests mutations #8420

NateBrady23 opened this issue Sep 14, 2023 · 12 comments

@NateBrady23
Member

Hi everyone!

As the number of new frameworks submitted to the benchmarks grows, the amount of time it takes to complete a full run does as well. Because of this, we will be implementing the following rules:

  • New frameworks that only implement plaintext will no longer be accepted. Of course, we'd like all frameworks to implement all tests to get a better idea of performance in various areas of the framework, but we expect at least two different tests to be implemented: ideally plaintext or JSON plus one database test.

  • The number of test mutations will be limited to 10. We do not mind if you open up pull requests between runs to try out various mutations for your framework so long as the total number at any given time does not exceed 10.

After the next round, we will ping framework maintainers to make these changes. We will also look to remove tests that are older and no longer maintained.

Thank you!

NateBrady23 pinned this issue Sep 14, 2023
@fakeshadow
Contributor

Rules like these show how popular the project has become, and I agree with both.
On top of that, I suggest calculating the composite score per mutation, which would offer a quick view of per-mutation detail.

@gi0baro
Contributor

gi0baro commented Sep 15, 2023

@nbrady-techempower on the number of mutations, I proposed #8055 some time ago but then dropped it given the community feedback. It might be worthwhile to re-check it.

@joanhey
Contributor

joanhey commented Sep 17, 2023

I like it a lot, but there has been a problem for a long time: the moment a framework is removed, all of its history disappears from the Rounds.

As I said before, the Rounds need to be immutable.
For example, in PHP we had to change the name because it was php5; after the change, plain PHP no longer appears in the old Rounds.
We have the numbers and the work done, but they don't show in the Rounds.

@otrosien
Contributor

otrosien commented Oct 26, 2023

One framework to remove: Baratine. The domain baratine.io is no longer registered to the project (careful, clickbait!), and the GitHub project was last changed 7 years ago (https://github.com/baratine/baratine).

@joanhey
Contributor

joanhey commented Oct 26, 2023

In reality, Baratine is marked as Stripped.

Why not skip all the stripped frameworks in the runs?

https://github.com/search?q=repo%3ATechEmpower%2FFrameworkBenchmarks+%5C%22Stripped%5C%22+OR+%5C%22stripped%5C%22+path%3A%2F%5Eframeworks%5C%2F%2F&type=code

@fakeshadow
Contributor

> In reality, Baratine is marked as Stripped.
> Why not skip all the stripped frameworks in the runs?

I disagree. In xitca-web, the Stripped bench is used to avoid polluting the default leaderboard while still tracking the performance of low-level system software such as the OS and the language (and/or program) runtime. In fact, Stripped is a fairly arbitrary category, because there are even more unrealistic benches marked as Realistic. Unless there is a unified standard for determining which benches must be Stripped, it's unfair to skip them.

@joanhey
Contributor

joanhey commented Oct 26, 2023

@fakeshadow OK.
I'm happy this information is useful.

As for what needs to be Stripped, I think that's work for all the devs here: help clarify the requirements and also identify the frameworks that bypass them.

@fakeshadow
Contributor

fakeshadow commented Oct 27, 2023

> @fakeshadow OK. I'm happy this information is useful.
>
> As for what needs to be Stripped, I think that's work for all the devs here: help clarify the requirements and also identify the frameworks that bypass them.

Unfortunately, the meaning of "Realistic" is subjective, and from the existing bench code it's clear we have very divided opinions among bench maintainers. Therefore I doubt common ground can be reached easily.
Actually, I'm fine with the current configuration, where the category is up to the maintainers to decide. When people look into the code and figure it out, they'll know which framework and its community share the same opinion.
In other words, as long as a stripped bench can run in non-official benches, I personally find it fine. As for broken (or outdated) benches, I believe we can use a "broken" tag to stop them from hogging resources in runs.

@billywhizz
Contributor

One thing I've been thinking is not quite fair: combining results from different framework mutations into the composite score. Surely the composite score should reflect a single configuration and that configuration's performance across all benches?

For example, if we look at ntex, which was top of the last official round, the different flavours get wildly different scores across the different benchmarks. Is it fair to pick the best mutation in each category and combine those for the composite? Is it even possible to run a single service on ntex that would score highly across all benches? It doesn't seem so, but this is surely what the composite score should be measuring.

Maybe a better system would be to sum up the scores across all benchmarks for a particular mutation and then, for each framework, choose the mutation that got the best composite score?

Maybe this has been raised before; sorry for bringing it up again if so.
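A minimal sketch of the selection rule proposed above, assuming per-benchmark scores are already normalized; the mutation names and numbers are made up for illustration, and the actual composite calculation may weight tests differently:

```rust
use std::collections::HashMap;

/// scores: mutation name -> (benchmark name -> normalized score).
/// Returns the mutation whose own composite (sum across benches) is highest.
fn best_mutation_composite(
    scores: &HashMap<String, HashMap<String, f64>>,
) -> Option<(String, f64)> {
    scores
        .iter()
        .map(|(mutation, per_bench)| (mutation.clone(), per_bench.values().sum::<f64>()))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    let scores = HashMap::from([
        (
            "ntex [tokio]".to_string(),
            HashMap::from([("plaintext".to_string(), 95.0), ("fortunes".to_string(), 60.0)]),
        ),
        (
            "ntex [async-std]".to_string(),
            HashMap::from([("plaintext".to_string(), 80.0), ("fortunes".to_string(), 85.0)]),
        ),
    ]);
    // One configuration is scored as a whole, instead of cherry-picking the
    // best mutation per benchmark and combining those.
    if let Some((mutation, composite)) = best_mutation_composite(&scores) {
        println!("best single configuration: {mutation} ({composite})");
    }
}
```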

@fakeshadow
Contributor

fakeshadow commented Dec 8, 2023

> One thing I've been thinking is not quite fair: combining results from different framework mutations into the composite score. Surely the composite score should reflect a single configuration and that configuration's performance across all benches?
>
> For example, if we look at ntex, which was top of the last official round, the different flavours get wildly different scores across the different benchmarks. Is it fair to pick the best mutation in each category and combine those for the composite? Is it even possible to run a single service on ntex that would score highly across all benches? It doesn't seem so, but this is surely what the composite score should be measuring.
>
> Maybe a better system would be to sum up the scores across all benchmarks for a particular mutation and then, for each framework, choose the mutation that got the best composite score?
>
> Maybe this has been raised before; sorry for bringing it up again if so.

I agree with you on the composite score issue. Besides incompatible features, it's common practice in the bench for frameworks to implement low-level json and/or plaintext to boost their composite score, which is questionable to say the least.

Speaking of ntex, from what I see the current bench has to choose one async runtime, which means its tokio and async-std flavor scores can't be achieved at the same time. That said, it's possible to modify the code to combine multiple runtimes and get the best of each; that would be a big refactor, but it can be done.
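A rough sketch of what such a combination could look like, assuming both the tokio and async-std crates as dependencies; the services themselves are left as placeholders, and this is not how any existing benchmark is structured:

```rust
use std::thread;

fn main() {
    // Hypothetical layout: each runtime gets its own OS thread, so the
    // tokio-flavored and async-std-flavored endpoints could be served from
    // a single binary at the same time.
    let tokio_half = thread::spawn(|| {
        tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .expect("failed to build tokio runtime")
            .block_on(async {
                // serve the endpoints that score best on tokio here
            });
    });
    let async_std_half = thread::spawn(|| {
        async_std::task::block_on(async {
            // serve the endpoints that score best on async-std here
        });
    });
    tokio_half.join().unwrap();
    async_std_half.join().unwrap();
}
```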

@MarkReedZ
Contributor

Should we remove frameworks like gnet? It only implements plaintext and isn't actually doing any parsing or routing: it just scans to the \r\n\r\n and sends a canned response, which doesn't meet the test requirements.
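For context, a minimal sketch of the shortcut being described, written in Rust for consistency with the other sketches in this thread (gnet itself is Go, and this is not its actual code): the server never parses the request line, headers, or route; it only scans for the end of the request head and writes a fixed response.

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

// Fixed response sent regardless of what was requested.
const CANNED: &[u8] =
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 13\r\n\r\nHello, World!";

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = Vec::new();
        let mut chunk = [0u8; 1024];
        // Scan for "\r\n\r\n" (end of the request head); nothing is parsed.
        while !buf.windows(4).any(|w| w == b"\r\n\r\n") {
            let n = stream.read(&mut chunk)?;
            if n == 0 {
                break;
            }
            buf.extend_from_slice(&chunk[..n]);
        }
        // No routing, no header handling; one request per connection for brevity.
        stream.write_all(CANNED)?;
    }
    Ok(())
}
```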

@remittor
Contributor

@MarkReedZ, your project also has bugs:
#9055
