
[WIP] plaintext benchmark #9

Open: jangko wants to merge 13 commits into base master

Conversation

@jangko (Contributor) commented Aug 22, 2018

  • finishing bench bot implementation
  • add participant: rust actix
  • add participant: go-lang fasthttp
  • add participant: c libreactor
  • polishing benchmark report
  • completing thread/nothread stuff
  • rewrite response section code
  • add nodocker script

@cheatfate (Collaborator)

I'm sorry @jangko, there is no reason to benchmark multi-threaded apps against single-threaded ones. Could you please limit the number of processes/threads used by each chosen framework?

@jangko (Contributor, Author) commented Aug 28, 2018

> Could you please limit the number of processes/threads used by each chosen framework?

Sure, I will try to make the benchmark as fair as possible for each participant.

@cheatfate (Collaborator) commented Aug 28, 2018

From my tests on a VM with only 2 processors available, mofuw is not that performant, and it also produces errors rather than successful responses:

Running 10s test @ http://127.0.0.1:34500
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.02ms    4.57ms  61.20ms   97.04%
    Req/Sec    17.78k     6.53k   32.87k    59.20%
  355475 requests in 10.10s, 44.75MB read
  Non-2xx or 3xx responses: 355475
Requests/sec:  35197.01
Transfer/sec:      4.43MB
./wrk http://127.0.0.1:34500  1.02s user 4.18s system 51% cpu 10.105 total
cheatfate@phantom ~/wrk (git)-[master] % ./wrk http://127.0.0.1:34500
Running 10s test @ http://127.0.0.1:34500
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.12ms   14.53ms 151.12ms   95.69%
    Req/Sec    17.17k     2.89k   20.92k    82.67%
  345100 requests in 10.10s, 43.44MB read
  Non-2xx or 3xx responses: 345100
Requests/sec:  34168.76
Transfer/sec:      4.30MB
./wrk http://127.0.0.1:34500  1.13s user 4.13s system 51% cpu 10.110 total
cheatfate@phantom ~/wrk (git)-[master] % ./wrk http://127.0.0.1:34500
Running 10s test @ http://127.0.0.1:34500
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.55ms   22.19ms 220.08ms   95.90%
    Req/Sec    16.85k     2.76k   31.98k    82.59%
  337011 requests in 10.10s, 42.42MB read
  Non-2xx or 3xx responses: 337011
Requests/sec:  33366.97
Transfer/sec:      4.20MB
./wrk http://127.0.0.1:34500  1.23s user 3.78s system 49% cpu 10.145 total

@cheatfate (Collaborator) commented Aug 28, 2018

On the same VM, the asyncdispatch2 benchmark produces this output:

cheatfate@phantom ~/wrk (git)-[master] % ./wrk http://127.0.0.1:8885
Running 10s test @ http://127.0.0.1:8885
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.75us  184.01us  10.27ms   98.49%
    Req/Sec    24.09k     3.90k   28.10k    77.23%
  484088 requests in 10.10s, 24.93MB read
Requests/sec:  47929.24
Transfer/sec:      2.47MB
./wrk http://127.0.0.1:8885  1.06s user 5.18s system 61% cpu 10.104 total
cheatfate@phantom ~/wrk (git)-[master] % ./wrk http://127.0.0.1:8885
Running 10s test @ http://127.0.0.1:8885
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   182.87us  107.18us   6.93ms   97.22%
    Req/Sec    26.52k     1.07k   29.06k    59.00%
  527788 requests in 10.01s, 27.18MB read
Requests/sec:  52746.72
Transfer/sec:      2.72MB
./wrk http://127.0.0.1:8885  1.48s user 5.84s system 73% cpu 10.009 total
cheatfate@phantom ~/wrk (git)-[master] % ./wrk http://127.0.0.1:8885
Running 10s test @ http://127.0.0.1:8885
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   218.96us  240.94us  10.37ms   98.65%
    Req/Sec    23.38k     4.37k   28.07k    70.79%
  469746 requests in 10.10s, 24.19MB read
Requests/sec:  46510.70
Transfer/sec:      2.40MB
./wrk http://127.0.0.1:8885  1.05s user 4.96s system 59% cpu 10.103 total

As you can see, there are no "Non-2xx or 3xx responses" lines here, so wrk received normal HTTP responses.

@jangko (Contributor, Author) commented Aug 29, 2018

mofuw needs the /plaintext URI to avoid Non-2xx or 3xx responses.

The performance difference between your benchmark and mine is because I ran with the pipeline switch turned on. With pipelining enabled in wrk, mofuw's performance will be higher than ad2's.

@cheatfate (Collaborator)

@jangko, from what I see, the asyncdispatch and asyncdispatch2 benchmarks do not support pipelined messages. So why are you testing it?

@jangko (Contributor, Author) commented Aug 29, 2018

Most of the performant TechEmpower benchmark participants are designed to handle pipelined messages. This benchmark, on the other hand, does not take that into account.
While testing those frameworks, I realized their performance can vary significantly with and without pipeline mode, so I think it is important to keep this information.
The final result of this benchmark will include both pipeline and no-pipeline modes for comparison, or pipelining will become a switchable bench-bot feature.
What we can do now is make them all run in single-threaded mode. Then we can decide what to do with pipelining.
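For context, a minimal sketch (in Go, with assumed names; this is not taken from any of the benchmark implementations) of what pipeline support means on the server side: a single socket read may carry several back-to-back requests, and the server must detect and answer each of them, not just the first:

```go
package main

import (
	"fmt"
	"strings"
)

// countPipelined counts the complete, body-less HTTP requests packed
// into one read buffer. A pipelining-aware server must loop over the
// buffer and answer every request it finds.
func countPipelined(buf string) int {
	// Each request without a body ends with an empty line (CRLF CRLF).
	return strings.Count(buf, "\r\n\r\n")
}

func main() {
	req := "GET /plaintext HTTP/1.1\r\nHost: bench\r\n\r\n"
	// A pipelining load generator writes several requests per send.
	fmt.Println(countPipelined(req + req + req)) // prints 3
}
```

A server that parses only one request per read would answer a third of the pipelined load in this example, which is why pipeline-aware frameworks score so differently.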

@cheatfate (Collaborator)

But you can adjust the benchmark source to support pipelining for both asyncdispatch and asyncdispatch2.

@jangko (Contributor, Author) commented Aug 29, 2018

> But you can adjust the benchmark source to support pipelining for both asyncdispatch and asyncdispatch2.

Agreed.

@jangko (Contributor, Author) commented Sep 3, 2018

ad2 is fast but then suffers a massive slowdown, hmm. Interesting.

.travis.yml Outdated
@@ -37,3 +37,5 @@ install:
script:
- nimble install -y
- nimble test
- nimble benchmark
Member

What's the purpose of running the benchmark on a CI setup? Neither Travis nor AppVeyor provides stable hardware, so the numbers will be meaningless, and benchmarks tend to take a while, so they will slow down every PR/build roundtrip.

@jangko (Contributor, Author) commented Sep 11, 2018

> and benchmarks tend to take a while so it will slow down every PR / build roundtrip..

That's right, it takes a significant amount of time. I have already removed it from CI.


Summary

  • mofuw: uses asyncdispatch, so its expected performance should not exceed that of asyncdispatch itself.
  • asyncdispatch: although slower than asyncdispatch2, it handles high concurrency quite well.
  • asyncdispatch2: at high concurrency it tends to slow down significantly, but surprisingly it is the only framework in this test that handles non-pipelined requests faster than the others, despite using almost identical request/response handling code to asyncdispatch.
  • actix-raw: very fast when multi-threaded, not so when single-threaded.
  • fasthttp: very fast when multi-threaded, not so when single-threaded.
  • libreactor: still very fast even in single-threaded mode.

Conclusion

  • asyncdispatch2 could be a good candidate to replace asyncdispatch.
  • It still has room for improvement, especially when handling high connection counts.

Sorry I cannot work faster because of some circumstances, but I think this one is ready for review.

@dm1try commented Sep 15, 2018

It looks like the asyncdispatch2 benchmark has a broken implementation, at least on macOS: it generates ~10x responses for the same request (the provided results show a similar correlation).

wrk goes crazy like this:

wrk -c 30 -d 15s -t 4 http://localhost:8080/
Running 15s test @ http://localhost:8080/
  4 threads and 30 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   204.63us  122.29us   3.45ms   66.65%
    Req/Sec   329.58k    22.97k  376.98k    63.91%
  19802436 requests in 15.10s, 2.56GB read
Requests/sec: 1311431.24
Transfer/sec:    173.84MB

The test programs are supposed to check for the 'USE_THREADS'
environment variable to decide whether threads should be used.
So far, this has been implemented in the Go and Rust benchmarks.

The bot will set this variable by default.

Other changes:

* The Rust implementation has been updated with the latest
  code from the TechEmpower benchmark and the incremental
  Docker build process has been greatly improved
@zah (Member) commented Sep 21, 2018

@jangko, I've pushed to your branch a commit adding a command-line option for deciding whether threads should be used. To support it, the test programs need a minor modification - they must check whether the environment variable USE_THREADS is set. You can see an example here:

4fa3b6e#diff-a700604e55a2b00d28959045bdda5b09R26

I've added this to the Rust and Go programs, but can we also add it to the rest of the examples?

The asyncdispatch2 program that you've prepared violates the rules of the competition, which are given here:
https://www.techempower.com/benchmarks/#section=code

In particular, this rule:

> This test is not intended to exercise the allocation of memory or instantiation of objects. Therefore it is acceptable but not required to re-use a single buffer for the response text (Hello, World). However, the response must be fully composed from the response text and response headers within the scope of each request and it is not acceptable to store the entire payload of the response, or an unnaturally large subset of the response, headers inclusive, as a pre-rendered buffer. "Buffer" here refers to a byte array, byte buffer, character array, character buffer, string, or string-like data structure. The spirit of the test is to require the construction of the HTTP response as is typically done by a framework or platform via concatenation of strings or similar. For example, pre-rendering a buffer with "HTTP/1.1 200 OK", "Content-length: 15", "Server: Example" would not be acceptable.

So, you must break up the strings being written as a response a bit. I think you can avoid some of the allocations and concatenations as well; @cheatfate may have hints on the most efficient way to build the response piece by piece.

bung87 pushed a commit to bung87/nim-chronos that referenced this pull request Nov 17, 2020
Add tests for status-im#9.
Temporary disable some tests in testaddress.nim.