Unexpected change in throughput running z_sub_thr and restarting z_pub_thr #1017
Comments
Here is another case: the first run of z_pub_thr yields a throughput of ~2800 msg/s according to z_sub_thr, but every subsequent restart of z_pub_thr yields a much higher throughput of ~5000 msg/s according to z_sub_thr.
Hi @jackg0! The instability possibly comes from the system itself. Could you please try the following
Hi @YuanYuYuan, thanks for the quick response and the suggestions! I've attached two videos to help debug. One with both the I also ran https://github.com/eclipse-zenoh/zenoh/assets/8713234/b10202d2-a5aa-4fdf-82cc-fc704e9168aa
Hi @jackg0, I think you could increase the number of messages in each measurement to reduce the variance. This is what I observed on my laptop.
sudo nice -n -20 taskset -c 0,2 ./target/release/examples/z_sub_thr -s 5 -n 5000
sudo nice -n -20 taskset -c 1,3 ./target/release/examples/z_pub_thr 1048576
Repeat 10 times.
Press CTRL-C to quit...
2186.625858047293 msg/s
1976.6553852170493 msg/s
2159.7951844801387 msg/s
2060.118765706755 msg/s
2322.7678741585833 msg/s
Press CTRL-C to quit...
2017.45538633758 msg/s
2048.95934592392 msg/s
2367.598719428168 msg/s
2013.1891291489683 msg/s
2492.418014551374 msg/s
Press CTRL-C to quit...
1945.737483207945 msg/s
1840.192912987812 msg/s
1962.4937566106748 msg/s
1852.0520909223058 msg/s
1989.227550648963 msg/s
Press CTRL-C to quit...
2044.7209231016272 msg/s
2075.8386407413896 msg/s
2078.8020593474657 msg/s
2058.1868709512523 msg/s
2078.3503005702096 msg/s
Press CTRL-C to quit...
2041.9730133957385 msg/s
2062.873456989323 msg/s
2146.1786394368332 msg/s
2036.5213537008829 msg/s
2190.538801541054 msg/s
Press CTRL-C to quit...
2037.434580970634 msg/s
2043.238797955243 msg/s
2040.4466059339704 msg/s
2055.1036757058237 msg/s
2042.4498654281867 msg/s
Press CTRL-C to quit...
2053.5388791367764 msg/s
2083.8771818636915 msg/s
2053.47317582037 msg/s
2066.2914473976557 msg/s
2095.531008961677 msg/s
Press CTRL-C to quit...
2810.2802030513103 msg/s
2064.075646021032 msg/s
2077.619939098062 msg/s
2073.4854773456614 msg/s
2062.563490860657 msg/s
Press CTRL-C to quit...
2447.5786489674642 msg/s
2207.575150459812 msg/s
2038.9994240784958 msg/s
2062.0763335292995 msg/s
2078.794691355887 msg/s
Press CTRL-C to quit...
2114.984155824615 msg/s
2059.924117907853 msg/s
2035.1727948625066 msg/s
2064.2951988374552 msg/s
2028.676008906326 msg/s
To me the variance is not so high.
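To put a number on "not so high", the pasted output can be summarized per run. The following is a minimal sketch, assuming the output has been saved as text in the `<number> msg/s` format shown above; the helper names `parse_rates` and `summarize` are hypothetical and not part of zenoh.

```python
import re
import statistics

# Matches lines such as "2037.434580970634 msg/s"; anything else
# (for example "Press CTRL-C to quit...") is ignored.
RATE_LINE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s+msg/s\s*$")


def parse_rates(text):
    """Return the throughput values (msg/s) found in z_sub_thr-style output."""
    rates = []
    for line in text.splitlines():
        match = RATE_LINE.match(line)
        if match:
            rates.append(float(match.group(1)))
    return rates


def summarize(rates):
    """Return count, mean, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(rates)
    stdev = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return {"n": len(rates), "mean": mean, "stdev": stdev, "cv": stdev / mean}


if __name__ == "__main__":
    # One of the runs quoted in this thread.
    sample = """Press CTRL-C to quit...
2037.434580970634 msg/s
2043.238797955243 msg/s
2040.4466059339704 msg/s
2055.1036757058237 msg/s
2042.4498654281867 msg/s
"""
    print(summarize(parse_rates(sample)))
```

The coefficient of variation (`cv`, standard deviation divided by mean) makes runs with different nominal throughput comparable, which may help when comparing a first run against later restarts.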
Hi @YuanYuYuan, I tried setting a much higher sample size of 10k and I also used I also only see the issue when I have a subscriber that doesn't exit after a few samples. In your case, it seems the subscriber is exiting after 5 samples with
We observed the same thing until we realized that setting the CPU affinity is necessary. Here is another, longer-running result.
sudo nice -n -20 taskset -c 1,3 ./target/release/examples/z_pub_thr 1048576
sudo nice -n -20 taskset -c 0,2 ./target/release/examples/z_sub_thr -s 100 -n 5000 # repeat five times
The variance seems within an acceptable range.
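The taskset commands above pin each process from the outside. For an application that cannot be wrapped in taskset, the same restriction can be requested from inside the process. This is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the helper name `pin_to_cpus` is hypothetical and is not a zenoh API.

```python
import os


def pin_to_cpus(requested):
    """Restrict the current process to the requested CPUs (Linux only).

    Only CPUs the process is already allowed to use are kept, so this
    also behaves sensibly inside a container with a restricted CPU set.
    Returns the affinity set that is in effect afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    usable = set(requested) & allowed
    if not usable:
        raise ValueError(f"none of {sorted(requested)} is in {sorted(allowed)}")
    os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumption for illustration: pin to the lowest-numbered allowed CPU.
    first_cpu = min(os.sched_getaffinity(0))
    print("now pinned to:", sorted(pin_to_cpus({first_cpu})))
```

Unlike the `nice -n -20` part of the commands above, changing affinity this way does not raise the scheduling priority and does not need root.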
Ok, thank you. I'll keep looking at it on my system. I am assuming the variance is a system issue. Do you recommend that we always set the CPU affinity when using zenoh? Is that necessary for each subscriber and publisher? Thanks for the help!
It depends on your use case. The fewer CPUs a process is allowed to use, the more stable its performance, but this also sacrifices maximum throughput. (Note that using all CPUs does not necessarily give the best throughput either, because of the cost of context switches.) Do you frequently require high bandwidth in your scenario?
Yes, we're using zenoh in high bandwidth applications right now. I'll make sure to keep this in mind as we add more publishers and subscribers. Feel free to close this issue - thanks again!
Thanks for the information. Scalability is a topic we are highly interested in. Don't hesitate to ping us if you have any issues. 😃
Describe the bug
Hi,
Thanks for all the great work on zenoh! We're excited to use this project in various applications.
I am seeing varying bandwidth when running the z_sub_thr example and restarting the z_pub_thr example with a payload size of 1 MiB.
Using the attached annotated stdout for z_sub_thr as an example, for the first run of z_pub_thr 1048576 I see a nominal rate of 2300 msg/s reported by z_sub_thr. After running z_pub_thr for a 2nd time, I see a much higher rate of 4600 msg/s. In restarts of z_pub_thr, I see 2300 msg/s.
Is this an issue with the examples, my build of zenoh, or possibly some other issue?
zenoh_z_sub_thr_output.txt
To reproduce
./target/release/examples/z_sub_thr -s 100000 -n 100
./target/release/examples/z_pub_thr 1048576
System info
Platform: Docker container running Ubuntu 24.04
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
zenoh commit: b8dd01d
rustc 1.75.