[BUG] Reducing variance doc incorrectly recommends disabling hyper threading AND pinning CPU frequency scaling? #1686

Open
simonhf opened this issue Oct 25, 2023 · 0 comments

Describe the bug

The "reducing variance" doc [1] recommends disabling hyper threading AND pinning CPU frequency scaling.

However, I tried doing this 6 different ways and discovered that pinning CPU frequency scaling appears to work incorrectly if hyper threads are disabled.

During my benchmark I monitored the CPU frequency of each enabled CPU and logged it each second. I could then analyze the log and determine that only the benchmarks with hyper threading enabled appeared to properly respect pinning CPU frequency scaling. Presumably this is some kind of kernel bug? Has anybody else come across this? And is there a work around?

I ran the benchmark 6 times for ~300 seconds each, slightly changing the hyper threading / CPU frequency scaling config each time.
During each run I logged the MHz of each of the 12 or 24 enabled CPUs once per second.
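
For anyone who wants to repeat the per-second logging, here is a minimal sketch, assuming Linux on x86 where `/proc/cpuinfo` reports a `cpu MHz` line per logical CPU. This is not the exact tool used for the numbers below; the function names `parse_cpu_mhz` and `log_cpu_mhz` are hypothetical.

```python
import time
from pathlib import Path


def parse_cpu_mhz(cpuinfo_text):
    """Map logical CPU number -> current MHz from /proc/cpuinfo-style text."""
    mhz_by_cpu = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "cpu MHz" and cpu is not None:
            mhz_by_cpu[cpu] = float(value)
    return mhz_by_cpu


def log_cpu_mhz(duration_s, interval_s=1.0, source="/proc/cpuinfo"):
    """Sample every interval_s seconds; return {cpu: [mhz, mhz, ...]}."""
    samples = {}
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for cpu, mhz in parse_cpu_mhz(Path(source).read_text()).items():
            samples.setdefault(cpu, []).append(mhz)
        time.sleep(interval_s)
    return samples
```

Running `log_cpu_mhz(300)` in a separate process alongside the benchmark would give roughly 300 samples per CPU. Reading `/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq` (reported in kHz) is an alternative source.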

Run 1: hyper threads off and cpufreq governor set to "powersave":

- CPU  0: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2168=p100  594961=tmhz
- CPU  1: 298 MHz samples: 2953=p0 3493=p25 3493=p50 3493=p75 3494=p100 1034005=tmhz
- CPU  2: 298 MHz samples: 2953=p0 3493=p25 3493=p50 3493=p75 3494=p100 1034007=tmhz
- CPU  3: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2168=p100  594931=tmhz
- CPU  4: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2200=p100  595075=tmhz
- CPU  5: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2169=p100  594877=tmhz
- CPU  6: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2200=p100  595101=tmhz
- CPU  7: 298 MHz samples: 2200=p0 3491=p25 3493=p50 3493=p75 3503=p100 1032578=tmhz
- CPU  8: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2190=p100  594934=tmhz
- CPU  9: 298 MHz samples: 2200=p0 3493=p25 3493=p50 3493=p75 3493=p100 1032769=tmhz
- CPU 10: 298 MHz samples: 2200=p0 3492=p25 3493=p50 3493=p75 3508=p100 1012810=tmhz
- CPU 11: 298 MHz samples: 1885=p0 1995=p25 1995=p50 1995=p75 2193=p100  594576=tmhz

Run 2: hyper threads off and cpufreq governor set to default "ondemand" with max frequency set to same as min:

- CPU  0: 298 MHz samples: 1887=p0 1996=p25 1996=p50 1996=p75 2193=p100  596252=tmhz
- CPU  1: 298 MHz samples: 2668=p0 3436=p25 3493=p50 3493=p75 3493=p100 1020598=tmhz
- CPU  2: 298 MHz samples: 2669=p0 3436=p25 3493=p50 3493=p75 3493=p100 1020475=tmhz
- CPU  3: 298 MHz samples: 2669=p0 3438=p25 3493=p50 3493=p75 3493=p100 1020529=tmhz
- CPU  4: 298 MHz samples: 2665=p0 3435=p25 3493=p50 3493=p75 3493=p100 1020375=tmhz
- CPU  5: 298 MHz samples: 2269=p0 2748=p25 2794=p50 2794=p75 2794=p100  816871=tmhz
- CPU  6: 298 MHz samples: 2668=p0 3434=p25 3493=p50 3493=p75 3493=p100 1020284=tmhz
- CPU  7: 298 MHz samples: 1890=p0 1993=p25 1996=p50 1996=p75 2193=p100  596019=tmhz
- CPU  8: 298 MHz samples: 1890=p0 1996=p25 1996=p50 1996=p75 2193=p100  596070=tmhz
- CPU  9: 298 MHz samples: 1890=p0 1996=p25 1996=p50 1996=p75 2194=p100  596180=tmhz
- CPU 10: 298 MHz samples: 2200=p0 3380=p25 3493=p50 3493=p75 3501=p100 1003906=tmhz
- CPU 11: 298 MHz samples: 2680=p0 3441=p25 3492=p50 3493=p75 3495=p100 1020446=tmhz

Run 3: hyper threads off and cpufreq governor set to "performance":

- CPU  0: 299 MHz samples: 2399=p0 2954=p25 3270=p50 3492=p75 3493=p100  956729=tmhz
- CPU  1: 299 MHz samples: 1993=p0 2931=p25 3232=p50 3459=p75 3493=p100  948039=tmhz
- CPU  2: 299 MHz samples: 2396=p0 2958=p25 3278=p50 3491=p75 3493=p100  957128=tmhz
- CPU  3: 299 MHz samples: 2398=p0 2928=p25 3220=p50 3456=p75 3493=p100  946117=tmhz
- CPU  4: 299 MHz samples: 2396=p0 2931=p25 3224=p50 3458=p75 3493=p100  948164=tmhz
- CPU  5: 299 MHz samples: 2397=p0 2932=p25 3229=p50 3458=p75 3493=p100  948381=tmhz
- CPU  6: 299 MHz samples: 1890=p0 2921=p25 3218=p50 3448=p75 3493=p100  943951=tmhz
- CPU  7: 299 MHz samples: 1862=p0 2931=p25 3246=p50 3462=p75 3493=p100  948668=tmhz
- CPU  8: 299 MHz samples: 2400=p0 2951=p25 3278=p50 3493=p75 3493=p100  957320=tmhz
- CPU  9: 299 MHz samples: 2041=p0 2925=p25 3240=p50 3470=p75 3493=p100  947504=tmhz
- CPU 10: 299 MHz samples: 1889=p0 2344=p25 2716=p50 2911=p75 3493=p100  796872=tmhz
- CPU 11: 299 MHz samples: 1828=p0 1962=p25 1995=p50 2059=p75 2662=p100  601104=tmhz

Run 4: hyper threads on and cpufreq governor set to "powersave":

- CPU  0: 298 MHz samples: 1741=p0 2195=p25 2195=p50 2195=p75 2200=p100  650545=tmhz
- CPU  1: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2200=p100  650462=tmhz
- CPU  2: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2197=p100  650298=tmhz
- CPU  3: 298 MHz samples: 1742=p0 2195=p25 2195=p50 2195=p75 2200=p100  650281=tmhz
- CPU  4: 298 MHz samples: 1721=p0 2195=p25 2195=p50 2195=p75 2197=p100  650153=tmhz
- CPU  5: 298 MHz samples: 1740=p0 2195=p25 2195=p50 2195=p75 2196=p100  650266=tmhz
- CPU  6: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2196=p100  650285=tmhz
- CPU  7: 298 MHz samples: 1743=p0 2195=p25 2195=p50 2195=p75 2200=p100  650227=tmhz
- CPU  8: 298 MHz samples: 1743=p0 2195=p25 2195=p50 2195=p75 2200=p100  650099=tmhz
- CPU  9: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2200=p100  650111=tmhz
- CPU 10: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2200=p100  650115=tmhz
- CPU 11: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2199=p100  650025=tmhz
- CPU 12: 298 MHz samples: 1741=p0 2195=p25 2195=p50 2195=p75 2200=p100  650127=tmhz
- CPU 13: 298 MHz samples: 1747=p0 2195=p25 2195=p50 2195=p75 2200=p100  650162=tmhz
- CPU 14: 298 MHz samples: 1746=p0 2195=p25 2195=p50 2195=p75 2196=p100  650177=tmhz
- CPU 15: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2196=p100  650088=tmhz
- CPU 16: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2200=p100  650450=tmhz
- CPU 17: 298 MHz samples: 1746=p0 2195=p25 2195=p50 2195=p75 2195=p100  650452=tmhz
- CPU 18: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2198=p100  650423=tmhz
- CPU 19: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2195=p100  650484=tmhz
- CPU 20: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100  650393=tmhz
- CPU 21: 298 MHz samples: 1949=p0 2200=p25 2200=p50 2200=p75 2200=p100  654747=tmhz
- CPU 22: 298 MHz samples: 1745=p0 2194=p25 2194=p50 2195=p75 2196=p100  650316=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100  655600=tmhz

Run 5: hyper threads on and cpufreq governor set to default "ondemand" with max frequency set to same as min:

- CPU  0: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649668=tmhz
- CPU  1: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649542=tmhz
- CPU  2: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2196=p100  649509=tmhz
- CPU  3: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2196=p100  649479=tmhz
- CPU  4: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100  649514=tmhz
- CPU  5: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649467=tmhz
- CPU  6: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649378=tmhz
- CPU  7: 298 MHz samples: 1755=p0 2195=p25 2195=p50 2195=p75 2200=p100  649501=tmhz
- CPU  8: 298 MHz samples: 1754=p0 2195=p25 2195=p50 2195=p75 2200=p100  649393=tmhz
- CPU  9: 298 MHz samples: 1754=p0 2195=p25 2195=p50 2195=p75 2200=p100  649397=tmhz
- CPU 10: 298 MHz samples: 1759=p0 2195=p25 2195=p50 2195=p75 2195=p100  649485=tmhz
- CPU 11: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649488=tmhz
- CPU 12: 298 MHz samples: 1758=p0 2195=p25 2195=p50 2195=p75 2200=p100  649312=tmhz
- CPU 13: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100  649444=tmhz
- CPU 14: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649390=tmhz
- CPU 15: 298 MHz samples: 1751=p0 2195=p25 2195=p50 2195=p75 2200=p100  649347=tmhz
- CPU 16: 298 MHz samples: 1757=p0 2195=p25 2195=p50 2195=p75 2200=p100  649640=tmhz
- CPU 17: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100  649491=tmhz
- CPU 18: 298 MHz samples: 1757=p0 2195=p25 2195=p50 2195=p75 2200=p100  649627=tmhz
- CPU 19: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2197=p100  649535=tmhz
- CPU 20: 298 MHz samples: 1719=p0 2195=p25 2195=p50 2195=p75 2203=p100  648801=tmhz
- CPU 21: 298 MHz samples: 1878=p0 2200=p25 2200=p50 2200=p75 2200=p100  653780=tmhz
- CPU 22: 298 MHz samples: 1749=p0 2194=p25 2195=p50 2195=p75 2196=p100  649486=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100  655600=tmhz

Run 6: hyper threads on and cpufreq governor set to "performance":

- CPU  0: 298 MHz samples: 1652=p0 2390=p25 2883=p50 3352=p75 3493=p100  850426=tmhz
- CPU  1: 298 MHz samples: 1652=p0 2380=p25 2888=p50 3369=p75 3493=p100  849895=tmhz
- CPU  2: 298 MHz samples: 1652=p0 2376=p25 2775=p50 3083=p75 3493=p100  815483=tmhz
- CPU  3: 298 MHz samples: 1652=p0 2377=p25 2789=p50 3073=p75 3493=p100  815044=tmhz
- CPU  4: 298 MHz samples: 1686=p0 2364=p25 2791=p50 3052=p75 3493=p100  815013=tmhz
- CPU  5: 298 MHz samples: 1652=p0 2379=p25 2790=p50 3053=p75 3493=p100  816213=tmhz
- CPU  6: 298 MHz samples: 1652=p0 2374=p25 2769=p50 3053=p75 3493=p100  813056=tmhz
- CPU  7: 298 MHz samples: 1653=p0 2383=p25 2764=p50 3045=p75 3493=p100  812936=tmhz
- CPU  8: 298 MHz samples: 1652=p0 2376=p25 2762=p50 3013=p75 3493=p100  809194=tmhz
- CPU  9: 298 MHz samples: 1652=p0 2374=p25 2756=p50 3019=p75 3493=p100  808762=tmhz
- CPU 10: 298 MHz samples: 1653=p0 2377=p25 2768=p50 3072=p75 3493=p100  814275=tmhz
- CPU 11: 298 MHz samples: 1653=p0 2384=p25 2762=p50 3062=p75 3493=p100  814070=tmhz
- CPU 12: 298 MHz samples: 1652=p0 2379=p25 2776=p50 3080=p75 3493=p100  815921=tmhz
- CPU 13: 298 MHz samples: 1653=p0 2374=p25 2769=p50 3076=p75 3493=p100  816261=tmhz
- CPU 14: 298 MHz samples: 1652=p0 2377=p25 2780=p50 3066=p75 3493=p100  813424=tmhz
- CPU 15: 298 MHz samples: 1653=p0 2372=p25 2774=p50 3069=p75 3493=p100  813671=tmhz
- CPU 16: 298 MHz samples: 1627=p0 2367=p25 2761=p50 2953=p75 3493=p100  803208=tmhz
- CPU 17: 298 MHz samples: 1627=p0 2376=p25 2764=p50 2953=p75 3493=p100  803740=tmhz
- CPU 18: 298 MHz samples: 1627=p0 2381=p25 2789=p50 2988=p75 3493=p100  809744=tmhz
- CPU 19: 298 MHz samples: 1627=p0 2383=p25 2780=p50 2995=p75 3493=p100  809672=tmhz
- CPU 20: 298 MHz samples: 1627=p0 2006=p25 2200=p50 2550=p75 3500=p100  688864=tmhz
- CPU 21: 298 MHz samples: 1864=p0 2200=p25 2200=p50 2200=p75 2754=p100  657122=tmhz
- CPU 22: 298 MHz samples: 1627=p0 1915=p25 1984=p50 2040=p75 2426=p100  589621=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100  655600=tmhz

Note: p0 through p100 are the MHz percentiles over the ~298 samples (one per second) taken for each CPU during a benchmark.

Note: tmhz is the sum of all MHz samples for a CPU. With less variance, we would expect the totals to be about the same across CPUs, right?
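
To make the summary columns concrete, here is a sketch of how one such line could be produced from a list of per-second samples. It assumes nearest-rank percentiles and integer MHz values; the issue does not say which percentile method was actually used, and `percentile` and `summarize` are hypothetical names.

```python
import math


def percentile(sorted_samples, pct):
    """Nearest-rank percentile: pct=0 gives the minimum, pct=100 the maximum."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def summarize(cpu, samples_mhz):
    """Format one CPU's samples in the same layout as the run listings."""
    ordered = sorted(round(s) for s in samples_mhz)  # whole MHz
    pcts = " ".join(
        f"{percentile(ordered, p)}=p{p}" for p in (0, 25, 50, 75, 100)
    )
    total = sum(ordered)  # this is the tmhz column
    return f"- CPU {cpu:2d}: {len(ordered)} MHz samples: {pcts} {total:7d}=tmhz"
```

As a check on the arithmetic: a CPU that reads 2200 MHz in all 298 samples totals 298 × 2200 = 655600, which matches the CPU 23 line in Runs 4 to 6.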

Why set the governor to "ondemand" with the max frequency set to the same as min?
Is that not similar to the "powersave" governor? It seems not, judging from the results.
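
One way to confirm what was actually configured, as opposed to what the CPU ends up doing, is to read the settings back from the cpufreq sysfs files. The sketch below assumes the standard Linux layout under `/sys/devices/system/cpu/cpuN/cpufreq/`, where frequencies are reported in kHz; `read_cpufreq_config` and `is_pinned` are hypothetical helper names.

```python
from pathlib import Path


def read_cpufreq_config(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return governor plus min/max/current frequency (MHz) for one logical CPU."""
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        return (base / name).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        # sysfs reports kHz; convert to whole MHz
        "min_mhz": int(read("scaling_min_freq")) // 1000,
        "max_mhz": int(read("scaling_max_freq")) // 1000,
        "cur_mhz": int(read("scaling_cur_freq")) // 1000,
    }


def is_pinned(config):
    """True when the configured min and max frequencies are identical."""
    return config["min_mhz"] == config["max_mhz"]
```

If `is_pinned` returns true but `cur_mhz` sits well above `max_mhz`, the limits are configured as intended and the mismatch lies in how they are enforced.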

System

Ubuntu 22.04 LTS

To reproduce

This is not actually using the benchmark repo project, but rather just testing out the recommendations to reduce variance at [1].

Expected behavior

I would expect the first 3 benchmarks to be the most accurate because hyper threads are disabled.

I would expect all benchmarks to have a similar total MHz for each CPU used in the benchmark.

However, in reality only benchmarks 4 and 5 appear to offer reduced variance in CPU MHz.

It would be great if others could comment on their experiences with benchmarks and CPU frequency scaling.
Does anybody else actively monitor the CPU frequencies to sanity check that the governor is doing its job?
Maybe the "reducing variance" doc [1] could be updated to warn not to take the governor for granted?
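
A sanity check of this kind could be as simple as comparing the median frequency across CPUs after a run. The sketch below uses the p50 values from Run 1 and Run 4 above; the 50 MHz tolerance and the function names are assumptions chosen for illustration, not a recommendation from the doc.

```python
# p50 MHz per CPU, copied from the Run 1 and Run 4 listings above.
RUN1_P50 = {0: 1996, 1: 3493, 2: 3493, 3: 1996, 4: 1996, 5: 1996,
            6: 1996, 7: 3493, 8: 1996, 9: 3493, 10: 3493, 11: 1995}
RUN4_P50 = {cpu: 2195 for cpu in range(24)}
RUN4_P50.update({21: 2200, 22: 2194, 23: 2200})


def frequency_spread(median_mhz_by_cpu):
    """Difference between the fastest and slowest per-CPU median, in MHz."""
    values = median_mhz_by_cpu.values()
    return max(values) - min(values)


def governor_looks_pinned(median_mhz_by_cpu, tolerance_mhz=50):
    """True when all CPUs ran within tolerance_mhz of each other (assumed threshold)."""
    return frequency_spread(median_mhz_by_cpu) <= tolerance_mhz


if __name__ == "__main__":
    for name, run in (("Run 1", RUN1_P50), ("Run 4", RUN4_P50)):
        print(name, frequency_spread(run), governor_looks_pinned(run))
```

With these numbers, Run 1 has a spread of 1498 MHz and fails the check, while Run 4 has a spread of 6 MHz and passes.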

Screenshots

n/a

Additional context

The CPU in this case is an AMD Ryzen. Note: Boost mode was also disabled.

[1] https://github.com/google/benchmark/blob/main/docs/reducing_variance.md
