
Heavy compute does not give a good comparison of parallel iter #28

Open · ElliotB256 opened this issue Jan 20, 2022 · 2 comments

Comments

@ElliotB256

Hi,

I think there should be a benchmark that compares how the libraries handle parallel iteration. Currently, the closest test is heavy_compute, but its task (inverting a matrix 100 times) is not fine-grained enough to compare parallel overhead: there is too much work per item.

I propose either:

  • reducing the task in the parallel loop of heavy_compute (e.g., to inverting the matrix once, or multiplying a float value: something very small), or
  • introducing a new parallel_light_compute benchmark (see the sketch below).

An example of option two is here: https://github.com/ElliotB256/ecs_bench_suite/tree/parallel_light_compute
Further discussion can be found here: bevyengine/bevy#2173
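To illustrate option two, here is a minimal sketch of the kind of light per-item task I mean, written against specs' par_join. The Value component, entity count, and workload are illustrative assumptions, not the branch's actual code:

```rust
// A light parallel-iteration workload: one multiply per item, so the
// measurement reflects scheduling overhead rather than the work itself.
use rayon::prelude::*;
use specs::prelude::*;

struct Value(f32);

impl Component for Value {
    type Storage = VecStorage<Self>;
}

fn main() {
    let mut world = World::new();
    world.register::<Value>();

    // Enough entities that per-item dispatch cost dominates.
    for i in 0..100_000 {
        world.create_entity().with(Value(i as f32)).build();
    }

    let mut values = world.write_storage::<Value>();
    // specs parallelises over its storage without an explicit batch size.
    (&mut values).par_join().for_each(|v| {
        v.0 *= 2.0; // deliberately tiny task
    });
}
```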

The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the linked discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

@Systemcluster

> The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the linked discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

In my results (where I merged your fork and others, updated all libraries, and made some adjustments), bevy is only 2x slower than specs in parallel_light_compute and is actually faster than the other libraries. It might be sensitive to thread count as well (I ran it on a 16c/32t system), or the situation improved drastically between bevy 0.5 and 0.6.

@ElliotB256 (Author)

Thanks for looking!

However, one note: bevy is extremely sensitive to batch size, while the other libraries don't need a batch size to be set at all. Your file shows a batch size of 1024. In the discussion I linked above, you'll find the following table, which shows how bevy scales with batch size:

| Batch size | Time      |
|-----------:|----------:|
| 8          | 1.177 ms  |
| 64         | 234.13 µs |
| 256        | 149.48 µs |
| 1024       | 130.48 µs |
| 4096       | 207.13 µs |
| 10,000     | 485.55 µs |

On my PC, 1024 was the optimum batch size for bevy. For comparison, specs took 108.00 µs, so bevy at its best was about 2x slower than specs. In the worst case of an unoptimised batch size, however, bevy is more than 10x slower (hence the numbers in my first post). I expect the 'ideal' batch size is both hardware- and system-dependent, and the optimum will rarely be achieved in practice.
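For reference, this is roughly where the batch size enters with bevy 0.6's parallel query API. A minimal sketch, assuming a hypothetical Value component and system; this is not the suite's actual benchmark code:

```rust
use bevy::{prelude::*, tasks::ComputeTaskPool};

#[derive(Component)]
struct Value(f32);

fn setup(mut commands: Commands) {
    // Entity count is illustrative.
    for i in 0..100_000 {
        commands.spawn().insert(Value(i as f32));
    }
}

fn light_compute(pool: Res<ComputeTaskPool>, mut query: Query<&mut Value>) {
    // Unlike specs' par_join, bevy asks for an explicit batch size here.
    // 1024 was the optimum on my machine; the table above shows the cost
    // of getting it wrong.
    query.par_for_each_mut(&pool, 1024, |mut v| {
        v.0 *= 2.0;
    });
}

fn main() {
    // MinimalPlugins sets up the compute task pool and a schedule runner
    // (which loops by default; a real benchmark would time a single run).
    App::new()
        .add_plugins(MinimalPlugins)
        .add_startup_system(setup)
        .add_system(light_compute)
        .run();
}
```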

(Disclaimer: my tests are still for bevy 0.5; I haven't had time to run comparisons for 0.6 yet, but my understanding from other discussions is that the parallel performance did not change.)
