
Heavy compute does not give a good comparison of parallel iter #28

Open · ElliotB256 opened this issue Jan 20, 2022 · 2 comments

Comments

@ElliotB256

Hi,

I think there should be a benchmark that compares how the libraries handle parallel iteration. Currently, the closest test is heavy_compute, but its task (inverting a matrix 100 times) is not fine-grained enough to compare parallel overhead: there is too much work per item.

I propose either:

  • reducing the task in the parallel loop of heavy_compute (e.g., to inverting the matrix once, or multiplying a float value: something very small), or
  • introducing a new parallel_light_compute benchmark (see the sketch below).

An example of option two is here: https://github.com/ElliotB256/ecs_bench_suite/tree/parallel_light_compute
Further discussion can be found here: bevyengine/bevy#2173
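To illustrate option two, here is a minimal sketch of the kind of light per-item task I mean, written against specs' par_join. The Value component, entity count, and workload are illustrative assumptions, not the branch's actual code:

```rust
// A light parallel-iteration workload: one multiply per item, so the
// measurement reflects scheduling overhead rather than the work itself.
use rayon::prelude::*;
use specs::prelude::*;

struct Value(f32);

impl Component for Value {
    type Storage = VecStorage<Self>;
}

fn main() {
    let mut world = World::new();
    world.register::<Value>();

    // Enough entities that per-item dispatch cost dominates.
    for i in 0..100_000 {
        world.create_entity().with(Value(i as f32)).build();
    }

    let mut values = world.write_storage::<Value>();
    // specs parallelises over its storage without an explicit batch size.
    (&mut values).par_join().for_each(|v| {
        v.0 *= 2.0; // deliberately tiny task
    });
}
```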

The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the linked discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

@Systemcluster

> The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the linked discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

In my results (where I merged your fork and others, updated all libraries, and made some adjustments), bevy is only 2x slower than specs in parallel_light_compute and is actually faster than the other libraries. It might be sensitive to thread count as well (I ran it on a 16c/32t system), or the situation improved drastically between bevy 0.5 and 0.6.

@ElliotB256 (Author)

Thanks for looking!

However, one note: bevy is extremely sensitive to batch size, while the other libraries don't need a batch size to be set at all. Your file shows a batch size of 1024. In the discussion I linked above, you'll find the following table, which shows how bevy scales with batch size:

| Batch size | Time      |
|-----------:|----------:|
| 8          | 1.177 ms  |
| 64         | 234.13 µs |
| 256        | 149.48 µs |
| 1024       | 130.48 µs |
| 4096       | 207.13 µs |
| 10,000     | 485.55 µs |

On my PC, 1024 was the optimum batch size for bevy. For comparison, specs took 108.00 µs, so bevy at its best was about 2x slower than specs. In the worst case of an unoptimised batch size, however, bevy is more than 10x slower (hence the numbers in my first post). I expect the 'ideal' batch size is both hardware- and system-dependent, and the optimum will rarely be achieved in practice.
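For reference, this is roughly where the batch size enters with bevy 0.6's parallel query API. A minimal sketch, assuming a hypothetical Value component and system; this is not the suite's actual benchmark code:

```rust
use bevy::{prelude::*, tasks::ComputeTaskPool};

#[derive(Component)]
struct Value(f32);

fn setup(mut commands: Commands) {
    // Entity count is illustrative.
    for i in 0..100_000 {
        commands.spawn().insert(Value(i as f32));
    }
}

fn light_compute(pool: Res<ComputeTaskPool>, mut query: Query<&mut Value>) {
    // Unlike specs' par_join, bevy asks for an explicit batch size here.
    // 1024 was the optimum on my machine; the table above shows the cost
    // of getting it wrong.
    query.par_for_each_mut(&pool, 1024, |mut v| {
        v.0 *= 2.0;
    });
}

fn main() {
    // MinimalPlugins sets up the compute task pool and a schedule runner
    // (which loops by default; a real benchmark would time a single run).
    App::new()
        .add_plugins(MinimalPlugins)
        .add_startup_system(setup)
        .add_system(light_compute)
        .run();
}
```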

(Disclaimer: my tests are still for bevy 0.5; I haven't had time to run comparisons for 0.6 yet, but my understanding from other discussions is that the parallel performance did not change.)
