
Fix some issues with dynamic algorithm selection in coll/tuned #8186

Merged
rajachan merged 5 commits into open-mpi:master from devreal:fix-tuned-dynamic on Nov 12, 2020

Conversation

@devreal (Contributor) commented Nov 5, 2020

This PR addresses a potential performance issue with the algorithm selection in coll/tuned and some minor issues found while digging into it:

  1. Performance: some bcast and allreduce algorithms require the number of elements to be at least as large as the number of ranks and fall back to a linear implementation otherwise. This hit me hard with a 4 B bcast on 128 ranks on a single node, where latency increased 10x compared to 127 ranks.
  2. The documentation for the coll_tuned_*_algorithm MCA variables should mention that they only take effect if the coll_tuned_use_dynamic_rules variable is set to true.
  3. Mark the global static array used for the MCA parameters as const (a brief sketch follows this list).
  4. Fix some glitches in comments.
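
Regarding item 3, a minimal sketch of what the const-qualification amounts to (the header path, type name, and entries follow the usual Open MPI MCA enum-registration pattern but are written from memory and should be treated as illustrative assumptions, not as the actual coll/tuned source):

```c
#include "opal/mca/base/mca_base_var_enum.h"  /* assumed location of mca_base_var_enum_value_t */

/* Illustrative sketch only: an algorithm name/value table of the kind
 * registered as an MCA enum parameter.  The table is never modified after
 * registration, so it can be declared const. */
static const mca_base_var_enum_value_t bcast_algorithms[] = {
    {0, "ignore"},
    {1, "basic_linear"},
    {2, "chain"},
    {0, NULL}   /* terminator */
};
```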

The MCA parameters coll_tuned_*_algorithm are ignored unless coll_tuned_use_dynamic_rules is true, so mention that in their descriptions.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
…d fall back to linear

Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs
Allreduce: rabenseifner expects N_elem >= pow2 nearest to N_procs

In all of these cases the implementations fall back to a linear variant, which will most likely yield the worst performance (observed for a 4 B bcast on 128 ranks); see the sketch below.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
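
For the conditions described in the commit message above, here is a minimal sketch of the kind of guard the fix implies (the function, enum, and parameter names are hypothetical, not the actual ompi/mca/coll/tuned symbols):

```c
#include <stddef.h>

/* Hypothetical algorithm identifiers, for illustration only. */
enum bcast_alg { BCAST_BINOMIAL, BCAST_SCATTER_ALLGATHER };

/* Sketch of a guarded selection: pick scatter_allgather only when its
 * precondition (N_elem >= N_procs) holds, so the implementation never hits
 * its internal linear fallback. */
static enum bcast_alg choose_bcast_algorithm(size_t n_elems, int comm_size)
{
    if (n_elems >= (size_t)comm_size) {
        /* Large enough: scatter_allgather can run as intended. */
        return BCAST_SCATTER_ALLGATHER;
    }
    /* Small messages (e.g. a 4-byte bcast on 128 ranks): prefer a tree-based
     * algorithm rather than one that silently degrades to linear. */
    return BCAST_BINOMIAL;
}
```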
@wckzhang (Contributor) left a comment

Looks fine. Since most of the tuning data was collected on power-of-2 process counts, we should have considered the non-power-of-2 fallbacks and done slightly finer-grained tuning. This likely applies to other collectives.

@rajachan (Member) commented

Are we going to need this for 4.1.x? It does seem to fix a serious performance regression.

@devreal (Contributor, Author) commented Nov 10, 2020

Yes, I will backport that to 4.1.x later today.

@rajachan (Member) commented Nov 10, 2020

@devreal are these the only collectives you saw regressions with? I saw a similar issue with Allgatherv in the 4.1.x branch. Will redo my tests this afternoon to verify.

msg_size (B) latency (usecs)
1 434.62
2 431.92
4 10256.13
8 10139.07
16 10212.06
32 10087.31
64 9979.71
128 10036.12
256 10285.58
512 10510.48
1024 10599.45
2048 12227.23
4096 15412.86
8192 29796.99
16384 34672.19
32768 44542.43
65536 66748.59
131072 120687.02
262144 221731.62

…lgather

These selections seem harmful in my measurements and don't seem to be
motivated by previous measurement data.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@devreal (Contributor, Author) commented Nov 12, 2020

@rajachan There is indeed a problem with allgatherv: I believe the decisions were generated based on the output of the OSU benchmark, which reports the number of bytes sent by each process. However, the decision logic uses the number of bytes to be received by each process. I'm working on a quick fix based on my measurements. Unfortunately, the v4.1.x backport of this PR has already been merged, so I will create new PRs for master and v4.1.x.
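
(A rough worked example of the mismatch, assuming standard MPI_Allgatherv semantics with equal counts: with 128 ranks and an OSU "message size" of 4 B, each rank sends 4 B but receives roughly 128 × 4 B = 512 B, so a decision rule keyed on received bytes is evaluated at a size about 128 times larger than the message size reported by the benchmark.)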

@rajachan (Member) commented

Great, thanks!

rajachan merged commit 30831fb into open-mpi:master on Nov 12, 2020
@rajachan (Member) commented

I believe decisions were generated based on the output of the OSU benchmark, which reports the number of bytes sent by each process. However, the decision logic uses the number of bytes to be received by each process.

We should address this in the collectives tuning scripts so we don't run into this again the next time we tune the defaults (although it is not clear to me right now how we would account for this). Perhaps an issue against https://github.com/open-mpi/ompi-collectives-tuning/ is in order.

@devreal (Contributor, Author) commented Nov 12, 2020

devreal deleted the fix-tuned-dynamic branch on October 3, 2022, 15:52