Add the acoll component #12484
Conversation
How does this compare with #10470?
We can discuss it at the meeting. Part of the goal of filing the PR was to give people the chance to have a look at it ahead of the meeting if they want to and can.
ompi/mca/coll/acoll/Makefile.am
Do you have plans to add alltoall(v) to acoll?
Yes, we are planning to add alltoall to acoll next.
 * chosen, further decides if [ring|lin] allgather is to be used.
 *
 */
static inline void coll_allgather_decision_fixed(int size, size_t total_dsize, int sg_size,
Can you shed some light on how to choose which method for other Intel/AMD architectures? You might also want some utility to let the user adjust the decisions for other systems.
Our testing has mostly focused on Zen architectures; we will soon test on other architectures. We do not yet have a utility/config option to override the decisions, but we plan to add one.
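For illustration, a fixed decision rule of this kind might look like the sketch below. The function name echoes `coll_allgather_decision_fixed` from the diff, but the thresholds (8 KB and 1 MB) and return codes are purely hypothetical assumptions, not acoll's actual cutoffs.

```c
#include <stddef.h>

/* Hypothetical sketch of a fixed allgather decision rule.
 * The thresholds below are illustrative assumptions, not acoll's real
 * values. Returns 0 = recursive doubling, 1 = ring, 2 = linear. */
int allgather_decision_sketch(int size, size_t total_dsize, int sg_size)
{
    if (total_dsize <= 8192 && size <= sg_size) {
        return 0; /* small messages within one subgroup: recursive doubling */
    }
    if (total_dsize > 1048576) {
        return 1; /* large messages: ring keeps per-step buffers bounded */
    }
    return 2; /* medium messages or larger groups: linear allgather */
}
```

An MCA parameter could later override the returned choice, which would address the tuning utility suggested above.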
@@ -0,0 +1,15 @@
Copyright (c) 2023-2024 Advanced Micro Devices, Inc. All rights
Have you thought about what needs to be done to extend this to multiple nodes?
Some of the APIs (like bcast, barrier, and allgather) support the multi-node case. However, they have not been extensively tested for multi-node; we will test them and extend the other APIs to multi-node as well.
/*
 * rd_allgather_sub
 *
 * Function: Uses recursive doubling based allgather for the group.
Have you compared the performance of other methods besides recursive doubling?
Yes, acoll/allgather chooses among recursive doubling, ring and linear based on process count and message sizes.
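As background on the recursive-doubling pattern referenced above: in round s each rank exchanges everything it has gathered so far with peer rank XOR 2^s, so the data each rank holds doubles every round and a power-of-two group finishes in log2(p) rounds. The sketch below shows that generic schedule; it is not acoll's `rd_allgather_sub` implementation.

```c
/* Generic recursive-doubling schedule (illustrative, not acoll's code).
 * rd_peer: partner of `rank` in exchange round `round`.
 * rd_rounds: number of rounds for a group of size nprocs. */
int rd_peer(int rank, int round)
{
    return rank ^ (1 << round); /* flip one address bit per round */
}

int rd_rounds(int nprocs)
{
    int rounds = 0;
    while ((1 << rounds) < nprocs) {
        rounds++; /* smallest r with 2^r >= nprocs */
    }
    return rounds;
}
```

Non-power-of-two group sizes need an extra pre/post step in practice, which is one reason ring and linear remain useful fallbacks.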
}

/* This barrier is needed to prevent random hangs */
err = ompi_coll_base_barrier_intra_tree(comm, module);
Why is the barrier needed here? It will also add cost to small-message allgather.
It is removed now.
if (sbuf != MPI_IN_PLACE)
    memcpy(tmp_rbuf, sbuf, my_count_size * dsize);
} else {
    ompi_3buff_op_reduce(op, (char *) data->xpmem_saddr[0] + chunk * rank * dsize,
Is the 3-operand reduce function used to maintain the order?
I think this was a bit faster than copying the chunks first and then reducing later in the following "for" loop.
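For context on the trade-off described above: a two-buffer reduce (dst op= src) needs a separate copy to seed dst first, while a three-buffer reduce computes dst = src1 op src2 in a single pass. The sketch below hard-codes integer summation; `ompi_3buff_op_reduce` itself dispatches through an `ompi_op_t`, so this is only an illustration of the access pattern, not Open MPI's implementation.

```c
/* Illustrative three-buffer reduction (sum). Both inputs are read once and
 * the destination is written once, with no seeding memcpy into dst
 * beforehand, which is why it can beat copy-then-reduce. */
void reduce_3buff_sum(const int *src1, const int *src2, int *dst, int count)
{
    for (int i = 0; i < count; i++) {
        dst[i] = src1[i] + src2[i];
    }
}
```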
Please rebase to current main to get rid of the
I tested the PR in AWS CI. I'm seeing assertion errors with
You can try
@amd-nithyavs could you rebase this PR to see if that clears up the mpi4py CI failure?
@hppritcha we did have issues after the rebase. We have fixed them and will update the PR soon. Thanks.
The updated PR (yet to be pushed) will fix this issue. Thanks.
The issue is fixed in the updated PR.
We have updated the PR; it passes the mpi4py tests.
Running AWS CI
@amd-nithyavs I noticed that the PR is currently split into 3 commits. Please squash them before merging.
Passed AWS CI. Note that we don't test with xpmem.
Hello! The Git Commit Checker CI bot found a few problems with this PR: 2f7c5e2: Merge latest of local ompiv5
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
@wenduwan We have rebased to the latest and squashed the commits. |
acoll is a collective component optimized for AMD "Zen"-based processors. It supports Bcast, Allreduce, Reduce, Barrier, Gather and Allgather APIs. Signed-off-by: Nithya V S <Nithya.VS@amd.com>
This PR introduces "acoll", a high-performance collective component that is optimized for communication within a single node of AMD EPYC CPUs. It mainly uses subcommunicators based on L3 cache or NUMA domains to reduce cross-cache or cross-NUMA accesses. The supported collectives include Bcast, Allreduce, Gather, Reduce, Barrier, and Allgather.
OSU micro-benchmarks were run on 2-socket AMD EPYC 9654 96-Core Processor with 4 NUMA domains per socket, with a total of 192 cores per node, on top of commit bb7ecde.
Average percentage latency reduction over "tuned" across 32, 64, 96, 128, 192 ranks over message sizes of 8 bytes to 8 MB (varied in powers of 2):
Sample graphs (latency comparisons, attached to the PR): Allreduce, Bcast, Gather