Describe the bug
Using the upstreamed CXI provider (as of commit fc869ae on the main branch) yields reduced throughput in fi_pingpong (~14 GB/s for ofiwg/libfabric compared to ~20 GB/s for the HPE-internal libfabric).
To Reproduce
Steps to reproduce the behavior:
1. Launch fi_pingpong -p cxi -e rdm on two Slingshot-connected nodes (see the sketch below).
2. Observe the performance deviation between ofiwg/libfabric and HPE-internal libfabric.
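A minimal sketch of the two-node invocation, assuming fi_pingpong from fabtests is on the PATH and nid0001 is a placeholder hostname for the server node:

```sh
# Server side, on the first Slingshot-connected node:
fi_pingpong -p cxi -e rdm

# Client side, on the second node, pointing at the server:
fi_pingpong -p cxi -e rdm nid0001
```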
Expected behavior
Equivalent performance between both libfabric variants (~20 GB/s).
Output
Deviating performance:
~14 GB/s for ofiwg/libfabric
~20 GB/s for hpe/libfabric
It is worth noting that the observed throughput of ofiwg/libfabric increases when the iteration count is raised from the default 10 to 100 via -I 100 (see the example below).
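For example (a sketch; nid0001 is again a placeholder server hostname):

```sh
# Same run with 100 iterations instead of the default 10
fi_pingpong -p cxi -e rdm -I 100           # server
fi_pingpong -p cxi -e rdm -I 100 nid0001   # client
```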
Additionally, with osu_bw and osu_latency from the OSU Micro-Benchmarks suite, no performance difference is observed between the two libfabric variants.
I've attached raw output of the fi_pingpong runs and the osu_bw/osu_latency runs.
Environment:
./configure LDFLAGS=-Wl,--build-id --enable-cxi=yes --enable-only --enable-restricted-dl --enable-tcp --enable-udp --enable-rxm --enable-rxd --enable-hook_debug --enable-hook_hmem --enable-dmabuf_peer_mem --enable-verbs --enable-gdrcopy-dlopen --enable-profile=dl --with-ofi=yes
Additional context
Due to a currently unresolved issue with the local Slingshot deployment on the ARM platform used, FI_CXI_LLRING_MODE=never must be set for both fi_pingpong and osu_bw (see the example below).
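For reference, a sketch of setting the variable for both benchmarks; the mpirun flags assume Open MPI, and the hostnames are placeholders:

```sh
# fi_pingpong: set the variable in the launch environment on each node
FI_CXI_LLRING_MODE=never fi_pingpong -p cxi -e rdm

# osu_bw: -x exports the variable to the remote ranks (Open MPI syntax)
mpirun -np 2 -host nid0001,nid0002 -x FI_CXI_LLRING_MODE=never ./osu_bw
```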
I'm using the latest internal sources; to be honest, I don't know the version number. I configure cxi, the Cassini headers, and the UAPI headers to point directly at the sources. Please tell me if there is a command to check a version that would be interesting for you, but note that my installation is not standard compared to the official Slingshot packages. I'm working in parallel on a package-based installation, but it's on x86_64, so I guess that won't be helpful in this case (I'll try anyway).
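In case a quick check helps, fi_info (shipped with libfabric) can report the build in use; this assumes the fi_info from the same build is on the PATH:

```sh
# Report the libfabric version the binaries were built against
fi_info --version

# Dump the CXI provider attributes as libfabric discovers them
fi_info -p cxi
```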