Platforms for identical devices do not compare equal #13721

fknorr · 2024-05-09T07:44:05Z

Describe the bug

To split work in a multi-GPU setting, we need to find sets of equal / compatible GPUs on a system.

On a system with 4x Nvidia RTX 3090, sycl::platform::get_platforms() returns four distinct platforms that stringify to sycl::platform(vendor="NVIDIA Corporation", name="NVIDIA CUDA BACKEND"), but compare unequal with operator== and do not produce the same hash. I would expect all devices to share a platform in this case.

Comparing backends is not enough to find a set of equal GPUs either: DPC++ reports the same backend enumerator for at least the Intel(R) OpenCL and Intel(R) FPGA Emulation Platform for OpenCL(TM) platforms, which clearly do not originate from a multi-GPU situation.
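
As a possible interim workaround, devices could be grouped by their descriptive properties rather than by platform identity. The following is only a minimal sketch under stated assumptions: `DeviceInfo`, `DeviceKey` and `group_compatible` are hypothetical names, and the struct is a plain stand-in for values a real program would query from each `sycl::device` (for example the `vendor`, `name` and `driver_version` info descriptors), so that the grouping logic can be shown without a SYCL installation.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for values queried from a sycl::device;
// this struct is NOT part of SYCL or DPC++.
struct DeviceInfo {
	int backend;        // stand-in for the device's backend enumerator
	std::string vendor; // e.g. from sycl::info::device::vendor
	std::string name;   // e.g. from sycl::info::device::name
	std::string driver; // e.g. from sycl::info::device::driver_version
	int index;          // position of the device in the caller's device list
};

// Devices are treated as compatible when all four properties match.
using DeviceKey = std::tuple<int, std::string, std::string, std::string>;

// Returns, for each distinct key, the indices of the devices sharing it.
std::map<DeviceKey, std::vector<int>> group_compatible(const std::vector<DeviceInfo>& devices) {
	std::map<DeviceKey, std::vector<int>> groups;
	for(const auto& d : devices) {
		const DeviceKey key{d.backend, d.vendor, d.name, d.driver};
		groups[key].push_back(d.index);
	}
	return groups;
}
```

Matching strings do not guarantee that devices can actually share a context, so this is a heuristic rather than a replacement for working platform equality.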

To reproduce

#include <cassert>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

#include <sycl/sycl.hpp>

int main() {
	// Group platforms by their vendor / name / version strings.
	std::unordered_map<std::string, std::vector<sycl::platform>> platforms_by_string;
	for(const auto& pf : sycl::platform::get_platforms()) {
		const auto string = "vendor=" + pf.get_info<sycl::info::platform::vendor>()
			+ " name=" + pf.get_info<sycl::info::platform::name>()
			+ " version=" + pf.get_info<sycl::info::platform::version>();
		platforms_by_string[string].push_back(pf);
	}
	// Platforms with identical strings are expected to hash and compare equal.
	for(const auto& [string, pfs] : platforms_by_string) {
		std::printf("%zu platforms matching %s\n", pfs.size(), string.c_str());
		for(size_t i = 0; i < pfs.size(); ++i) {
			for(size_t j = 0; j < pfs.size(); ++j) {
				assert(std::hash<sycl::platform>{}(pfs[i]) == std::hash<sycl::platform>{}(pfs[j]));
				assert(pfs[i] == pfs[j]);
			}
		}
	}
}

Environment

OS: Ubuntu 22.04
DPC++ e330855 (May 7, 2024)
CUDA 12.3

Additional context

No response

JackAKirk · 2024-05-09T13:53:54Z

This will be fixed when oneapi-src/unified-runtime#1565 is merged. We are working on this as a priority.

fknorr added the "bug" (Something isn't working) label on May 9, 2024

fknorr added a commit to fknorr/celerity-runtime that referenced this issue May 9, 2024

Revert "[IDAG] do not select devices from multiple platforms"

3293bf1

DPC++ creates multiple distinct platforms in a multi-GPU setting, see intel/llvm#13721 . This reverts commit 36a4857.

AlexeySachkov added the "cuda" (CUDA back-end) label on May 13, 2024
