Platforms for identical devices do not compare equal #13721

Open
fknorr opened this issue May 9, 2024 · 1 comment
Labels
bug (Something isn't working), cuda (CUDA back-end)

Comments

fknorr commented May 9, 2024

Describe the bug

To split work in a multi-GPU setting, we need to find sets of equal / compatible GPUs on a system.

On a system with 4x Nvidia RTX 3090, sycl::platform::get_platforms() returns four distinct platforms that stringify to sycl::platform(vendor="NVIDIA Corporation", name="NVIDIA CUDA BACKEND"), but compare unequal with operator== and do not produce the same hash. I would expect all devices to share a platform in this case.

Comparing the backend is not enough to find a set of equal GPUs either: DPC++ reports the same backend enumerator for at least the Intel(R) OpenCL and Intel(R) FPGA Emulation Platform for OpenCL(TM) platforms, which clearly do not originate from a multi-GPU situation.
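As a possible interim workaround, platforms could be grouped by the values they report instead of by operator==. The following is only a minimal sketch under assumptions: `platform_desc`, `make_key` and `group_by_description` are hypothetical names, and the struct is a plain stand-in for what would be read from `sycl::platform::get_backend()` and the vendor / name / version `get_info` queries, so that the sketch compiles without SYCL. Identical strings do not guarantee identical hardware, so a real implementation would probably also want to compare device properties.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the values a sycl::platform reports.
// In real code these would come from get_backend() and get_info<...>().
struct platform_desc {
	int backend; // stand-in for the sycl::backend enumerator
	std::string vendor;
	std::string name;
	std::string version;
};

// Grouping key: backend plus all descriptive strings.
// The backend alone is ambiguous, and operator== is what misbehaves here.
using platform_key = std::tuple<int, std::string, std::string, std::string>;

inline platform_key make_key(const platform_desc& p) {
	return platform_key(p.backend, p.vendor, p.name, p.version);
}

// Returns, for each distinct key, the indices of all platforms reporting it.
inline std::map<platform_key, std::vector<std::size_t>>
group_by_description(const std::vector<platform_desc>& platforms) {
	std::map<platform_key, std::vector<std::size_t>> groups;
	for(std::size_t i = 0; i < platforms.size(); ++i) {
		groups[make_key(platforms[i])].push_back(i);
	}
	return groups;
}
```

With this kind of grouping, the four CUDA platforms described above would land in one group, while two OpenCL platforms with different names would stay separate even though they share a backend.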

To reproduce

#include <cassert>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

#include <sycl/sycl.hpp>

int main() {
	// Group all platforms by their vendor / name / version strings.
	std::unordered_map<std::string, std::vector<sycl::platform>> platforms_by_string;
	for(const auto& pf : sycl::platform::get_platforms()) {
		const auto string = "vendor=" + pf.get_info<sycl::info::platform::vendor>()
			+ " name=" + pf.get_info<sycl::info::platform::name>()
			+ " version=" + pf.get_info<sycl::info::platform::version>();
		platforms_by_string[string].push_back(pf);
	}
	// Platforms that stringify identically are expected to hash and compare equal.
	for(const auto& [string, pfs] : platforms_by_string) {
		printf("%zu platforms matching %s\n", pfs.size(), string.c_str());
		for(size_t i = 0; i < pfs.size(); ++i) {
			for(size_t j = 0; j < pfs.size(); ++j) {
				assert(std::hash<sycl::platform>{}(pfs[i]) == std::hash<sycl::platform>{}(pfs[j]));
				assert(pfs[i] == pfs[j]);
			}
		}
	}
}

Environment

  • OS: Ubuntu 22.04
  • DPC++ e330855 (May 7, 2024)
  • CUDA 12.3

Additional context

No response

fknorr added the bug label May 9, 2024
fknorr added a commit to fknorr/celerity-runtime that referenced this issue May 9, 2024
DPC++ creates multiple distinct platforms in a multi-GPU setting, see intel/llvm#13721.

This reverts commit 36a4857.
JackAKirk (Contributor) commented

This will be fixed when oneapi-src/unified-runtime#1565 is merged. We are working on this as a priority.

AlexeySachkov added the cuda label May 13, 2024