gemm throws exception on PVC #308

Open
BenBrock opened this issue Apr 26, 2023 · 16 comments

@BenBrock

BenBrock commented Apr 26, 2023

Summary

I'm trying to use gemm on PVC, but it keeps throwing an exception. Please let me know where I'm going wrong.

I am attempting to use gemm and execute on a 4oam PVC system on ORTCE. I am getting an exception thrown with both production icpx and with the most recent version of intel/llvm, both compiled with production oneMKL.

A minimal reproducer is attached below.

  sycl::queue q(sycl::default_selector_v);

  T* a_d = sycl::malloc_device<T>(m * k, q);
  T* b_d = sycl::malloc_device<T>(k * n, q);
  T* c_d = sycl::malloc_device<T>(m * n, q);
  
  std::vector<T> a_l(m*k);
  std::vector<T> b_l(k*n);
  std::vector<T> c_l(m*n, 0);

  for (std::size_t i = 0; i < m*k; i++) {
    a_l[i] = drand48();
  }

  for (std::size_t i = 0; i < k*n; i++) {
    b_l[i] = drand48();
  }

  q.memcpy(a_d, a_l.data(), m*k*sizeof(T)).wait();
  q.memcpy(b_d, b_l.data(), k*n*sizeof(T)).wait();
  q.memcpy(c_d, c_l.data(), m*n*sizeof(T)).wait();

  std::cout << "Running MKL gemm..." << std::endl;

  auto event = oneapi::mkl::blas::row_major::gemm(q,
    oneapi::mkl::transpose::nontrans,
    oneapi::mkl::transpose::nontrans,
    m, n, k,
    T(1),
    a_d, k,
    b_d, n,
    T(1),
    c_d, n);
  event.wait();

This throws the following exception:

(base) bbrock@sdp4452:~/src/issues/oneMKL_gemm$ ./gemm
Running MKL gemm...
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Level-Zero error:700000041879048196
On device: 'Intel(R) Graphics [0x0bd5]'
in kernel: oneapi::mkl::blas::sgemm_itcopy
Aborted (core dumped)

As far as I can tell, I am allocating enough memory, and all of the pointers I'm passing in are USM device pointers, which should be accessible on the device associated with the queue passed to oneMKL.
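
The "enough memory" claim can be checked mechanically. Below is a minimal sketch in plain C++ (no SYCL needed), assuming the size rules I understand the oneMKL gemm documentation to state for row-major, non-transposed operands: lda >= k, ldb >= n, ldc >= n, and arrays of at least lda*m, ldb*k, and ldc*m elements. The helper names are hypothetical and not part of oneMKL.

```cpp
#include <cstddef>

// Element count assumed to be required for a row-major, non-transposed
// matrix with `rows` rows and leading dimension `ld` (assumption: ld * rows).
constexpr std::size_t required_elements(std::size_t rows, std::size_t ld) {
  return rows * ld;
}

// True when the leading dimensions and allocation lengths (in elements)
// satisfy the assumed rules for
// row_major::gemm(nontrans, nontrans, m, n, k, ..., a, lda, b, ldb, ..., c, ldc).
constexpr bool gemm_buffers_fit(std::size_t m, std::size_t n, std::size_t k,
                                std::size_t lda, std::size_t ldb, std::size_t ldc,
                                std::size_t a_len, std::size_t b_len,
                                std::size_t c_len) {
  return lda >= k && ldb >= n && ldc >= n &&   // each row must hold all columns
         a_len >= required_elements(m, lda) && // A is m x k
         b_len >= required_elements(k, ldb) && // B is k x n
         c_len >= required_elements(m, ldc);   // C is m x n
}
```

With lda = k and ldb = ldc = n as in the call above, allocations of m*k, k*n, and m*n elements satisfy these bounds, so the buffer sizes themselves are unlikely to be the cause.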

Version

I am using production oneMKL 2023.1.0.

Environment

I am running this on a machine with four PVC GPUs.

(base) bbrock@sdp125071:~/src/distributed-ranges/examples/shp$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.15.3.0.20_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel (R) Xeon (R) CPU Max 9480 OpenCL 3.0 (Build 0) [2023.15.3.0.20_160000]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24595]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24595]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24595]
[ext_oneapi_level_zero:gpu:3] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24595]

I am getting this error with both the most recent commit of intel/llvm and with production icpx.

(base) bbrock@sdp125071:~/src/distributed-ranges/examples/shp$ icpx --version
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm
Configuration file: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm/../bin/icpx.cfg

Steps to reproduce

(base) bbrock@sdp125071:~/src/issues/oneMKL_gemm$ ./gemm
MESA: warning: Driver does not support the 0xbd5 PCI ID.
MESA: warning: Driver does not support the 0xbd5 PCI ID.
MESA: warning: Driver does not support the 0xbd5 PCI ID.
MESA: warning: Driver does not support the 0xbd5 PCI ID.
Running MKL gemm...
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Level-Zero error:700000041879048196
On device: 'Intel(R) Graphics [0x0bd5]'
in kernel: oneapi::mkl::blas::sgemm_itcopy
Aborted (core dumped)

Observed behavior

Throws an exception as above.
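
The number in the what() string may be easier to read once split. A hedged sketch, assuming the digits are an 8-digit hexadecimal code immediately followed by the same value in decimal (700000041879048196 read as 0x70000004 followed by 1879048196). split_level_zero_code is a hypothetical helper, not a SYCL or Level-Zero API.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>

// Assumption: `digits` is an 8-digit hex code directly followed by the same
// value printed in decimal. Returns the code when both halves agree,
// std::nullopt otherwise.
std::optional<std::uint32_t> split_level_zero_code(const std::string& digits) {
  constexpr std::size_t hex_len = 8;
  if (digits.size() <= hex_len) return std::nullopt;
  try {
    std::size_t used = 0;
    const unsigned long long hex_value =
        std::stoull(digits.substr(0, hex_len), &used, 16);
    if (used != hex_len) return std::nullopt;

    const std::string dec_part = digits.substr(hex_len);
    const unsigned long long dec_value = std::stoull(dec_part, &used, 10);
    if (used != dec_part.size() || hex_value != dec_value) return std::nullopt;

    return static_cast<std::uint32_t>(hex_value);
  } catch (const std::exception&) {
    return std::nullopt;  // not numeric, or out of range
  }
}
```

If that reading is right, the code is 0x70000004, which I believe is ZE_RESULT_ERROR_MODULE_BUILD_FAILURE in ze_api.h. Treat that name as an assumption and confirm it against the header shipped with your Level-Zero loader.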

Expected behavior

I expect the kernel to execute successfully.
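
To confirm that a run which no longer throws also produces the right numbers, the result can be compared against a host-side reference. A minimal sketch of C = alpha * A * B + beta * C for row-major, non-transposed operands follows. reference_gemm_row_major is a hypothetical helper and is deliberately naive rather than fast.

```cpp
#include <cstddef>
#include <vector>

// Naive host reference: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n, all row-major and non-transposed,
// with leading dimensions lda, ldb, ldc (in elements).
template <typename T>
void reference_gemm_row_major(std::size_t m, std::size_t n, std::size_t k,
                              T alpha, const std::vector<T>& a, std::size_t lda,
                              const std::vector<T>& b, std::size_t ldb,
                              T beta, std::vector<T>& c, std::size_t ldc) {
  for (std::size_t i = 0; i < m; i++) {
    for (std::size_t j = 0; j < n; j++) {
      T acc = T(0);
      for (std::size_t p = 0; p < k; p++) {
        acc += a[i * lda + p] * b[p * ldb + j];
      }
      c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
    }
  }
}
```

After copying c_d back to the host, compare it element-wise against the reference with a tolerance suited to T.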

oneMKL_gemm.tar.gz

@BenBrock
Author

The gemm_usm example included with production oneMKL also throws the same error on PVC.

(base) bbrock@sdp125071:~/src/issues/oneMKL_gemm$ ./gemm_usm

########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example: 
# 
# C = alpha * A * B + beta * C
# 
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
# 
# Using apis:
#   gemm
# 
# Supported floating point type precisions:
#   float
#   double
# 
########################################################################

Running tests on GPU.
	Running with single precision real data type:
		Caught synchronous SYCL exception during GEMM:
Level-Zero error:700000041879048196
On device: 'Intel(R) Graphics [0x0bd5]'
in kernel: oneapi::mkl::blas::sgemm_incopy
OpenCL status: 1

		GEMM parameters:
			transA = trans, transB = nontrans
			m = 45, n = 98, k = 67
			lda = 103, ldB = 105, ldC = 106
			alpha = 2, beta = 3

		Outputting 2x2 block of A,B,C matrices:

			A = [ 0.340188, 0.260249, ...
			    [ -0.105617, 0.0125354, ...
			    [ ...


			B = [ -0.326421, -0.192968, ...
			    [ 0.363891, 0.251295, ...
			    [ ...


			C = [ 0.400017, 0.310497, ...
			    [ 0.00257462, -0.0560381, ...
			    [ ...

# Identical errors are thrown for double and complex as well

I've added the example to my minimal reproducer tarball here: oneMKL_gemm_example.tar.gz

@BenBrock
Author

This is actually running fine on Borealis, so I think this might be a configuration issue with ORTCE. I will get in touch with the people who run the cluster.

@mmeterel
Contributor

mmeterel commented May 4, 2023

@BenBrock Thanks for the logs and the update on Borealis. The error you see typically occurs when oneMKL cannot detect the GPU architecture (PVC) and falls back to an alternative code path, which is not functional on PVC. That explains why you see the issue only on a specific machine. As you mentioned, this is probably a configuration issue on ORTCE. Please let us know what you find.

@maleadt

maleadt commented Mar 29, 2024

@mmeterel Could you elaborate on what kind of misconfiguration causes this? I'm working on making oneAPI.jl support PVC hardware; however, we're seeing a similar issue:

terminate called after throwing an instance of 'sycl::_V1::exception'
what():  Level-Zero error:700000041879048196
On device: 'Intel(R) Data Center GPU Max 1550'
in kernel: oneapi::mkl::blas::sgemm_itcopy
      From worker 16:
[85716] signal (6.-6): Aborted
in expression starting at none:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
__verbose_terminate_handler at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
__terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
__cxa_throw at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++/eh_throw.cc:98
_ZN6oneapi3mkl3gpu13build_programEPiPN4sycl3_V15queueEPvS7_iPKcS9_mcS9_Pb at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpuL22mkl_gpu_get_kernel_extEPiPN4sycl3_V15queueEiPKcS8_mcS8_S8_S8_mPKvmbb at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu24mkl_gpu_get_spirv_kernelEPiPN4sycl3_V15queueEiPK22mkl_gpu_spirv_kernel_tPKcSB_ at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu40mkl_blas_gpu_sgemm_copybased_driver_syclEPiPN4sycl3_V15queueEPNS1_14blas_arg_usm_tEP20mkl_gpu_event_list_t at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu30mkl_blas_gpu_sgemm_driver_syclEPiPN4sycl3_V15queueEPNS1_14blas_arg_usm_tEP20mkl_gpu_event_list_t at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu19sgemm_sycl_internalEPN4sycl3_V15queueE10MKL_LAYOUT13MKL_TRANSPOSES7_lllNS0_16value_or_pointerIfEEPKflSB_lS9_PflNS0_4blas12compute_modeERKSt6vectorINS3_5eventESaISG_EElll at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu10sgemm_syclEPN4sycl3_V15queueE10MKL_LAYOUT13MKL_TRANSPOSES7_lllNS0_16value_or_pointerIfEEPKflSB_lS9_PflNS0_4blas12compute_modeERKSt6vectorINS3_5eventESaISG_EElll at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl4blas5sgemmERN4sycl3_V15queueE10MKL_LAYOUTNS0_9transposeES7_lllNS0_16value_or_pointerIfEEPKflSB_lS9_PflNS1_12compute_modeERKSt6vectorINS3_5eventESaISF_EE at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl4blas12column_major4gemmERN4sycl3_V15queueENS0_9transposeES7_lllNS0_16value_or_pointerIfEEPKflSB_lS9_PflNS1_12compute_modeERKSt6vectorINS4_5eventESaISF_EE at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libmkl_sycl_blas.so.4 (unknown line)
onemklSgemm at /home/sdp/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/deps/lib/liboneapi_support.so (unknown line)

As you can see, this code is being called from a oneMKL wrapper library (liboneapi_support) to work around the lack of C API in oneMKL. We build and distribute this library ourselves, along with the required MKL and SYCL libraries, downloaded from Conda:

We're probably doing something wrong here, because the MWE provided above works fine when using the system MKL (from oneAPI 2024.0, same as what we use for building liboneapi_support). I'm doing this on IDC, using a Max 1550.

@mmeterel
Contributor

mmeterel commented Apr 1, 2024

@maleadt It is hard to tell what is going wrong from the logs you sent. Can you please clarify your last paragraph? In your working configuration, are you using DPCPP compiler and oneMKL bits from the same 2024.0 base tool kit release? If yes, what is different in your non-working version? (Compiler? oneMKL?)

Also, what is the driver version you are using? (You can share the results of sycl-ls)

@sknepper
Contributor

sknepper commented Apr 1, 2024

Hi @maleadt - thanks for your work on oneAPI.jl! Intel oneMKL product currently requires the OpenCL GPU runtime even when the Level-Zero backend is used. Could you please install it and see if that resolves the issue?

@maleadt

maleadt commented Apr 3, 2024

> In your working configuration, are you using DPCPP compiler and oneMKL bits from the same 2024.0 base tool kit release?

I'm using the tools and libraries that are provisioned by the image on IDC, which according to the website seems to be: Ubuntu 22.04 LTS (Jammy Jellyfish) v20240129, oneAPI base kit 2024.0.1, oneAPI HPC kit 2024.0.1 and oneAPI render kit 2024.0.0

> If yes, what is different in your non-working version? (Compiler? oneMKL?)

I'm using 2024.0.0 from Conda for my wrapper library. That library however isn't built on-device, it's built on a buildbot, and redistributed together with the necessary MKL/SYCL/OpenCL dependencies.

> Intel oneMKL product currently requires the OpenCL GPU runtime even when the Level-Zero backend is used.

We already redistribute the things that our MKL wrapper library depends on, including libopencl, see https://github.com/JuliaPackaging/Yggdrasil/blob/77c11e9e797db54e68a8cfd83eb9b0d38830e80f/O/oneAPI_Support/build_tarballs.jl#L116-L119. This has been working perfectly on other architectures, except PVC. We aim for the redistributable wrapper library to be fully stand-alone, so that users don't have to install anything to get oneAPI.jl to work.

@mmeterel
Contributor

mmeterel commented Apr 3, 2024

Adding @mkrainiuk to this discussion as she is more familiar with the distribution of oneMKL (interfaces)

@pengtu

pengtu commented Apr 10, 2024

@maleadt: Can you run the program with LD_DEBUG=libs for both the failing and the working versions? The problem is likely that a different OpenCL library is being loaded at runtime.

Also adding @kballeda to the thread.
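
A complementary in-process check to LD_DEBUG=libs is to list the shared objects actually mapped into the process once the failing call path has loaded its dependencies. This is a sketch assuming Linux with glibc, where dl_iterate_phdr is declared in <link.h>. The helper names are hypothetical.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc)

#include <cstddef>
#include <string>
#include <vector>

// Collects the path of every named shared object mapped into this process.
std::vector<std::string> loaded_shared_objects() {
  std::vector<std::string> paths;
  dl_iterate_phdr(
      [](dl_phdr_info* info, std::size_t, void* data) -> int {
        auto* out = static_cast<std::vector<std::string>*>(data);
        // The main executable is reported with an empty name; skip it.
        if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
          out->emplace_back(info->dlpi_name);
        }
        return 0;  // zero means "keep iterating"
      },
      &paths);
  return paths;
}

// Keeps only the entries whose path contains `needle`, e.g. "OpenCL".
std::vector<std::string> filter_by_substring(
    const std::vector<std::string>& paths, const std::string& needle) {
  std::vector<std::string> matches;
  for (const auto& path : paths) {
    if (path.find(needle) != std::string::npos) matches.push_back(path);
  }
  return matches;
}
```

Calling filter_by_substring(loaded_shared_objects(), "OpenCL") in both the working and the failing setup shows which libOpenCL each one picked up.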

@maleadt

maleadt commented Apr 10, 2024

Here you are: https://gist.github.com/maleadt/55d9069b5c63e381858dbe64d9f690d3. At first sight, everything looks OK there, and all oneMKL-related resources are loaded from the artifacts directory (i.e. there's no pollution by system libraries).

@pengtu

pengtu commented Apr 10, 2024

The C++ log has a 'calling init' entry for the following library that does not appear in the Julia log: 128030: calling init: /lib/x86_64-linux-gnu/libze_intel_gpu.so.1. Could that be the problem?

@maleadt

maleadt commented Apr 11, 2024

libze_intel_gpu is there on the Julia side too, but it's loaded earlier (when oneAPI.jl loads):

calling init: /home/sdp/.julia/artifacts/2a52b1197a324e3df923175b2035c42899f069f2/lib/libze_intel_gpu.so.1

@maleadt

maleadt commented Apr 15, 2024

It turns out the issue was my libOpenCL.so, which I took from intel-opencl-rt on Conda: it somehow did not support or detect my PVC hardware, to the point that clinfo reported 0 platforms. After switching to Khronos' ICD loader, MKL works fine.

That said, this error as reported before is inscrutable and should be improved to something actionable.
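
The clinfo symptom can also be checked programmatically. This is a minimal sketch, assuming Linux and an ICD loader named libOpenCL.so.1 on the library search path. It loads the library with dlopen and calls clGetPlatformIDs through a hand-declared function pointer, so no OpenCL headers are needed. opencl_platform_count is a hypothetical helper, and the function-pointer type mirrors the signature in the OpenCL specification using fixed-width integers.

```cpp
#include <dlfcn.h>

#include <cstdint>
#include <optional>

// Returns the number of platforms reported by the OpenCL loader at `library`,
// or std::nullopt if the library or symbol is missing or the call fails.
std::optional<std::uint32_t> opencl_platform_count(
    const char* library = "libOpenCL.so.1") {
  void* handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) return std::nullopt;

  // cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id* platforms,
  //                         cl_uint* num_platforms);
  using get_platform_ids_fn =
      std::int32_t (*)(std::uint32_t, void**, std::uint32_t*);
  auto get_platform_ids = reinterpret_cast<get_platform_ids_fn>(
      dlsym(handle, "clGetPlatformIDs"));
  if (get_platform_ids == nullptr) return std::nullopt;

  std::uint32_t count = 0;
  const std::int32_t status = get_platform_ids(0, nullptr, &count);
  if (status == 0) return count;                 // CL_SUCCESS
  if (status == -1001) return std::uint32_t{0};  // CL_PLATFORM_NOT_FOUND_KHR
  return std::nullopt;
  // The handle is left open on purpose to keep the sketch simple.
}
```

A result of 0, or an empty optional, from the library bundled with the application would reproduce the situation described above, where the loader saw no platforms.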

@mmeterel
Contributor

@maleadt Thanks for the update and glad to see you found the problem. I should have thought about suggesting clinfo check! Sorry about that.

IMHO, when the right OpenCL library is not available on the user side, oneMKL-GEMM could still give correct functionality but issue a warning about low performance. Does it sound reasonable?

@maleadt

maleadt commented Apr 15, 2024

> oneMKL-GEMM could still give correct functionality but issue a warning about low performance. Does it sound reasonable?

Yes, that sounds great. Even a fatal error would be a good option, as long as it comes with an error message that would help diagnose the issue (No OpenCL device detected, or whatever).

@mmeterel
Contributor

I would vote for correct functionality + warning. :)
Is it ok to close this issue?
