I believe there are some missing gemm_batch implementations. Looking at the oneMKL docs, gemm_batch should support a call with two half-precision (sycl::half) input matrices, a float output matrix, and float scaling factors. My reference: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2023-0/gemm-batch.html
I run into errors about this overload not being found. Is the documentation correct, or have I misunderstood something?
oneMKL works with multiple hardware and backend libraries and also depends on the compiler and build environment. The following information should help reproduce the issue:
HW: A100 GPU
Backend: cuBLAS
OS: Ubuntu 20.04
Compiler version: DPC++ 2024.0.2
Steps to reproduce
Compile for NVIDIA GPUs: icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda reproducer_onemkl_batch.cpp -lonemkl
or for Intel GPUs: icpx -fsycl reproducer_onemkl_batch.cpp -lonemkl
reproducer_onemkl_batch.cpp:60:5: error: no matching function for call to 'gemm_batch'
60 | oneapi::mkl::blas::column_major::gemm_batch(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
reproducer_onemkl_batch.cpp:75:5: note: in instantiation of function template specialization 'run_gemm<sycl::detail::half_impl::half, sycl::detail::half_impl::half, float, float>' requested here
75 | run_gemm<sycl::half, sycl::half, float, float>(q);
Given the documentation I linked to above, I would expect this to compile, as the docs state that this combination of data types is supported.
@AidanBeltonS Thanks for reporting this. At this point, this gap is known and expected. The documentation you linked is for the oneMKL product implementation, not the oneMKL open-source interfaces. Typically, new APIs/features are implemented in the oneMKL product first and then ported to the oneMKL open-source interfaces.
If this use case is critical for your application, please let us know. We also encourage everyone to contribute :)
Thanks for the response, @mmeterel, and thank you for clarifying the documentation. Yes, this is critical for our application.
It relates to the SYCLomatic translation of llama.cpp, which uses gemm_batch. I would be happy to help get this working, especially for the CUDA and AMD backends.
Version
oneMKL hash: 7d2044e