[Feature]: Support equivalent of cublasCherkEx() #1320

torrance · 2023-05-30T08:34:03Z

Is your feature request related to a problem? Please describe.

It's common to have large, low-precision input matrices that you'd like to multiply at full internal precision using rocblas<t>gemm(), possibly (but not necessarily) with output at full precision.

Describe the solution you'd like

Support the equivalent of cublas<t>gemmEx() as described here: https://docs.nvidia.com/cuda/cublas/#cublas-gemmEx

Describe alternatives you've considered

An alternative is to copy the input matrices to double precision first. If the output is not required at full precision, a further copy must be made and the precision truncated. This alternative doubles memory pressure on the GPU and causes extra copying of memory.


TorreZuk · 2023-05-30T14:46:48Z

Thanks for your report @torrance. rocBLAS supports the equivalent of cublas<t>gemmEx() with the function rocblas_gemm_ex, described here: https://rocm.docs.amd.com/projects/rocBLAS/en/latest/API_Reference_Guide.html#rocblas-gemm-ex-batched-strided-batched

It implements numerous mixed-precision and high-precision accumulation (HPA) combinations, so please review it. If the one you require is missing, please provide a list of the specific data types you need for inputs, output, and compute, in order of your interest (describing your use case is also helpful). Based on your feedback we can consider adding more, but the most common forms should already be implemented.
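
For readers unfamiliar with the term, the following CPU-only sketch illustrates what high-precision accumulation means: the inputs are stored as 8-bit integers, while every product and the running sum are held in 32 bits. It does not call rocBLAS. The helper name `gemm_i8_i32_reference`, the int8/int32 type pairing, and the column-major layout are assumptions chosen for illustration; consult the rocblas_gemm_ex documentation linked above for the combinations that are actually supported.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not a rocBLAS function) for
//   D = alpha * A * B + beta * C
// with 8-bit integer inputs and 32-bit integer accumulation and output.
// All matrices are column-major with no padding, as is conventional in BLAS.
std::vector<std::int32_t> gemm_i8_i32_reference(
    std::size_t m, std::size_t n, std::size_t k,
    std::int32_t alpha,
    const std::vector<std::int8_t>& a,   // m x k, leading dimension m
    const std::vector<std::int8_t>& b,   // k x n, leading dimension k
    std::int32_t beta,
    const std::vector<std::int32_t>& c)  // m x n, leading dimension m
{
    std::vector<std::int32_t> d(m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // Widen each operand before multiplying so that neither the
            // product nor the running sum is limited to the 8-bit range.
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(a[i + p * m]) *
                       static_cast<std::int32_t>(b[p + j * k]);
            }
            d[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
    return d;
}
```

The point of the sketch is that the large input matrices stay at one byte per element; only the much smaller output is stored at the wider type.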

torrance · 2023-05-31T02:41:29Z

@TorreZuk Thank you! HIPIFY complained there was no suitable equivalent and I clearly didn't spend long enough verifying that.

If I can hijack my own issue (!), what about a hipblas/rocblas equivalent to cublasCherkEx()? My searching of the documentation (as well as HIPIFY) seems to suggest there isn't one, and it's a bit of a sticking point for the conversion of this codebase.

TorreZuk · 2023-05-31T16:53:04Z

Sure, we can recycle this issue for the request of an equivalent to cublasCherkEx(), which is a new feature request. I can ask whether @emankov has any insights into HIPIFY mapping cublas<t>gemmEx() to rocblas_gemm_ex, but maybe all the argument datatype enums have to be chosen manually?

amcamd · 2023-06-05T18:56:42Z

Hello @torrance,
cublasCherkEx() supports the CUDA_C_8I datatype for matrix A. This is a complex number made of two 8-bit signed integers. I have some questions about this datatype:

1. Do you require support for CUDA_C_8I in cublasCgemmEx() as well as in cublasCherkEx()? We have a rocblas_gemm_ex function; it supports real 8-bit integers but not complex 8-bit integers.
2. Can you say what application is using this CUDA_C_8I datatype? Real 8-bit integers are used in machine learning; what is the use case for complex 8-bit integers?

Thanks Andrew
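
To make the requested operation concrete, here is a CPU-only sketch of a Hermitian rank-k update, C = alpha * A * A^H + beta * C, where A holds complex 8-bit integers and C holds single-precision complex values. It is not an implementation of cublasCherkEx() or of any rocBLAS routine. The `ComplexI8` struct, the helper name `herk_c8i_reference`, the choice to update only the lower triangle, and the 64-bit integer accumulators are all assumptions made for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a complex value made of two signed 8-bit integers.
struct ComplexI8 {
    std::int8_t re;
    std::int8_t im;
};

// Hypothetical CPU reference for C = alpha * A * A^H + beta * C.
// A is n x k and C is n x n, both column-major with no padding.
// Only the lower triangle of C (row >= column) is read and written;
// the strictly upper part is returned unchanged.
std::vector<std::complex<float>> herk_c8i_reference(
    std::size_t n, std::size_t k,
    float alpha,
    const std::vector<ComplexI8>& a,
    float beta,
    std::vector<std::complex<float>> c)  // taken by value, returned updated
{
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = j; i < n; ++i) {
            // Accumulate sum_p A(i,p) * conj(A(j,p)) exactly in wide integers.
            std::int64_t re = 0;
            std::int64_t im = 0;
            for (std::size_t p = 0; p < k; ++p) {
                const std::int64_t ar = a[i + p * n].re;
                const std::int64_t ai = a[i + p * n].im;
                const std::int64_t br = a[j + p * n].re;
                const std::int64_t bi = a[j + p * n].im;
                re += ar * br + ai * bi;
                im += ai * br - ar * bi;
            }
            std::complex<float> v =
                alpha * std::complex<float>(static_cast<float>(re),
                                            static_cast<float>(im)) +
                beta * c[i + j * n];
            if (i == j) {
                v.imag(0.0f);  // the diagonal of a Hermitian matrix is real
            }
            c[i + j * n] = v;
        }
    }
    return c;
}
```

In this sketch A stays at two bytes per element throughout, and only the n x n result is held in floating point.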

torrance · 2023-06-09T09:11:34Z

Hi @amcamd

> Can you say what application is using this CUDA_C_8I datatype? Real 8 bit integers are used in machine learning, what is the use case for complex 8 bit integers?

Yes, they are needed. Lots of radio astronomy correlators record observations of the sky as simple 8-bit complex integers, which can later be normalised as part of calibration. The 8-bit integer representation has the advantage of constant deltas between adjacent values, as opposed to a floating-point representation. At the high end, we let the integer representation 'saturate' and later flag these values. They are also necessarily complex, since radio astronomy works in the Fourier domain.
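
As a rough illustration of the saturate-then-flag idea, the sketch below clamps a finite sample to the signed 8-bit range and treats values sitting at either end of that range as saturated. This is only an assumed convention for illustration: the helper names `saturate_to_i8` and `is_saturated` are hypothetical, and the exact rounding and flagging rules used by any particular correlator are not described in this thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: round a finite sample to the nearest integer and
// clamp it to the signed 8-bit range [-128, 127].
std::int8_t saturate_to_i8(float x)
{
    const float lo = static_cast<float>(std::numeric_limits<std::int8_t>::min());
    const float hi = static_cast<float>(std::numeric_limits<std::int8_t>::max());
    const float clamped = std::min(std::max(x, lo), hi);
    return static_cast<std::int8_t>(std::lround(clamped));
}

// Hypothetical helper: a stored value at either end of the range may have
// been clipped, so it is treated as a candidate for flagging.
bool is_saturated(std::int8_t v)
{
    return v == std::numeric_limits<std::int8_t>::max() ||
           v == std::numeric_limits<std::int8_t>::min();
}
```

Each real and imaginary component would be handled independently under this scheme.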

We want to avoid converting these to higher precision values because these values make up the raw data of our observations and are absolutely massive in size.

Hope this helps give some context.

amcamd · 2023-06-09T14:07:09Z

Hi @torrance,
Thank you for the context and the use case. I was guessing this is related to radio astronomy and the installations you have in Western Australia.

TorreZuk self-assigned this May 30, 2023

TorreZuk changed the title ~~[Feature]: Support equivalent of cublas<t>gemmEx()~~ [Feature]: Support equivalent of cublasCherkEx() May 31, 2023
