
Support for torch.mm with Sparse Half Tensors? "addmm_sparse_cuda" not implemented for Half #41069

Closed
sbonner0 opened this issue Jul 7, 2020 · 10 comments
Labels
feature (A request for a proper, new feature) · module: half (Related to float16 half-precision floats) · module: sparse (Related to torch.sparse) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


sbonner0 commented Jul 7, 2020

Hi,

I am trying to perform sparse-dense matrix multiplication using half-precision tensors in PyTorch.

The following code:

import torch

# Dense half-precision matrix on the GPU
a = torch.randn(3, 2).half().cuda()

# 2x3 sparse COO matrix, also cast to half on the GPU
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).half().cuda()

# Sparse x dense matmul: this line raises the error below
c = torch.spmm(b, a)

will produce this error: RuntimeError: "addmm_sparse_cuda" not implemented for 'Half'

Is there any way to solve this?

Environment

- PyTorch version: 1.5.0
- Is debug build: No
- CUDA used to build PyTorch: 10.2

- OS: Arch Linux
- GCC version: (GCC) 10.1.0
- CMake version: version 3.17.3

- Python version: 3.8
- Is CUDA available: Yes
- CUDA runtime version: 10.2.89
- GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
- Nvidia driver version: 440.100
- cuDNN version: /usr/lib/libcudnn.so.7.6.5

cc @vincentqb @aocsa

@mruberry added the module: half, module: sparse, feature, and triaged labels Jul 7, 2020
mruberry (Collaborator) commented Jul 8, 2020

Thanks for filing this issue, @sbonner0. You can perform the operation with a float32 tensor, of course, but short of that I think you'd have to write your own kernel or get one added to PyTorch, unfortunately.
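
A minimal sketch of that float32 workaround, reusing the tensors from the snippet above (upcast, multiply, then downcast; whether the round trip is acceptable depends on your use case):

import torch

a = torch.randn(3, 2).half().cuda()
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).half().cuda()

# Work around the missing half kernel: compute in float32, cast back to half.
c = torch.spmm(b.float(), a.float()).half()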

sbonner0 (Author) commented Jul 8, 2020

Hi @mruberry, thanks so much for your reply! Do you think it would be challenging to write a kernel for this and submit it to PyTorch? I would be very happy to give it a go.

@mruberry (Collaborator)

> Hi @mruberry, thanks so much for your reply! Do you think it would be challenging to write a kernel for this and submit it to PyTorch? I would be very happy to give it a go.

Excellent question! I actually looked at this a bit and cuSPARSE does support this operation in half (https://docs.nvidia.com/cuda/cusparse/index.html). So you'd need to edit the dispatch here:

to include half types, then get a system using the newer cuSPARSE:

#if !defined(_MSC_VER) && defined(__CUDACC__) && CUSPARSE_VERSION >= 10301 // CUDA release >= 10.2 and not windows

update the checks:

static_assert(std::is_same<float, T>::value || std::is_same<double, T>::value, "csrmm2 only supports float and double value types");

and instantiate the appropriate c10::Half template:

template void csrmm2<float>(

Then make sure cuSPARSE is being called properly and write a test in Python verifying the change works.
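
As a rough sketch, such a Python test might compare the half result against a float32 reference (the function name and tolerances here are illustrative, not PyTorch's actual test suite):

import torch

def test_spmm_half_cuda():
    # Random sparse COO matrix and dense matrix, both in half on the GPU.
    i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
    v = torch.FloatTensor([3, 4, 5])
    b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).half().cuda()
    a = torch.randn(3, 2).half().cuda()

    result = torch.spmm(b, a)

    # float32 reference; loose tolerances absorb half-precision rounding.
    expected = torch.spmm(b.float(), a.float())
    assert torch.allclose(result.float(), expected, atol=1e-2, rtol=1e-2)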

If you have a machine with CUDA 10.2+ and are familiar with building projects like PyTorch and C++, it's definitely doable.

@sbonner0 (Author)

Hi @mruberry, thank you so much for this very detailed reply! It seems you have largely done the work already, but I can set aside some time at the end of next week to give it a go.
Given that this seems to require CUDA 10.2, if I am able to get it working, should I submit it as a pull request?

@mruberry (Collaborator)

Yep, it'd be great to get a PR implementing this!

sorenrasmussenai (Contributor) commented May 28, 2021

A note for anyone working on this (or my future self):

I have been fiddling with this issue, and I ran into some really strange behaviour when modifying the existing code to support float16. I have not had time to isolate the problem, but I believe it is due to a bug in cuSPARSE, specifically in the algorithm CUSPARSE_SPMM_CSR_ALG1 with CUDA_R_16F. Switching to CUSPARSE_SPMM_CSR_ALG2 makes the problem go away; note, however, that CUSPARSE_SPMM_CSR_ALG2 is non-deterministic, which may be a deal-breaker.

I have seen the bug manifest as noise in a single column (column 1 in my case) of the result matrix, while the remaining columns were correct. The noise would change depending on the results in (at least some of) the remaining columns. Unfortunately, I am not free to share the code.
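
A hedged way to check for that symptom is to compare the half-precision result column-by-column against a float32 reference (this assumes a patched build where the half kernel runs at all; shapes are illustrative):

import torch

i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).half().cuda()
a = torch.randn(3, 64).half().cuda()

result = torch.spmm(b, a).float()
reference = torch.spmm(b.float(), a.float())

# Per-column max absolute error: a single corrupted column stands out.
col_err = (result - reference).abs().max(dim=0).values
print(col_err)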

@pearu pearu added this to To do in Sparse tensors Aug 10, 2021
@IvanYashchuk (Collaborator)

CSR matrix - dense matrix multiplication is now supported for float16:

In [1]: import torch

In [2]: a = torch.randn(3,2).half().cuda()
   ...: i = torch.LongTensor([[0, 1, 1],  [2, 0, 2]])
   ...: v = torch.FloatTensor([3, 4, 5])
   ...: b = torch.sparse.FloatTensor(i, v, torch.Size([2,3])).half().cuda()

In [3]: b = b.to_sparse_csr()

In [4]: b @ a
Out[4]:
tensor([[-0.6729, -1.0430],
        [-1.8916,  0.8125]], device='cuda:0', dtype=torch.float16)

Sparse tensors automation moved this from To do to Done Jan 6, 2022
@cddavis93

I used the approach described above, but it yielded the same error as the original post. I tried to_sparse_csr() as well as to_sparse(). My PyTorch version is 1.11.0.

Is there any further documentation describing half-precision sparse matrix multiplication?

@ducdauge

@cddavis93 I replicated @IvanYashchuk's result with torch-1.11.0+cu113.

@puddingfjz

@IvanYashchuk, is cuSPARSE used for the `b @ a` in the following?

> CSR matrix - dense matrix multiplication is now supported for float16:
>
> In [1]: import torch
>
> In [2]: a = torch.randn(3,2).half().cuda()
>    ...: i = torch.LongTensor([[0, 1, 1],  [2, 0, 2]])
>    ...: v = torch.FloatTensor([3, 4, 5])
>    ...: b = torch.sparse.FloatTensor(i, v, torch.Size([2,3])).half().cuda()
>
> In [3]: b = b.to_sparse_csr()
>
> In [4]: b @ a
> Out[4]:
> tensor([[-0.6729, -1.0430],
>         [-1.8916,  0.8125]], device='cuda:0', dtype=torch.float16)

Where can I find the code for this part?
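
One way to check which backend kernel actually runs is to profile the call (a sketch using torch.profiler; the kernel names in the output depend on your build, so treat this as a diagnostic rather than a definitive answer):

import torch
from torch.profiler import profile, ProfilerActivity

a = torch.randn(3, 2).half().cuda()
b = torch.randn(2, 3).half().cuda().to_sparse_csr()

# List the CUDA kernels invoked by the sparse-dense matmul;
# cuSPARSE-backed calls typically contain "cusparse" in the name.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    c = b @ a
print(prof.key_averages().table(sort_by="cuda_time_total"))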
