
"addmm_cuda" not implemented for 'Long' #1009

Closed
meanderingstream opened this issue Dec 12, 2022 · 6 comments
Closed

"addmm_cuda" not implemented for 'Long' #1009

meanderingstream opened this issue Dec 12, 2022 · 6 comments
Labels
area:torchx Applies to Torchx

Comments

@meanderingstream

I have the Axon VAE notebook, fashionmnist_vae.livemd, running under Torchx CPU. I can regularly get the notebook to fail when executing the Enum.at line in the following:

{input_batch, target_batch} = Enum.at(train_data, 0)

It also fails with Enum.take(train_data, 0).

The same notebook runs just fine using XLA.

Mix.install(
  [
    # {:exla, "~> 0.4.0"},
    # {:exla, "~> 0.4.1"},
    {:torchx, "~> 0.4.1"},
    # {:nx, "~> 0.4.0", override: true},
    {:nx, "~> 0.4.1"},
    {:axon, "~> 0.3.0"},
    {:req, "~> 0.3.1"},
    {:kino, "~> 0.7.0"},
    {:scidata, "~> 0.1.9"},
    {:stb_image, "~> 0.5.2"},
    {:kino_vega_lite, "~> 0.1.6"},
    {:vega_lite, "~> 0.1.6"},
    {:table_rex, "~> 3.1.1"}
  ],
  # system_env: %{"XLA_TARGET" => "cuda111"}
  system_env: %{"LIBTORCH_TARGET" => "cu116"}
)

alias VegaLite, as: Vl

# This speeds up all our Nx operations without having to use defn
# Nx.global_default_backend(EXLA.Backend)
Nx.global_default_backend(Torchx.Backend)

I have CUDA Toolkit 11.8 and cuDNN installed.

terminate called after throwing an instance of 'c10::Error'
what(): "addmm_cuda" not implemented for 'Long'
Exception raised from operator() at ../aten/src/ATen/native/cuda/Blas.cpp:311 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7fac60c452eb in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xce (0x7fac60c40cbe in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libc10.so)
frame #2: + 0x2e7ba41 (0x7fac0f27ba41 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cuda_cu.so)
frame #3: at::native::structured_mm_out_cuda::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) + 0x53 (0x7fac0f27bcc3 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cuda_cu.so)
frame #4: + 0x2bc09ac (0x7fac0efc09ac in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cuda_cu.so)
frame #5: + 0x2bc0a63 (0x7fac0efc0a63 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cuda_cu.so)
frame #6: + 0x1e5be32 (0x7fac3685be32 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #7: at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) + 0x76 (0x7fac3685c3b6 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #8: + 0x3297ebf (0x7fac37c97ebf in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #9: + 0x3298d46 (0x7fac37c98d46 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #10: at::_ops::mm::call(at::Tensor const&, at::Tensor const&) + 0xdf (0x7fac368a621f in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #11: at::native::tensordot(at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef) + 0xaff (0x7fac35f6a4df in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #12: + 0x229402b (0x7fac36c9402b in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #13: at::_ops::tensordot::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef) + 0x1a8 (0x7fac36692888 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/libtorch/libtorch_cpu.so)
frame #14: tensordot(enif_environment_t*, int, unsigned long const*) + 0x9f9 (0x7fad00871789 in /home/ml3/.cache/mix/installs/elixir-1.14.2-erts-13.1.2/75737351c79772535db35cdfe6072671/_build/dev/lib/torchx/priv/torchx.so)
frame #15: erts_call_dirty_nif + 0x1ec (0x560f5f300b0c in /home/ml3/.asdf/installs/elixir/1.14.2-otp-25/.mix/escripts/livebook)
frame #16: erts_dirty_process_main + 0x20b (0x560f5f17770b in /home/ml3/.asdf/installs/elixir/1.14.2-otp-25/.mix/escripts/livebook)
frame #17: + 0x6a815 (0x560f5f0cc815 in /home/ml3/.asdf/installs/elixir/1.14.2-otp-25/.mix/escripts/livebook)
frame #18: + 0x360520 (0x560f5f3c2520 in /home/ml3/.asdf/installs/elixir/1.14.2-otp-25/.mix/escripts/livebook)
frame #19: + 0x94b43 (0x7fad4623cb43 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: + 0x126a00 (0x7fad462cea00 in /lib/x86_64-linux-gnu/libc.so.6)
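
For anyone trying to isolate this outside the notebook: based on the trace above (tensordot → mm → addmm on CUDA), a plausible minimal reproduction is a matrix product on 64-bit integer tensors with the Torchx CUDA device. The snippet below is only a sketch under that assumption; the shapes and values are made up and are not taken from the notebook:

Nx.default_backend({Torchx.Backend, device: :cuda})

# :s64 in Nx maps to LibTorch's Long dtype
a = Nx.iota({2, 3}, type: :s64)
b = Nx.iota({3, 2}, type: :s64)

# Goes through Torchx's tensordot NIF into at::mm on the GPU and
# should raise: "addmm_cuda" not implemented for 'Long'
Nx.dot(a, b)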

@josevalim
Collaborator

I wonder how Python handles such cases. Do they check for the device and use separate operations?

@meanderingstream
Author

Since I can regularly repeat this at the same section of the notebook, I don't think it is really about the device per se. This is the first point in the notebook where data is retrieved from a stream.

@meanderingstream
Author

Since XLA executes the notebook just fine, it is something specific to Torchx.

@meanderingstream
Author

My matmul notebook using Torchx 0.4.1 and cu116 works just fine: https://github.com/meanderingstream/dl_foundations_in_elixir/blob/main/01h_matmul_Torchx_gpu.livemd. That notebook doesn't use streams.

@josevalim
Collaborator

Sorry, in this case I meant :gpu/:cuda as the device. The operation is not implemented at the low level for CUDA, so they have to handle it elsewhere, usually by downcasting/upcasting before/after performing it.
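
As a concrete illustration of that cast-around approach (a hedged sketch reusing the a/b tensors from the reproduction above, not code from the notebook or from Torchx itself):

# Cast to a float type before the product so a CUDA kernel exists for
# the dtype, then cast back if an integer result is actually needed.
Nx.dot(Nx.as_type(a, :f32), Nx.as_type(b, :f32))
|> Nx.as_type(:s64)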

josevalim added the area:torchx label on Dec 13, 2022
josevalim changed the title from "Runtime node terminated unexpectedly - no connection" to "addmm_cuda" not implemented for 'Long' on Jan 26, 2023
@josevalim
Collaborator

I will go ahead and close this as a LibTorch bug. You either need to use a policy to downcast to a lower precision, or LibTorch has to implement the relevant operation on CUDA.
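
For the notebook itself, one way to apply that advice (a sketch with an assumed variable name; `train_images` here stands for whatever integer pixel tensor Scidata returns and is not necessarily what the livemd calls it) is to move the data to a float type while normalizing, before any batching or matrix products reach the Torchx CUDA device:

# Hypothetical normalization step: casting the u8/integer pixels to :f32
# and scaling to [0, 1] keeps every later matmul on float CUDA kernels.
train_images =
  train_images
  |> Nx.as_type(:f32)
  |> Nx.divide(255)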

josevalim closed this as not planned on May 12, 2024