Describe the bug
I am currently evaluating various frameworks for GPU acceleration for a project of mine and found that Taichi is slower than expected. Due to foreign function call overhead, Taichi is expected to be a little slower than native CUDA, but it should not be three times slower than CuPy with custom kernels.
To Reproduce
Here is a Taichi implementation of matrix-vector multiplication ($A x = b$). Am I missing something?
I've also got matvec implementations for CUDA, OpenCL, CuPy, cuBLAS, Numba, and Taichi with other backends here for comparison.
Log/Screenshots
Additional comments
I have tried this with other Taichi versions, CUDA drivers and GPUs. The results were similar.
System Info