Taichi slower than CUDA/OpenCL #8526

99991 · 2024-05-09T16:54:15Z

Describe the bug

I am evaluating GPU acceleration frameworks for a project of mine and found that Taichi is slower than expected. Some foreign function call overhead is unavoidable, so I would expect Taichi to be a little slower than native CUDA, but it should not be three times slower than CuPy with custom kernels.

To Reproduce

Here is a Taichi implementation of matrix-vector multiplication ($A x = b$). Am I missing something?

import taichi as ti
import numpy as np
import time

@ti.kernel
def matvec(A: ti.template(), x: ti.template(), b: ti.template()):
    m, n = A.shape
    ti.loop_config(block_dim=8)
    for i in range(m):
        s = 0.0
        for j in range(n):
            s += A[i, j] * x[j]
        b[i] = s

@ti.kernel
def init(x: ti.template()):
    for i in ti.grouped(x):
        x[i] = ti.random(ti.float32)

def main():
    m = 512
    n = 1024

    A = ti.field(shape=(m, n), dtype=ti.float32)
    x = ti.field(shape=n, dtype=ti.float32)
    b = ti.field(shape=m, dtype=ti.float32)

    init(A)
    init(x)
    init(b)

    b_expected_np = A.to_numpy() @ x.to_numpy()

    for _ in range(100):
        ti.sync()
        start_time = time.perf_counter()

        matvec(A, x, b)

        b_np = b.to_numpy()

        ti.sync()
        elapsed_time = time.perf_counter() - start_time

        print(f"{elapsed_time * 1e6:9.3f} µs")

        assert np.allclose(b_expected_np, b_np)

if __name__ == "__main__":
    ti.init(arch=ti.cuda)
    main()

I've also got matvec implementations for CUDA, OpenCL, CuPy, CuBLAS, Numba and Taichi with other backends here for comparison.
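
Note that the timed region in the script above covers both the matvec call and the b.to_numpy() copy back to the host. A minimal sketch of a timing helper that discards warm-up runs (Taichi compiles a kernel on its first call) and reports the median could look like the following. The names benchmark, effective_bandwidth_gb_s, fn and sync are hypothetical and not part of Taichi; for this script, fn would wrap matvec(A, x, b) and sync would be ti.sync.

```python
import statistics
import time


def benchmark(fn, sync=lambda: None, warmup=3, repeats=100):
    """Return the median wall-clock time of fn() in seconds.

    fn and sync are placeholders (assumptions for illustration):
    fn wraps the kernel launch, sync blocks until the device is idle.
    """
    # Warm-up runs absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        fn()
    sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for asynchronous GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def effective_bandwidth_gb_s(m, n, seconds, itemsize=4):
    """Minimum memory traffic of an m-by-n matvec divided by its runtime."""
    # A is read once (m*n values), x once (n values), b is written once (m values).
    total_bytes = (m * n + n + m) * itemsize
    return total_bytes / seconds / 1e9
```

With m = 512 and n = 1024 in float32, the kernel has to move roughly 2 MiB, so the effective bandwidth gives a rough way to compare the measured time of each framework against what the GPU's memory system could deliver.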

Log/Screenshots

Additional comments

I have tried this with other Taichi versions, CUDA drivers and GPUs. The results were similar.

System Info

$ ti diagnose
[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]
system: linux
executable: /home/myusername/miniconda3/envs/myenv/bin/python
platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='f8pc', release='6.5.0-28-generic', version='#29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2', machine='x86_64')
/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages/taichi/tools/diagnose.py:20: DeprecationWarning: 'locale.getdefaultlocale' is deprecated and slated for removal in Python 3.15. Use setlocale(), getencoding() and getlocale() instead.
  print(f'locale: {".".join(locale.getdefaultlocale())}')
locale: en_US.UTF-8
PATH: /home/myusername/miniconda3/envs/myenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:~/executables
PYTHONPATH: ['/home/myusername/miniconda3/envs/myenv/bin', '/home/myusername/miniconda3/envs/myenv/lib/python311.zip', '/home/myusername/miniconda3/envs/myenv/lib/python3.11', '/home/myusername/miniconda3/envs/myenv/lib/python3.11/lib-dynload', '/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages', '/media/myusername/samsung870qvo4tb/data/stable-diffusion/OneTrainer/src/diffusers/src', '/media/myusername/samsung870qvo4tb/data/stable-diffusion/OneTrainer/src/mgds/src']

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy


                                                                                              
import: <module 'taichi' from '/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages/taichi/__init__.py'>                                                                             
                                                                                              
cpu: True                                                                                     
metal: False                                                                                  
opengl: True                                                                                  
cuda: True                                                                                    
vulkan: True                                                                                  
                                                                                              
`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'                     
                                                                                              
Thu May  9 18:46:25 2024                                                                      
+-----------------------------------------------------------------------------------------+   
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |   
|-----------------------------------------+------------------------+----------------------+   
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |   
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |   
|                                         |                        |               MIG M. |   
|=========================================+========================+======================|   
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0  On |                  N/A |   
| N/A   44C    P8             11W /   80W |     153MiB /   6144MiB |     17%      Default |   
|                                         |                        |                  N/A |   
+-----------------------------------------+------------------------+----------------------+   
                                                                                              
+-----------------------------------------------------------------------------------------+   
| Processes:                                                                              |   
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |   
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       985      G   /usr/lib/xorg/Xorg                            149MiB |
+-----------------------------------------------------------------------------------------+

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=x64

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=opengl

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=cuda

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                   TAICHI EXAMPLES                                    
 ──────────────────────────────────────────────────────────────────────────────────── 
  0: ad_gravity               25: karman_vortex_street    50: patterns                
  1: circle_packing_image     26: keyboard                51: pbf2d                   
  2: comet                    27: laplace                 52: physarum                
  3: cornell_box              28: laplace_equation        53: poisson_disk_sampling   
  4: diff_sph                 29: mandelbrot_zoom         54: print_offset            
  5: differential_evolution   30: marching_squares        55: rasterizer              
  6: euler                    31: mass_spring_3d_ggui     56: regression              
  7: eulerfluid2d             32: mass_spring_game        57: sdf_renderer            
  8: explicit_activation      33: mass_spring_game_ggui   58: simple_derivative       
  9: export_mesh              34: mciso_advanced          59: simple_texture          
  10: export_ply              35: mgpcg                   60: simple_uv               
  11: export_videos           36: mgpcg_advanced          61: snow_phaseField         
  12: fem128                  37: minimal                 62: stable_fluid            
  13: fem128_ggui             38: minimization            63: stable_fluid_ggui       
  14: fem99                   39: mpm128                  64: stable_fluid_graph      
  15: fractal                 40: mpm128_ggui             65: taichi_bitmasked        
  16: fractal3d_ggui          41: mpm3d                   66: taichi_dynamic          
  17: fullscreen              42: mpm3d_ggui              67: taichi_logo             
  18: game_of_life            43: mpm88                   68: taichi_ngp              
  19: gui_image_io            44: mpm88_graph             69: taichi_sparse           
  20: gui_widgets             45: mpm99                   70: texture_graph           
  21: implicit_fem            46: mpm_lagrangian_forces   71: tutorial                
  22: implicit_mass_spring    47: nbody                   72: two_stream_instability  
  23: initial_value_problem   48: odop_solar              73: vortex_rings            
  24: jacobian                49: oit_renderer            74: waterwave               
 ──────────────────────────────────────────────────────────────────────────────────── 
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.19s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 4.34s

Comments

99991 commented May 9, 2024