Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taichi slower than CUDA/OpenCL #8526

Open
99991 opened this issue May 9, 2024 · 0 comments
Open

Taichi slower than CUDA/OpenCL #8526

99991 opened this issue May 9, 2024 · 0 comments

Comments

@99991
Copy link

99991 commented May 9, 2024

Describe the bug

I am currently evaluating various frameworks for GPU acceleration for a project of mine and found that Taichi is slower than expected. Due to foreign function call overhead, Taichi is expected to be a little slower than native CUDA, but it should not be three times slower than CuPy with custom kernels.

To Reproduce

Here is a Taichi implementation of matrix-vector multiplication ($A x = b$). Am I missing something?

import taichi as ti
import numpy as np
import time

@ti.kernel
def matvec(A: ti.template(), x: ti.template(), b: ti.template()):
    m, n = A.shape
    ti.loop_config(block_dim=8)
    for i in range(m):
        s = 0.0
        for j in range(n):
            s += A[i, j] * x[j]
        b[i] = s

@ti.kernel
def init(x: ti.template()):
    for i in ti.grouped(x):
        x[i] = ti.random(ti.float32)

def main():
    m = 512
    n = 1024

    A = ti.field(shape=(m, n), dtype=ti.float32)
    x = ti.field(shape=n, dtype=ti.float32)
    b = ti.field(shape=m, dtype=ti.float32)

    init(A)
    init(x)
    init(b)

    b_expected_np = A.to_numpy() @ x.to_numpy()

    for _ in range(100):
        ti.sync()
        start_time = time.perf_counter()

        matvec(A, x, b)

        b_np = b.to_numpy()

        ti.sync()
        elapsed_time = time.perf_counter() - start_time

        print(f"{elapsed_time * 1e6:9.3f} µs")

        assert np.allclose(b_expected_np, b_np)

if __name__ == "__main__":
    ti.init(arch=ti.cuda)
    main()

I've also got matvec implementations for CUDA, OpenCL, CuPy, CuBLAS, Numba and Taichi with other backends here for comparison.

Log/Screenshots

min_time

Additional comments

I have tried this with other Taichi versions, CUDA drivers and GPUs. The results were similar.

System Info
$ ti diagnose
[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]
system: linux
executable: /home/myusername/miniconda3/envs/myenv/bin/python
platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='f8pc', release='6.5.0-28-generic', version='#29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2', machine='x86_64')
/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages/taichi/tools/diagnose.py:20: DeprecationWarning: 'locale.getdefaultlocale' is deprecated and slated for removal in Python 3.15. Use setlocale(), getencoding() and getlocale() instead.
  print(f'locale: {".".join(locale.getdefaultlocale())}')
locale: en_US.UTF-8
PATH: /home/myusername/miniconda3/envs/myenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:~/executables
PYTHONPATH: ['/home/myusername/miniconda3/envs/myenv/bin', '/home/myusername/miniconda3/envs/myenv/lib/python311.zip', '/home/myusername/miniconda3/envs/myenv/lib/python3.11', '/home/myusername/miniconda3/envs/myenv/lib/python3.11/lib-dynload', '/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages', '/media/myusername/samsung870qvo4tb/data/stable-diffusion/OneTrainer/src/diffusers/src', '/media/myusername/samsung870qvo4tb/data/stable-diffusion/OneTrainer/src/mgds/src']

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy


                                                                                              
import: <module 'taichi' from '/home/myusername/miniconda3/envs/myenv/lib/python3.11/site-packages/taichi/__init__.py'>                                                                             
                                                                                              
cpu: True                                                                                     
metal: False                                                                                  
opengl: True                                                                                  
cuda: True                                                                                    
vulkan: True                                                                                  
                                                                                              
`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'                     
                                                                                              
Thu May  9 18:46:25 2024                                                                      
+-----------------------------------------------------------------------------------------+   
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |   
|-----------------------------------------+------------------------+----------------------+   
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |   
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |   
|                                         |                        |               MIG M. |   
|=========================================+========================+======================|   
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0  On |                  N/A |   
| N/A   44C    P8             11W /   80W |     153MiB /   6144MiB |     17%      Default |   
|                                         |                        |                  N/A |   
+-----------------------------------------+------------------------+----------------------+   
                                                                                              
+-----------------------------------------------------------------------------------------+   
| Processes:                                                                              |   
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |   
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       985      G   /usr/lib/xorg/Xorg                            149MiB |
+-----------------------------------------------------------------------------------------+

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=x64

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=opengl

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7
[Taichi] Starting on arch=cuda

[Taichi] version 1.7.0, llvm 15.0.4, commit 2fd24490, linux, python 3.11.7

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                   TAICHI EXAMPLES                                    
 ──────────────────────────────────────────────────────────────────────────────────── 
  0: ad_gravity               25: karman_vortex_street    50: patterns                
  1: circle_packing_image     26: keyboard                51: pbf2d                   
  2: comet                    27: laplace                 52: physarum                
  3: cornell_box              28: laplace_equation        53: poisson_disk_sampling   
  4: diff_sph                 29: mandelbrot_zoom         54: print_offset            
  5: differential_evolution   30: marching_squares        55: rasterizer              
  6: euler                    31: mass_spring_3d_ggui     56: regression              
  7: eulerfluid2d             32: mass_spring_game        57: sdf_renderer            
  8: explicit_activation      33: mass_spring_game_ggui   58: simple_derivative       
  9: export_mesh              34: mciso_advanced          59: simple_texture          
  10: export_ply              35: mgpcg                   60: simple_uv               
  11: export_videos           36: mgpcg_advanced          61: snow_phaseField         
  12: fem128                  37: minimal                 62: stable_fluid            
  13: fem128_ggui             38: minimization            63: stable_fluid_ggui       
  14: fem99                   39: mpm128                  64: stable_fluid_graph      
  15: fractal                 40: mpm128_ggui             65: taichi_bitmasked        
  16: fractal3d_ggui          41: mpm3d                   66: taichi_dynamic          
  17: fullscreen              42: mpm3d_ggui              67: taichi_logo             
  18: game_of_life            43: mpm88                   68: taichi_ngp              
  19: gui_image_io            44: mpm88_graph             69: taichi_sparse           
  20: gui_widgets             45: mpm99                   70: texture_graph           
  21: implicit_fem            46: mpm_lagrangian_forces   71: tutorial                
  22: implicit_mass_spring    47: nbody                   72: two_stream_instability  
  23: initial_value_problem   48: odop_solar              73: vortex_rings            
  24: jacobian                49: oit_renderer            74: waterwave               
 ──────────────────────────────────────────────────────────────────────────────────── 
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.19s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 4.34s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Untriaged
Development

No branches or pull requests

1 participant