Releases: taichi-dev/taichi

v1.7.1

17 Apr 03:15
0f143b2

Highlights:

  • Bug fixes
    • Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
  • Documentation
    • Update offset.md (#8470) (by Kenshi Takayama)
    • Update math_module.md (#8471) (by Kenshi Takayama)
    • Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
  • Miscellaneous
    • Bump version to 1.7.1 (by Haidong Lan)
    • Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)

Full changelog:

  • [Misc] Bump version to 1.7.1 (by Haidong Lan)
  • [bug] Fix abs on unsigned types (#8476) (by Lin Jiang)
  • [Doc] Update offset.md (#8470) (by Kenshi Takayama)
  • [Doc] Update math_module.md (#8471) (by Kenshi Takayama)
  • [Doc] Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
  • [Misc] Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)
  • [lang] Warn about non-contiguous gradient tensors (#8450) (by Bob Cao)
  • [autodiff] Fix the type of cmp statements in autodiff (#8452) (by Lin Jiang)
  • [Bug] Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
  • [misc] Add flag to disable taichi header print (#8413) (by Chaoming Wang)

v1.7.0

27 Nov 06:46
2fd2449

1. New features

1.1 Real Function

We are excited to announce the stabilization of the Real Function feature in Taichi Lang v1.7.0. Initially introduced as an experimental feature in v1.0.0, it has now matured with enhanced capabilities and usability.

Key Updates

  • Decorator Change: The Real Function now uses @ti.real_func. The previous decorator, @ti.experimental.real_func, is deprecated.
  • Performance Improvements: Real Functions, unlike Taichi inline functions (@ti.func), are compiled as separate entities, akin to CUDA's device functions. This separation allows for recursive runtime calls and significantly faster compilation. For instance, the Cornell box example's compilation time is reduced from 2.34s to 1.01s on an i9-11900K when switching from inline to real functions.
  • Enhanced Functionality: Real Functions support multiple return statements, offering greater flexibility in coding.

Limitations

  • Backend Support: Real Functions are currently only compatible with LLVM-based backends, including CPU and CUDA.
  • Parallel Loops: Writing parallel loops within Real Functions is not supported. However, if called within a parallel loop in a kernel, the Real Function will be parallelized accordingly.

Important Note on Usage: Ensure all arguments and return values in Real Functions are explicitly type-hinted.

Usage Example

The following example demonstrates the recursive capability of Real Functions. The sum_func Real Function is used to calculate the sum of numbers from 1 to n, showcasing its ability to handle multiple return statements and variable recursion depths.

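import taichi as ti

ti.init(arch=ti.cpu)  # Real Functions require an LLVM-based backend (CPU or CUDA)

@ti.real_func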
def sum_func(n: ti.i32) -> ti.i32:
    if n == 0:
        return 0
    return sum_func(n - 1) + n

@ti.kernel
def sum(n: ti.i32) -> ti.i32:
    return sum_func(n)

print(sum(100))  # 5050

You can find more examples of the real function in the repository.

1.2 Enhancements in Kernel Arguments and Return Values

Support for Multiple Return Values in Taichi Kernel:

In this update, we've introduced the capability to return multiple values from a Taichi kernel by specifying a tuple as the return type. You can write the type hint directly as (ti.f32, s0), in standard Python typing form as typing.Tuple[ti.f32, s0], or, on Python 3.9 and above, as tuple[ti.f32, s0]. The following example illustrates this new feature:

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)

@ti.real_func
def foo() -> (ti.f32, s0):
    return 1, s0(a=ti.math.vec3([100, 0.5, 3]), b=1)

@ti.kernel
def bar() -> (ti.f32, s0):
    return foo()
    
ret1, ret2 = bar()
print(ret1)  # 1.0
print(ret2)  # {'a': [100.0, 0.5, 3.0], 'b': 1}

Removal of Size Limit on Kernel Arguments and Return Values:

We have eliminated the size restrictions on kernel arguments and return values. However, it's crucial to remember that keeping these small is advisable. Large argument or return value sizes can lead to substantially longer compile times. While we support larger sizes, we haven't thoroughly tested arguments and return values exceeding 4KB and cannot guarantee their flawless functionality.

1.3 Argument Pack

Taichi now introduces Argument Packs. An Argument Pack caches parameters that stay unchanged across multiple kernel calls, which both simplifies kernel launches and improves performance.

Key Advantages

  • Argument Pack: User-defined data types that encapsulate multiple parameters into a single, manageable unit.
  • Buffering Capability: Store and reuse parameters that remain constant across kernel calls, reducing the overhead of repeated parameter passing.
  • Device-level Caching: Taichi optimizes performance by caching argpacks directly on the device.

Usage Example

import taichi as ti
ti.init()

# Defining a custom argument type using "ti.types.argpack"
view_params_tmpl = ti.types.argpack(view_mtx=ti.math.mat4, proj_mtx=ti.math.mat4, far=ti.f32)

# Declaration of a Taichi kernel leveraging Argument Packs
@ti.kernel
def p(view_params: view_params_tmpl) -> ti.f32:
    return view_params.far

# Instantiation of the argument pack
view_params = view_params_tmpl(
    view_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    proj_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    far=1)

# Executing the kernel with the Argument Pack
print(p(view_params))  # Outputs: 1.0

Supported Data Types

Argument Packs are currently compatible with a variety of data types, including scalar, matrix, vector, Ndarray, and Struct.

Limitations

Please note that Argument Packs currently do not support the following features and data types:

  • Ahead-of-Time (AOT) Compilation and Compute Graph
  • ti.template
  • ti.data_oriented

2. Improvements

2.1 CUDA Memory Allocation Improvements

Dynamic VRAM Allocation:

  • In our latest update, the CUDA backend has been optimized to dynamically allocate Video RAM (VRAM), significantly reducing the initial preallocation requirement. Now, less than 50MB is preallocated upon ti.init.

Changes in device_memory_GB and device_memory_fraction Usage:

  • These settings now apply only to preallocating memory for sparse data structures, such as ti.pointer. This preallocation occurs only once a sparse data structure is detected in your code.

Impact on VRAM Consumption:

  • Users can expect a noticeable decrease in VRAM usage with these enhancements. For instance:
    diffmpm3d: 3866 MB --> 3190 MB
    nerf_train_deploy: 5618 MB --> 4664 MB

2.2 CUDA SIMT APIs

Added the following ti.simt.block APIs:

  • ti.simt.block.sync_any_nonzero
  • ti.simt.block.sync_all_nonzero
  • ti.simt.block.sync_count_nonzero

2.3 Sparse grid APIs

Added a helper function to create a 2D/3D sparse grid. For example:

    # create a 2D sparse grid
    grid = ti.sparse.grid(
        {
            "pos": ti.math.vec2,
            "mass": ti.f32,
            "grid2particles": ti.types.vector(20, ti.i32),
        },
        shape=(10, 10),
    )

    # access
    grid[0, 0].pos = ti.math.vec2(1, 2)
    grid[0, 0].mass = 1.0
    grid[0, 0].grid2particles[2] = 123

2.4 GGUI

  • Added Metal backend support for GGUI

2.5 AOT

  • Added C-APIs of ti_import_cpu_memory() and ti_import_cuda_memory()
  • Added support for multiple AOT runtime devices
  • Added support for matrix/vector in compute graph in C-API
  • Added support for matrix/vector in compute graph in Python

2.6 Error reporting

  • Improved the quality and coverage of error messages

2.7 Autodiff

  • Added support for passing vector/matrix arguments to autodiff kernels
  • Added autodiff support for torch Tensor and Taichi ndarray on CPU and CUDA
  • Added support for passing grad tensors to primal kernels

3. Bug Fixes

3.1 Autodiff Bugfixes

  • Fixed a few bugs with use of ti.ad.Tape
  • Fixed a bug with random seed for loss

3.2 AOT Bugfixes

  • Fixed a few bugs with compute graph
  • Fixed a few bugs with C-API

3.3 API Bugfixes

  • Fixed a bunch of bugs related to Matrix/Vector
  • Fixed an error with Ndarray type check
  • Fixed a few errors with taichi.math APIs
  • Fixed an error with SNode destruction
  • Fixed an error with dataclass support for struct with matrix
  • Fixed an error with ti.func
  • Fixed a few errors with ti.struct and struct field
  • Fixed a few errors with Sparse Matrix

3.4 Build & Environment Bugfixes

  • Fixed a few compilation issues on Windows platform
  • Fixed an issue with cusolver dependency

3.5 GGUI Bugfixes

  • Fixed vec_to_euler, which broke GGUI cameras, and improved camera logic handling
  • Fixed ImGui widget size on HiDPI displays

4. Deprecation Notice

  • We have removed the CC backend because it was rarely used and lacked maintenance.
  • We are deprecating ti.experimental.real_func because it is no longer experimental. Please use ti.real_func instead.

5. Full changelog

Highlights:
   - **Bug fixes**
      - Fix macro error with ti_import_cpu_memory (#8401) (by **Zhanlue Yang**)
      - Fix argpack nesting issues (by **listerily**)
      - Convert matrices to structs in argpack type members, Fixing layout error (by **listerily**)
      - Fix error when returning a struct field member when the return … (#8271) (by **秋云未云**)
      - Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
      - Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
      - Fix incorrect CFG Graph structure due to missing Block wiith OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
      - Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
      - Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
      - Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
      - Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
   - **CUDA backend**
      - Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
   - **Documentation**
      - Fixing typo in impl.py on ti.grouped function documentation (#8407) (by **Quentin Warnant**)
      - Update doc about kernels and functions (#8400) (by **Lin Jiang**)
      - Update documentation (#8089) (by **Zhao Liang**)
      - Update docstring for inverse func (#8170) (by **Zhao Liang**)
      - Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
      - Fix a bug in faq.md (#7992) (by **Zhao Liang**)
      ...

v1.6.0

12 May 03:19

Deprecation Notice

  • We removed some APIs that were deprecated a long time ago. See the table below:

| Removed API | Replace with |
| --- | --- |
| Using atomic operations like a.atomic_add(b) | ti.atomic_add(a, b) or a += b |
| Using is and is not inside Taichi kernel and Taichi function | Not supported |
| Ndrange for loop with the number of the loop variables not equal to the dimension of the ndrange | Not supported |
| ti.ui.make_camera() | ti.ui.Camera() |
| ti.ui.Window.write_image() | ti.ui.Window.save_image() |
| ti.SOA | ti.Layout.SOA |
| ti.AOS | ti.Layout.AOS |
| ti.print_profile_info | ti.profiler.print_scoped_profiler_info |
| ti.clear_profile_info | ti.profiler.clear_scoped_profiler_info |
| ti.print_memory_profile_info | ti.profiler.print_memory_profiler_info |
| ti.CuptiMetric | ti.profiler.CuptiMetric |
| ti.get_predefined_cupti_metrics | ti.profiler.get_predefined_cupti_metrics |
| ti.print_kernel_profile_info | ti.profiler.print_kernel_profiler_info |
| ti.query_kernel_profile_info | ti.profiler.query_kernel_profiler_info |
| ti.clear_kernel_profile_info | ti.profiler.clear_kernel_profiler_info |
| ti.kernel_profiler_total_time | ti.profiler.get_kernel_profiler_total_time |
| ti.set_kernel_profiler_toolkit | ti.profiler.set_kernel_profiler_toolkit |
| ti.set_kernel_profile_metrics | ti.profiler.set_kernel_profiler_metrics |
| ti.collect_kernel_profile_metrics | ti.profiler.collect_kernel_profiler_metrics |
| ti.VideoManager | ti.tools.VideoManager |
| ti.PLYWriter | ti.tools.PLYWriter |
| ti.imread | ti.tools.imread |
| ti.imresize | ti.tools.imresize |
| ti.imshow | ti.tools.imshow |
| ti.imwrite | ti.tools.imwrite |
| ti.ext_arr | ti.types.ndarray |
| ti.any_arr | ti.types.ndarray |
| ti.Tape | ti.ad.Tape |
| ti.clear_all_gradients | ti.ad.clear_all_gradients |
| ti.linalg.sparse_matrix_builder | ti.types.sparse_matrix_builder |

  • We no longer deprecate the built-in min/max functions in Taichi kernels.
  • We are deprecating some arguments used when declaring compute graph arguments. They will be removed in v1.7.0:
    • element_shape argument for scalar and ndarray
    • shape, channel_format and num_channels arguments for texture
  • The cc backend will be removed in the next release (v1.7.0).
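
For code bases that still use the removed names, the pure renames listed in this deprecation notice lend themselves to a mechanical rewrite. The following is a rough sketch only: `RENAMES` holds a subset of the listed renames, and `migrate` is a hypothetical helper, not a tool shipped with Taichi. It does not handle the entries marked "Not supported" or the a.atomic_add(b) method form, which need manual changes.

```python
import re

# Old name -> replacement; a subset of the renames listed in the notice.
RENAMES = {
    "ti.ui.make_camera": "ti.ui.Camera",
    "ti.SOA": "ti.Layout.SOA",
    "ti.AOS": "ti.Layout.AOS",
    "ti.imread": "ti.tools.imread",
    "ti.imwrite": "ti.tools.imwrite",
    "ti.ext_arr": "ti.types.ndarray",
    "ti.any_arr": "ti.types.ndarray",
    "ti.Tape": "ti.ad.Tape",
    "ti.clear_all_gradients": "ti.ad.clear_all_gradients",
}

# Match a removed name only as a whole dotted identifier, so that
# "ti.Tape" inside "ti.Tapestry" or "x.ti.Tape" is left alone.
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in RENAMES) + r")(?!\w)"
)

def migrate(source: str) -> str:
    """Rewrite removed API names in `source` to their replacements."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)

old = "with ti.Tape(loss):\n    img = ti.imread('a.png')\n"
new = migrate(old)
print(new)
```

Running `migrate` twice gives the same result as running it once, because none of the replacement names contains a removed name as a whole dotted identifier.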

New features

Struct arguments

You can now use struct arguments in all backends. Structs can be nested, and they can contain matrices and vectors. Here's an example:

import taichi as ti

ti.init()

transform_type = ti.types.struct(R=ti.math.mat3, T=ti.math.vec3)
pos_type = ti.types.struct(x=ti.math.vec3, trans=transform_type)

@ti.kernel
def kernel_with_nested_struct_arg(p: pos_type) -> ti.math.vec3:
    return p.trans.R @ p.x + p.trans.T

trans = transform_type(ti.math.mat3(1), [1, 1, 1])
p = pos_type(x=[1, 1, 1], trans=trans)
print(kernel_with_nested_struct_arg(p))  # [4., 4., 4.]

Ndarray

  • Added support for reading and writing 0-dimensional ndarrays in Python scope
  • Fixed a bug when writing into ndarray from Python scope

Improvements

  • Added support for the rsqrt operator in autodiff
  • Added assembly printer for CPU backend (by Zhanlue Yang)
  • Added support for CUDA shared array allocation over 48KiB

Performance

  • Improved vectorization support on CPU backend, with significant performance gains for specific applications

New Examples

  • 2D Euler fluid simulation example (by Lee-abcde)

Misc

  • Python 3.11 support
  • ti.frexp is supported on CUDA, Vulkan, Metal, OpenGL backends.
  • ti.math.popcnt intrinsic (by Garry Ling)
  • Fixed a memory leak issue during SNodeTree destruction (by Zhanlue Yang)
  • Added validation and improved error report for ti.Field finalization (by Zhanlue Yang)
  • Fixed a memory leak issue with CUDA backend in C-API (by Zhanlue Yang)
  • Added support for formatted printing with str.format() and f-strings (by Tianyi Liu)
  • Changed Python code formatter from yapf to black

Developer Experience

  • build.py script for preparing build & testing environment

Full changelog

Highlights:

  • Bug fixes
    • Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
  • CUDA backend
    • Warn driver version if it doesn't support memory pool. (#7912) (by Haidong Lan)
    • Better handling shared array shape check (#7818) (by Haidong Lan)
    • Support large shared memory for CUDA backend (#7452) (by Haidong Lan)
  • Documentation
    • Add doc about struct arguments (#7959) (by Lin Jiang)
    • Fix docstring of mix function (#7922) (by Zhao Liang)
    • Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
    • Update doc for dynamic snode (#7804) (by Zhao Liang)
    • Update field.md (#7819) (by zhoooou)
    • Update readme (#7808) (by yanqingzhang)
    • Update write_test.md (#7745) (by Qian Bao)
    • Update performance.md (#7720) (by Zhao Liang)
    • Update readme (#7673) (by Zhao Liang)
    • Update tutorial.md (#7512) (by Chenzhan Shang)
    • Update gui_system.md (#7628) (by Qian Bao)
    • Remove deprecated api docstrings (#7596) (by pengyu)
    • Fix the cexp docstring (#7588) (by Zhao Liang)
    • Add doc about returning struct (#7556) (by Lin Jiang)
  • Error messages
    • Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
  • Language and syntax
    • Remove deprecated funcs in init.py (#7941) (by Lin Jiang)
    • Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
    • Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
    • Remove the support for 'is' (#7930) (by Lin Jiang)
    • Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
    • Remove a.atomic(b) (#7925) (by Lin Jiang)
    • Cancel deprecating native min/max (#7928) (by Lin Jiang)
    • Let nested data classes have methods (#7909) (by Lin Jiang)
    • Let kernel argument support matrix nested in a struct (by lin-hitonami)
    • Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
    • Fix a bug on PosixPath (#7860) (by Zhao Liang)
    • Seprate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
    • Fix pylance warning (#7805) (by Zhao Liang)
    • Support taking structs as kernel arguments (by lin-hitonami)
    • Fix math module circular import bugs (#7762) (by Zhao Liang)
    • Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
    • Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
    • Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
    • Support writing sparse matrix as matrix market file (#7529) (by pengyu)
      ...

v1.5.0

27 Mar 16:35
7b885c2

Deprecation Notice

  • ndarray no longer accepts field_dim, replaced by the ndim argument.
  • [RFC] Deprecate ti.cc backend in favor of TiRT and its C API. If you have any concerns, please let us know at #7629.

New features

AOT

  • Taichi Runtime (TiRT) now supports Apple's Metal API and OpenGL ES for compatibility with older mobile platforms. Taichi programs can now be deployed to any mainstream consumer device.
    NOTE Taichi program deployment on mobile platforms is experimental. Please contact us at contact@taichi.graphics for long-term services.
  • Taichi AOT now fully supports float16 dtype.

Ndarray

  • Out-of-bounds checking is now supported on ndarrays

Improvements

Python Frontend

We now support returning a struct on LLVM-based backends (CPU and CUDA). The struct can contain vectors and matrices, and it can also contain nested structs. Here's an example.

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)
s1 = ti.types.struct(a=ti.f32, b=s0)

@ti.kernel
def foo() -> s1:
    return s1(a=1, b=s0(a=ti.math.vec3(100, 0.2, 3), b=1))

print(foo())  # {'a': 1.0, 'b': {'a': [100.0, 0.2, 3.0], 'b': 1}}

Performance

  • Support atomic operations on half2 for the CUDA backend (with compute capability > 60). You can enable this with ti.init(half2_vectorization=True). This feature can effectively accelerate the NeRF training process. Please refer to this repo for details.

GGUI

  • GGUI now has no computing backend restrictions! You can now use Metal, OpenGL, AMDGPU, or DirectX 11, in addition to CPU, CUDA, and Vulkan, which GGUI already supported.
  • GGUI has now been validated on Mesa's software rasterizer, lavapipe. You can use this solution for headless server visualization or on servers with no graphics capabilities (such as A100).
  • Added the fps_limit option, which sets the maximum frame rate in GGUI.

Full changelog:

Highlights:
   - **AMDGPU backend**
      - Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
      - Add print kernel amdgcn (#7357) (by **Zeyu Li**)
      - Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
   - **Aot module**
      - Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
      - Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
   - **Bug fixes**
      - Fix copy_from() of StructField (#7294) (by **Yi Xu**)
      - Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
      - Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
      - Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
   - **Documentation**
      - Update GGUI docs with correct API (#7525) (by **pengyu**)
      - Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
      - Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
      - Fix typo in API doc (#7511) (by **pengyu**)
      - Update math_module (#7405) (by **Zhao Liang**)
      - Update hello_world.md (#7400) (by **Zhao Liang**)
      - Update debugging.md (#7401) (by **Zhao Liang**)
      - Update hello_world.md (#7380) (by **Zhao Liang**)
      - Update type.md (#7376) (by **Zhao Liang**)
      - Update kernel_function.md (#7375) (by **Zhao Liang**)
      - Update hello_world.md (#7369) (by **Zhao Liang**)
      - Update hello_world.md (#7368) (by **Zhao Liang**)
      - Update data_oriented_class.md (#6790) (by **Zhao Liang**)
      - Update hello_world.md (#7367) (by **Zhao Liang**)
      - Update kernel_function.md (#7364) (by **Zhao Liang**)
      - Update hello_world.md (#7354) (by **Zhao Liang**)
      - Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
      - Update profiler.md (#7358) (by **Zhao Liang**)
      - Update kernel_function.md (#7356) (by **Zhao Liang**)
      - Update tut.md (#7352) (by **Gabriel Vainer**)
      - Update type.md (#7350) (by **Zhao Liang**)
      - Update hello_world.md (#7337) (by **Zhao Liang**)
      - Update append docstring (#7265) (by **Zhao Liang**)
      - Update ndarray.md (#7236) (by **Gabriel Vainer**)
      - Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
      - Remove doc tutorial (#7198) (by **Olinaaaloompa**)
      - Rename tutorial doc (#7186) (by **Zhao Liang**)
      - Update tutorial.md (#7176) (by **Zhao Liang**)
      - Update math_module.md (#7175) (by **Zhao Liang**)
      - Update debugging.md (#7173) (by **Zhao Liang**)
      - Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
      - Update doc regarding dynamic index (#7148) (by **Yi Xu**)
      - Move glossary to top level (#7118) (by **Zhao Liang**)
      - Update type.md (#7038) (by **Zhao Liang**)
      - Fix docstring (#7065) (by **Zhao Liang**)
   - **Error messages**
      - Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
      - Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
      - Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
      - Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
      - Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
      - Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
      - Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
      - Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
      - Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
   - **GUI**
      - GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
   - **Intermediate representation**
      - Unified type system for internal operations (#6337) (by **daylily**)
   - **Language and syntax**
      - Keep ti.pyfunc (#7530) (by **Lin Jiang**)
      - Type check assignments between tensors (#7480) (by **Yi Xu**)
      - Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
      - Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
      - Fix pylance warnnings by ti.random (#7439) (by **Zhao Liang**)
      - Fix pylance types warning (#7417) (by **Zhao Liang**)
      - Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
      - Simplify the swizzle generator (#7216) (by **Zhao Liang**)
      - Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
      - Remove deprecated packed switch (#7104) (by **Yi Xu**)
      - Raise errors when using the packed switch (#7125) (by **Yi Xu**)
      - Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
      - Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
      - Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
      - Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
      - Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
   - **Miscellaneous**
      - Strictly check ndim with external array (#7126) (by **Haidong Lan**)

Full changelog:
   - [cc] Add deprecation notice for cc backend (#7651) (by **Ailing**)
   - [misc] Cherry pick struct return related commits (#7575) (by **Haidong Lan**)
   - [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
   - [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
   - [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
   - [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
   - [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
   - [misc] Implement KernelCompialtionManager::clean_offline_cache (#7515) (by **PGZXB**)
   - [ir] Except shared array from demote atomics pass. (#7513) (by **Haidong Lan**)
   - [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
   - [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
   - [build] Fix build failure without nvcc (#7521) (by **Ailing**)
   - [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
   - [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
   - [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
   - [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
   - [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
   - [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
   - [Doc] Fix typo in API doc (#7511) (by **pengyu**)
   - [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
   - [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
   - [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
   - [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
   - [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
   - [ci] Fix nightly version number issue (#7498) (by **Proton**)
   - [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
   - [type] Let Type * be serializable (by **lin-hitonami**)
   - [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
   - [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
   - [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
   - [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
   - [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
   - [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
   - [doc] Fix typo i...

v1.4.1

02 Feb 06:30

Highlights:

Full changelog:

  • [ci] Tolerate duplicates when registering version (#7281) (by Proton)
  • [misc] Fix manylinux2014 warning not printing (#7270) (by Proton)
  • [misc] Bump version to 1.4.1 (by Lin Jiang)
  • [misc] Update submodule taichi_assets (#7203) (by Lin Jiang)
  • [bug] Fix example circle-packing (#7194) (by Lin Jiang)

v1.4.0

16 Jan 09:26

Deprecation Notice

  • Support for sparse SNodes on the Metal backend has been removed.
  • ti.Matrix.rotation2d() has been removed.
  • The packed switch in ti.init() has been removed.
  • The dynamic_index switch in ti.init() is now deprecated and will be removed in v1.5.0. See the feature introduction below for details.
  • Slicing from a single row/column of a matrix (e.g. a[x, a:b]) now returns a vector instead of a matrix.

New features

AOT

Taichi AOT is officially available in Taichi v1.4.0, along with a native Taichi Runtime (TiRT) library taichi_c_api. Native applications can now load compiled AOT modules and launch Taichi kernels without a Python interpreter.

In this release, TiRT has stabilized the Vulkan backend on desktop platforms and Android. You can find prebuilt TiRT binaries on the release page. You can refer to a comprehensive tutorial on the doc site; the detailed TiRT C-API documentation is available at https://docs.taichi-lang.org/docs/taichi_core.

Ndarray

Taichi ndarray is now formally released in v1.4.0. The ndarray is an array object that holds contiguous multi-dimensional data to allow easy exchange with external libraries. See documentation for more details.

Dynamic index

Before v1.4.0, when you wanted to access a vector/matrix with a runtime variable instead of a compile-time constant, you had to set ti.init(dynamic_index=True). However, that option only works for LLVM-based backends (CPU & CUDA) and may slow down runtime performance because all matrices are affected. Starting from v1.4.0, that option is no longer needed. You can use variable indices whenever necessary on all backends without affecting the performance of those matrices with only constant indices.

Improvements

Performance

  • Compilation is now roughly 2x faster.

Example list & ti gallery

Since v1.0.0, we have been enriching our taichi example collection, bringing the number of demos in the gallery window from eight to twelve. Run ti gallery to check out some new demos!

Bug fixes

  • Incorrect behavior of struct fors on sparse SNodes in certain cases has been fixed. (#7121)
  • CUDA will no longer allocate extra device memory when performing to_numpy() and from_numpy(). (#7008)
  • StructType is now allowed as a type hint to ti.func. (#6964)
  • Incorrect recompilation caused by filling in a matrix field with the same matrix has been fixed. (#6951)
  • Matrix type inference has been fixed. (#6928)
  • Getting 64-bit data from ndarrays in the Python scope is now handled correctly. (#6836)
  • Name collision problem in ti.dataclass has been fixed. (#6737)

Highlights:

  • Aot module
    • Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
  • Bug fixes
    • Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
    • Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
    • Fix getting 64-bit data from ndarray in Python scope (#6836) (by Yi Xu)
    • Avoid overwriting global tmp with dynamic_index=True (#6820) (by Yi Xu)
  • Build system
    • Deprecate export_core (#7028) (by Zhanlue Yang)
  • Command line interface
    • Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
  • Documentation
    • Update tutorial.md (#7176) (by Zhao Liang)
    • Update math_module.md (#7175) (by Zhao Liang)
    • Update debugging.md (#7173) (by Zhao Liang)
    • Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
    • Update doc regarding dynamic index (#7148) (by Yi Xu)
    • Move glossary to top level (#7118) (by Zhao Liang)
    • Update type.md (#7038) (by Zhao Liang)
    • Fix docstring (#7065) (by Zhao Liang)
    • Remove packed mode in doc (#7030) (by Zhao Liang)
    • Minor doc update (#6952) (by Zhao Liang)
    • Glossary (#6101) (by Olinaaaloompa)
    • Update dac (#6875) (by Gabriel Vainer)
    • Update faq.md (#6921) (by Zhao Liang)
    • Update dataclass.md (#6876) (by Gabriel Vainer)
    • Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
    • Stop mentioning packed mode (#6755) (by Yi Xu)
  • Error messages
    • Raise errors when using metal sparse (#7113) (by Lin Jiang)
    • Do not show warning when the offline cache path does not exist (#7005) (by PGZXB)
  • GUI
    • Support colored texts (#7036) (by Dunfan Lu)
  • Intermediate representation
    • Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
  • Language and syntax
    • Raise errors when using the packed switch (#7125) (by Yi Xu)
    • Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
    • Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
    • Remove filename kwarg in aot Module save() (#7085) (by Ailing)
    • Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
    • Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
    • Deprecate the dynamic_index switch (#7071) (by Yi Xu)
    • Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
    • Fix gui docstring (#7003) (by Zhao Liang)
    • Support dynamic indexing in spirv (#6990) (by Yi Xu)
    • Support dynamic indexing in metal (#6985) (by Yi Xu)
    • Support LU sparse solver on CUDA backend (#6967) (by pengyu)
    • Fix struct type problem (#6949) (by Zhao Liang)
    • Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
    • Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
    • Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
    • Remove the real_matrix switch (#6885) (by Yi Xu)
    • Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
    • Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
    • Add bool type in python as an alias to i32 (#6742) (by daylily)
    • Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • Metal backend
    • Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
  • Miscellaneous
    • Strictly check ndim with external array (#7126) (by Haidong Lan)
    • Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by Zhanlue Yang)

Full changelog:

  • [Doc] Update tutorial.md (#7176) (by Zhao Liang)
  • [aot] (cherry-pick) Removed unused archs in C-API (#7167), FindTaichi CMake module to help outside project integration (#7168) (#7177) (by PENGUINLIONG)
  • [docs] Create windows_debug.md (#7164) (by Bob Cao)
  • [Doc] Update math_module.md (#7175) (by Zhao Liang)
  • [Doc] Update debugging.md (#7173) (by Zhao Liang)
  • [Doc] Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
  • [doc] Fix spelling of "paticle_field" (#7024) (by Xiang (Kevin) Li)
  • [doc] Update accelerate_python.md to use ti.max (#7161) (by Tao Jin)
  • [aot] Fixed ti_get_last_error signature (#7165) (by PENGUINLIONG)
  • [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by Zhao Liang)
  • [doc] Add doc ndarray (#7157) (by Olinaaaloompa)
  • [doc] Update field.md (Fields advanced) (#6867) (by Gabriel Vainer)
  • [ci] Use make_changelog.py to generate the full changelog (#7152) (by Lin Jiang)
  • [aot] Introduce new AOT deployment tutorial (#7144) (by PENGUINLIONG)
  • [Doc] Update doc regarding dynamic index (#7148) (by Yi Xu)
  • [Misc] Strictly check ndim with external array (#7126) (by Haidong Lan)
  • [ci] Run test when pushing to rc branches (#7146) (by Lin Jiang)
  • [ci] Disable backward_cpp on macOS (#7145) (by Proton)
  • [gui] Fix scene line renderable (#7131) (by Bob Cao)
  • [Lang] Raise errors when using the packed switch (#7125) (by Yi Xu)
  • [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by Ailing)
  • [ci] Temporarily disable ad_external_array on Metal (#7136) (by Bob Cao)
  • [Error] Raise errors when using metal sparse (#7113) (by Lin Jiang)
  • [misc] Cherry-pick #7072 into rc-v1.4.0 (#7135) (by Ailing)
  • [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by PENGUINLIONG)
  • [Lang] Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
  • [Bug] Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
  • [Doc] Move glossary to top level (#7118) (by Zhao Liang)
  • [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
  • [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
  • [doc] Modified some errors in the function examples (#7094) (by welann)
  • [ci] More Windows git hacks (#7102) (by Proton)
  • [Lang] Remove filename kwarg in aot Module save() (#7085) (by Ailing)
  • [Lang] Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
  • [example] Remove gui warning message (#7090) (by Zhao Liang)
  • [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
  • [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by Haidong Lan)
  • [example] Update gallery images (#7053) (by Zhao Liang)
  • [Doc] Update type.md (#7038) (by Zhao Liang)
  • [Doc] Fix docstring (#7065) (by Zhao Liang)
  • [Lang] Make sl...

v1.3.0

30 Nov 13:21
0f25b95

Deprecation Notice

  • Using sparse data structures on the Metal backend is now deprecated. The support for Dynamic SNode has been removed in v1.3.0, and the support for Pointer/Bitmasked SNode will be removed in v1.4.0.
  • The packed switch in ti.init() is now deprecated and will be removed in v1.4.0. See the feature introduction below for details.
  • ti.Matrix.rotation2d() is now deprecated and will be removed in v1.4.0. Use ti.math.rotation2d() instead.
  • To clearly distinguish vectors from matrices, transpose() on a vector is no longer allowed. If you want something like a @ b.transpose(), write a.outer_product(b) instead.
  • Ndarray: The element_dim, element_shape, and field_dim arguments of the ndarray type annotation will be deprecated in v1.4.0. field_dim is renamed to ndim to make it more intuitive, and element_dim and element_shape are replaced by passing a matrix type as the dtype argument. For example, ti.types.ndarray(element_dim=2, element_shape=(3,3)) will be replaced by ti.types.ndarray(dtype=ti.matrix(3,3)).
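
As an illustration of the transpose() item above, here is a plain-Python sketch (not Taichi code) of the result that a.outer_product(b) is meant to give: the matrix that a @ b.transpose() would produce for column vectors a and b. The helper name outer_product below is a hypothetical stand-in for the Taichi method, and the input values are made up.

```python
def outer_product(a, b):
    """Return the len(a) x len(b) matrix whose (i, j) entry is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


# Hypothetical example vectors.
a = [1, 2, 3]
b = [4, 5]

m = outer_product(a, b)
print(m)
```

Each row i of the result is b scaled by a[i], which is why no explicit transpose of a vector is needed to express it.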

New features

Dynamic SNode

To support variable-length fields, Taichi provides dynamic SNodes.
You can now use the dynamic SNode on fields of different data types, even struct fields and matrix fields.
You can use x[i].append(...) to append an element, use x[i].length() to get the length, and use x[i].deactivate() to clear the list as shown in the following code snippet.

import taichi as ti

ti.init(arch=ti.cpu)

pair = ti.types.struct(a=ti.i16, b=ti.i64)
pair_field = pair.field()

block = ti.root.dense(ti.i, 4)
pixel = block.dynamic(ti.j, 100, chunk_size=4)
pixel.place(pair_field)
l = ti.field(ti.i32)
ti.root.dense(ti.i, 5).place(l)

@ti.kernel
def dynamic_pair():
    for i in range(4):
        pair_field[i].deactivate()
        for j in range(i * i):
            pair_field[i].append(pair(i, j + 1))
        # pair_field = [[],
        #              [(1, 1)],
        #              [(2, 1), (2, 2), (2, 3), (2, 4)],
        #              [(3, 1), (3, 2), ... , (3, 8), (3, 9)]]
        l[i] = pair_field[i].length()  # l = [0, 1, 4, 9]
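
To make the append / length / deactivate semantics above concrete without a Taichi installation, the following is a minimal plain-Python model. The DynamicRow class is a hypothetical stand-in for one row of the dynamic SNode, not a Taichi API, and it ignores the maximum length and chunk_size settings.

```python
class DynamicRow:
    """Toy model of one variable-length row (hypothetical, not a Taichi class)."""

    def __init__(self):
        self.items = []

    def append(self, value):
        # Add one element to the end of the row.
        self.items.append(value)

    def length(self):
        # Number of elements currently stored in the row.
        return len(self.items)

    def deactivate(self):
        # Clear the row so that it is empty again.
        self.items.clear()


rows = [DynamicRow() for _ in range(4)]
lengths = [0] * 4

# Mirrors the loop structure of the dynamic_pair() kernel.
for i in range(4):
    rows[i].deactivate()
    for j in range(i * i):
        rows[i].append((i, j + 1))
    lengths[i] = rows[i].length()

print(lengths)
```

Row i ends up with i * i pairs, which matches the lengths noted in the kernel's comments.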

Packed Mode

Packed mode was introduced in v0.8.0 to allow users to trade runtime performance for memory usage. In v1.3.0, after the elimination of runtime overhead in common cases, packed mode has become the default mode. There's no longer any automatic padding behavior behind the scenes, so users can use fields and SNodes without surprise.
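
The memory trade-off can be sketched in plain Python. This sketch assumes that the automatic padding mentioned above rounded each field dimension up to the next power of two; the notes here do not spell that out, so treat it as an assumption. The shape (18, 65) is an arbitrary example.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_shape(shape):
    # Assumed non-packed behavior: pad every dimension independently.
    return tuple(next_power_of_two(d) for d in shape)


def num_cells(shape):
    total = 1
    for d in shape:
        total *= d
    return total


shape = (18, 65)
padded = padded_shape(shape)

print("padded shape:", padded)
print("cells with padding:", num_cells(padded))
print("cells when packed:", num_cells(shape))
```

Under that assumption, shapes that are far from a power of two waste the most memory when padded, which is the cost that packed mode avoids.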

Sparse Matrix

We introduce an experimental sparse matrix and sparse solver on the CUDA backend. The API is the same as on the CPU backend. Currently, only the f32 data type and the LLT linear solver are supported on CUDA, and you can only use ti.ndarray for SpMV and linear solver operations. Support for the f64 data type and other linear solvers is under development.
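
For readers unfamiliar with SpMV (sparse matrix-vector multiplication), the following plain-Python sketch shows the operation itself. It uses the CSR storage format purely for illustration; how Taichi stores sparse matrices internally is not stated here, and spmv_csr is a hypothetical helper, not a Taichi function.

```python
def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        # Non-zeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(indptr, indices, data, x)
print(y)
```

Only the stored non-zero entries are visited, which is what makes the operation cheap for large, mostly empty matrices.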

Improvements

Python Frontend

  • Matrix slicing now supports augmented assignment (e.g. +=) in addition to plain assignment.

Taichi Examples

  1. Our user https://github.com/Linyou contributed an excellent instant NGP renderer example in PR #6673. Run taichi_ngp to check it out!

[Developers only] LLVM15 upgrade

Starting from v1.3.0, Taichi has upgraded its LLVM dependency to version 15.0.0. If you're interested in contributing or simply building Taichi from source, please follow our installation doc for developers.
Note this change has no impact on Taichi users.

Highlights

  • Documentation
    • Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
    • Stop mentioning packed mode (#6755) (by Yi Xu)
  • Language and syntax
    • Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • Metal backend
    • Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)

Full changelog

  • [aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
  • [aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
  • [aot] C-API to get available archs (#6766) (by PENGUINLIONG)
  • [doc] Update sparse matrix document (#6719) (by pengyu)
  • [autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
  • [bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
  • [Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
  • [doc] Update dev install about clang version (#6759) (by Ailing)
  • [build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
  • [Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • [lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
  • [aot] JSON serde (#6754) (by PENGUINLIONG)
  • [bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
  • [Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
  • [lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
  • [llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
  • [Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
  • [build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
  • [aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
  • [ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
  • [doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
  • [misc] Fix warnings of taichi examples (#6740) (by PGZXB)
  • [example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
  • [build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)

v1.2.2

15 Nov 09:09
608e4b5

Molten-vk version is downgraded to v1.1.10 to fix a few GGUI issues.

Full changelog:

  • [build] Downgrade molten-vk version to v1.1.10 (#6564) (by Zhanlue Yang)

v1.2.1

01 Nov 06:31
12ab828

This is a bug fix release for v1.2.0.

Full changelog:

  • [mesh] Fix MeshTaichi warnings in CUDA backend (#6369) (by Chang Yu)
  • [Bug] Fix cache_loop_invariant_global_vars pass (#6462) (by Lin Jiang)

v1.2.0

25 Oct 09:56
f189fd7

Starting from the v1.2.0 release, Taichi follows semantic versioning: regular releases cut from the master branch bump the MINOR version, and the PATCH version is bumped only when cherry-picking critical bug fixes.
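
The versioning rule can be sketched as a small function. The bump helper and its "regular" / "cherry-pick" labels are hypothetical names chosen for illustration; they are not part of any Taichi tooling.

```python
def bump(version, kind):
    """Return the next version string under the rule described above.

    kind is "regular" for a release cut from master (bumps MINOR and
    resets PATCH) or "cherry-pick" for a bug-fix release (bumps PATCH).
    """
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if kind == "regular":
        return f"v{major}.{minor + 1}.0"
    if kind == "cherry-pick":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release kind: {kind!r}")


print(bump("v1.2.0", "cherry-pick"))
print(bump("v1.2.2", "regular"))
```

This matches the sequence of releases listed on this page, where v1.2.0 is followed by the bug-fix releases v1.2.1 and v1.2.2 and then by v1.3.0.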

Deprecation Notice

Indexing multi-dimensional ti.ndrange() with a single loop index will be disallowed in future releases.

Highlights

New features

Offline Cache

We introduced the offline cache on CPU and CUDA backends in v1.1.0. In this release, we support this feature on other backends, including Vulkan, OpenGL, and Metal.

  • If your code behaves abnormally, disable offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or offline_cache=False in the ti.init() method call and file an issue with us on Taichi's GitHub repo.
  • See Offline cache for more information.

GDAR (Global Data Access Rule)

A checker is provided for detecting potential violations of global data access rules.

  1. The checker only works in debug mode. To enable it, set debug=True when calling ti.init().
  2. Set validation=True when using ti.ad.Tape() to validate the kernels captured by ti.ad.Tape().
    If a violation occurs, the checker pinpoints the line of code breaking the rules.

For example:

import taichi as ti
ti.init(debug=True)

N = 5
x = ti.field(dtype=ti.f32, shape=N, needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
b = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def func_1():
    for i in range(N):
        loss[None] += x[i] * b[None]

@ti.kernel
def func_2():
    b[None] += 100

b[None] = 10
with ti.ad.Tape(loss, validation=True):
    func_1()
    func_2()

"""
taichi.lang.exception.TaichiAssertionError:
(kernel=func_2_c78_0) Breaks the global data access rule. Snode S10 is overwritten unexpectedly.
File "across_kernel.py", line 16, in func_2:
    b[None] += 100
    ^^^^^^^^^^^^^^
"""
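
To see why overwriting b is a problem, here is a plain-Python sketch of the same computation. It assumes that the backward pass re-reads global values at the time gradients are computed rather than using copies saved during the forward pass; the values of x are made up for illustration.

```python
# Hypothetical input values; the snippet above leaves x at its defaults.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
b = 10.0

# Forward pass of func_1: loss = sum_i x[i] * b, evaluated with b == 10.
loss = sum(xi * b for xi in x)

# The correct gradient d(loss)/d(x[i]) is the b used in the forward pass.
expected_grad = [b for _ in x]

# func_2 then overwrites b in place.
b += 100.0

# A backward pass that re-reads b now sees 110 instead of 10.
stale_grad = [b for _ in x]

print("loss:", loss)
print("expected:", expected_grad)
print("after overwrite:", stale_grad)
```

The mismatch between the two gradient lists is the kind of silent error that the checker turns into an explicit assertion failure.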

Improvements

Performance

Improved Vulkan performance with loops (#6072) (by Lin Jiang)

Python Frontend

  • PrefixSumExecutor is added to improve the performance of prefix-sum operations. The legacy prefix-sum function allocates auxiliary GPU buffers on every call, which causes a noticeable performance problem. The new PrefixSumExecutor avoids these repeated allocations: for arrays of the same length, it only needs to be initialized once and can then run any number of prefix-sum operations without redundant field allocations. The prefix-sum operation is currently supported only on the CUDA backend. (#6132) (by Yu Zhang)

    Usage:

    N = 100
    arr0 = ti.field(dtype, N)
    arr1 = ti.field(dtype, N)
    arr2 = ti.field(dtype, N)
    arr3 = ti.field(dtype, N)
    arr4 = ti.field(dtype, N)
    
    # initialize arr0, arr1, arr2, arr3, arr4, ...
    # ...
    
    # Perform an inclusive, in-place parallel prefix sum.
    # Only one executor is needed for a given array length.
    executor = ti.algorithms.PrefixSumExecutor(N)
    executor.run(arr0)
    executor.run(arr1)
    executor.run(arr2)
    executor.run(arr3)
    executor.run(arr4)
    
  • Runtime integer overflow detection on addition, subtraction, multiplication and shift left operators on Vulkan, CPU and CUDA backends is now available when debug mode is on. To use overflow detection on Vulkan backend, you need to enable printing, and the overflow detection of 64-bit multiplication on Vulkan backend requires NVIDIA driver 510 or higher. (#6178) (#6279) (by Lin Jiang)

    For the following program:

    import taichi as ti
    
    ti.init(debug=True)
    
    @ti.kernel
    def add(a: ti.u64, b: ti.u64)->ti.u64:
        return a + b
    
    add(2 ** 63, 2 ** 63)

    The following warning is printed at runtime:

    Addition overflow detected in File "/home/lin/test/overflow.py", line 7, in add:
        return a + b
               ^^^^^
    
  • Printing is now supported on the Vulkan backend on Unix/Windows platforms. To enable printing on the Vulkan backend, follow the instructions at https://docs.taichi-lang.org/docs/master/debugging#applicable-backends (#6075) (by Ailing)
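
As a reference for what the PrefixSumExecutor item above computes, here is a sequential plain-Python model of an inclusive, in-place prefix sum. It only illustrates the result; it is not the parallel CUDA implementation, and the function name is hypothetical.

```python
def inclusive_prefix_sum_in_place(arr):
    """Overwrite arr[i] with arr[0] + arr[1] + ... + arr[i]."""
    running = 0
    for i in range(len(arr)):
        running += arr[i]
        arr[i] = running


arr0 = [3, 1, 4, 1, 5]
inclusive_prefix_sum_in_place(arr0)
print(arr0)
```

"Inclusive" means that each output element includes its own input value, and "in-place" means that the input array is overwritten rather than copied.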

GGUI

Taichi Examples

Three new examples from community contributors are also merged in this release. They include:

  • Animating the fundamental solution of a Laplacian equation (#6249) (by @bismarckkk)
  • Animating the Karman vortex street using LBM (#6249) (by @hietwl)
  • Animating the two-stream instability (#6249) (by JiaoLuhuai)

You can view these examples by running ti example in a terminal and selecting the corresponding index.

Important bug fixes

  • "ti.data_oriented" class instance now correctly releases its allocated memory upon garbage collection. (#6256) (by Zhanlue Yang)
  • "ti.fields" can now be correctly indexed using non-i32 typed indices. (#6276) (by Zhanlue Yang)
  • "ti.select" and "ti.ifte" can now be printed correctly in Taichi Kernels. (#6297) (by Zhanlue Yang)
  • Before this release, setting u64 arguments to numbers greater than 2^63 raised an error, and u64 return values were treated as i64 in Python (integers greater than 2^63 were returned as negative numbers). Both bugs are fixed in this release. (#6267) (#6364) (by Lin Jiang)
  • Taichi now raises an error when the number of the loop variables does not match the dimension of the ndrange for loop instead of malfunctioning. (#6360) (by Lin Jiang)
  • Calling ti.append with a vector/matrix now throws a proper error message. (#6322) (by Ailing)
  • Division on unsigned integers now works properly on LLVM backends. (#6128) (by Yi Xu)
  • Operator ">>=" now works properly. (#6153) (by Yi Xu)
  • Numpy int is now allowed for SNode shape setting. (#6211) (by Yi Xu)
  • Dimension check for GlobalPtrStmt is now aware of whether it is a cell access. (#6275) (by Yi Xu)
  • Before this release, Taichi autodiff could fail in cases where the condition of an if statement depended on the index of an outer for-loop. The bug has been fixed in this release. (#6207) (by Mingrui Zhang)
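
The u64 item above comes down to how the same 64 bits are interpreted. The plain-Python sketch below (hypothetical helper names, not Taichi code) shows how a large u64 value reads as a negative number when treated as i64, and how u64 addition wraps around on overflow.

```python
U64_MASK = (1 << 64) - 1


def wrap_u64(value):
    """Keep only the low 64 bits, as unsigned 64-bit arithmetic would."""
    return value & U64_MASK


def u64_as_i64(value):
    """Reinterpret a 64-bit pattern as a signed two's-complement integer."""
    value = wrap_u64(value)
    return value - (1 << 64) if value >= (1 << 63) else value


def add_u64_overflows(a, b):
    """True if a + b does not fit in 64 unsigned bits."""
    return a + b > U64_MASK


a = b = 2 ** 63

print(u64_as_i64(a))            # the same bits read as a signed integer
print(wrap_u64(a + b))          # the sum after wrapping to 64 bits
print(add_u64_overflows(a, b))  # whether the exact sum fits in u64
```

Python integers are arbitrary-precision, so the masking has to be written out explicitly here; fixed-width integer types behave this way implicitly.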

Full changelog:

  • [Error] Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by Lin Jiang)
  • Adjust aot_demo.sh (by jim19930609)
  • [error] Warn Linux users about manylinux2014 build on startup (#6416) (by Proton)
  • [misc] Bug fix (by jim19930609)
  • [misc] Bump version (by jim19930609)
  • [vulkan] [bug] Stop using the buffer device address feature on macOS (#6415) (by Yi Xu)
  • [Lang] [bug] Allow filling a field with Expr (#6391) (by Yi Xu)
  • [misc] Rc v1.2.0 cherry-pick PR number 2 (#6384) (by Zhanlue Yang)
  • [misc] Revert PR 6360 (#6386) (by Zhanlue Yang)
  • [misc] Rc v1.2.0 c1 (#6380) (by Zhanlue Yang)
  • [bug] Fix potential bug in #6362 (#6363) (#6371) (by Zhanlue Yang)
  • [example] Add example "laplace equation" (#6302) (by 猫猫子Official)
  • [ci] Android Demo: leave Docker containers intact for debugging (#6357) (by Proton)
  • [autodiff] Skip gradient kernel compilation for validation kernel (#6356) (by Mingrui Zhang)
  • [autodiff] Move autodiff gdar checker to release (#6355) (by Mingrui Zhang)
  • [aot] Removed constraint on same-allocation copy (#6354) (by PENGUINLIONG)
  • [ci] Add new performance monitoring (#6349) (by Proton)
  • [dx12] Only use llvm to compile dx12. (#6339) (by Xiang Li)
  • [opengl] Fix with_opengl when TI_WITH_OPENGL is off (#6353) (by Ailing)
  • [Doc] Add instructions about running clang-tidy checks locally (by Ailing Zhang)
  • [build] Enable readability-redundant-member-init in clang-tidy check (by Ailing Zhang)
  • [build] Enable TI_WITH_VULKAN and TI_WITH_OPENGL for clang-tidy checks (by Ailing Zhang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [autodiff] Recover kernel autodiff mode after validation (#6265) (by Mingrui Zhang)
  • [test] Adjust rtol for sparse_linear_solver tests (#6352) (by Ailing)
  • [lang] MatrixType bug fix: Fix array indexing with MatrixType-index (#6323) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by Zhanlue Yang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [build] Enable google-explicit-constructor check in clang-tidy (by Ailing Zhang)
  • [build] Enable google-build-explicit-make-pair check in clang-tidy (by Ailing Zhang)
  • [build] Enable a few bugprone related rules in clang-tidy (by Ailing Zhang)
  • [build] Enable modernize-use-override in clang-tidy (by Ailing Zhang)
  • [ci] Use .clang-tidy for check_static_analyzer job (by Ailing Zhang)
  • [mesh] Support arm64 backend for MeshTaichi (#6329) (by Chang Yu)
  • [lang] Throw proper error message if calling ti.append with vector/matrix (#6322) (by Ailing)
  • [aot] Fixed buffer device address import (#6326) (by PENGUINLIONG)
  • [aot] Fixed export of get_instance_proc_addr (#6324) (by PENGUINLIONG)
  • [build] Allow building test when LLVM is off (#6327) (by Ailing)
  • [bug] Fix generating LLVM AOT module for the second time failed (#6311) (by PGZXB)
  • [aot] Per-parameter documentation in C-API header (#6317) (by **P...