Releases: halide/Halide

Halide 13.0.0

02 Nov 17:24
c3641b6

We are pleased to announce the release of Halide 13.0.0!

This is a major release. Most notably, Halide now requires C++17 (or higher).

You can download one of our binary releases here, or check one of the following package repositories (they might take some time to be updated):

Language and Compiler

  • The compiler now requires C++17 or higher. (#5282)
  • Overloads of realize() that were deprecated in Halide 12 are now removed. (#6122, #6162)
  • Added new predicated tail strategies for split loops. (#6126)
  • Added a more fine-grained prefetch directive. (#6155)
  • The compiler now always runs on a separate 32 MB stack on all platforms. (#6239)
  • Fixed a semantics bug where data-dependent loads might be uninitialized on over-compute. (#6294)
  • Using MemoryType::Stack may now trigger a real stack allocation for dynamically-sized allocations discovered to be small at runtime (#6289)
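
As background for the tail-strategy item above, the following is a minimal sketch in plain Python (not the Halide API) of what happens at the tail of a split loop when the split factor does not divide the loop extent. The function names are hypothetical. The three behaviours loosely mirror the long-standing RoundUp, GuardWithIf, and ShiftInwards strategies; the predicated strategies added in this release are not modelled here.

```python
def split_round_up(n, factor):
    """Visit whole tiles only; may run past n (over-compute)."""
    outer = -(-n // factor)  # ceil(n / factor)
    return [xo * factor + xi for xo in range(outer) for xi in range(factor)]


def split_guard_with_if(n, factor):
    """Visit whole tiles, but skip any index that falls outside [0, n)."""
    outer = -(-n // factor)
    return [xo * factor + xi
            for xo in range(outer)
            for xi in range(factor)
            if xo * factor + xi < n]


def split_shift_inwards(n, factor):
    """Shift the last tile back so it ends at n; some indices are revisited."""
    if n < factor:
        raise ValueError("shifting inwards needs at least one full tile")
    outer = -(-n // factor)
    visited = []
    for xo in range(outer):
        base = min(xo * factor, n - factor)  # clamp the final tile's start
        visited.extend(base + xi for xi in range(factor))
    return visited
```

With an extent of 10 and a factor of 4, the first function touches indices 10 and 11 beyond the extent, the second touches exactly 0 through 9, and the third recomputes indices 6 and 7 in its shifted final tile.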

Backends

  • Simplifier improvements saw a >10% reduction in peak memory usage in many apps, including camera_pipe, harris, nl_means, and stencil_chain. (#6174)
  • The ARM backend now supports native 16-bit float instructions (#6102)
  • Division by non-power-of-two unsigned constants is now faster on X86 (#6322)
  • The WebAssembly backend is mature enough for significant production use (See https://web.dev/ps-on-the-web/)
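
For readers unfamiliar with why division by a constant can be made fast, the sketch below shows the classic multiply-and-shift technique for unsigned division. It is a general illustration, not necessarily the lowering Halide emits on X86; the function names are hypothetical. Python's arbitrary-precision integers hide the fact that the multiplier can need one bit more than the operand width, which real code generators handle with extra instructions.

```python
def magic_for_divisor(d, bits=32):
    """Return (multiplier, shift) such that (x * multiplier) >> shift == x // d
    for every unsigned x below 2**bits."""
    if d <= 0:
        raise ValueError("divisor must be a positive integer")
    log2_ceil = (d - 1).bit_length()          # ceil(log2(d))
    shift = bits + log2_ceil
    multiplier = -(-(1 << shift) // d)        # ceil(2**shift / d)
    return multiplier, shift


def div_by_constant(x, multiplier, shift):
    """Divide using one multiply and one shift instead of a divide."""
    return (x * multiplier) >> shift


if __name__ == "__main__":
    m, s = magic_for_divisor(7)
    print(div_by_constant(100, m, s))  # same value as 100 // 7
```

The multiplier and shift depend only on the divisor, so a compiler can compute them once at compile time and emit only the multiply and shift at run time.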

Build

  • Fixed an issue with add_halide_library on Xcode, which requires at least one source file for every target. (#6175)
  • Added a watchdog timer to the Halide generator executables (i.e. GenGen.cpp). (#6184, #6240)
  • Fixed a missing dependency on Threads::Threads in CMake (#6257)
  • The tutorials and readmes are now packaged to the doc dir. The documentation has been moved one level deeper to share/doc/Halide/html (#6267)

Halide 12.0.1

21 May 03:38

This is a hotfix for v12.0.0

Bugs fixed:

  • Don't emit aligned loads to unaligned addresses in certain strided scenarios. #6046 #6047

Halide 12.0.0

20 May 09:11
b5a34c3

We are pleased to announce the release of Halide 12.0.0!

This is mostly a quality of life and bugfix release to set the stage for larger changes in Halide 13 (which will require C++17).

You can download one of our binary releases here, or check one of the following package repositories:

Language and Compiler

  • Added align_extent scheduling directive #5829
  • Added TailStrategy::Predicate as an alternative to TailStrategy::GuardWithIf to use predicated loops unconditionally #5856
  • Added scatter() and gather() expressions to support reading from and writing to multiple locations in update definitions #5553
  • Added internal memoization to Adams2019 autoscheduler (performance improvement) #5697 #5654
  • Removed old-style realize() methods which had been deprecated #5676
  • Removed deprecated scheduling directive overloads #5656
  • Many simplifier and bounds inference improvements and bugfixes #5615 #5618 #5895 #6002
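
To illustrate why an update that touches several locations at once is useful, here is a small model in plain Python (not the Halide API) of the scatter() and gather() item above. It assumes that every value on the right-hand side is read before any value is written; treat that ordering as an assumption of this sketch and check the Halide documentation for the exact semantics. Both function names are hypothetical.

```python
def update_sites(buf, write_sites, read_sites):
    """Copy buf[read_sites[k]] into buf[write_sites[k]] for every k,
    performing all loads before any store."""
    loaded = [buf[r] for r in read_sites]   # gather: read everything first
    for w, value in zip(write_sites, loaded):
        buf[w] = value                      # scatter: then write everything
    return buf


def update_sites_one_at_a_time(buf, write_sites, read_sites):
    """Contrast case: interleave each load with its store."""
    for w, r in zip(write_sites, read_sites):
        buf[w] = buf[r]
    return buf
```

Swapping two elements shows the difference: with write sites [1, 3] and read sites [3, 1], the first function exchanges the two values, while the second overwrites one of them before it has been read.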

Backends

  • Added support for AVX512 VNNI instructions #5725 #5807
  • Removed OpenGL/GLSL backend #5626
  • Fixed various errors with large_buffers #5716 #5940
  • Improved support for sdot and udot instructions on ARM (where supported) #5954
  • Improved support for WebAssembly SIMD ops, when compiling with LLVM 13 #5849 #5850 #5853 #5854 #5861 #5863
  • PyStub generators must now choose to use either only positional arguments or only keyword arguments. This is an ABI break #5761

Build

  • Added scripts to create Ubuntu packages #5754 #5967
  • Added experimental support for ClangCL on Windows #5876
  • Added support and pre-built binaries for macOS ARM64
  • Halide headers no longer inject stack space linker flags on Windows; now, the compiler runs on a fiber with enough stack space #5873
  • Halide shared library no longer exposes LLVM symbols on macOS and Linux. Help wanted for Windows! #5659
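
The idea of running deeply recursive work on a dedicated stack, rather than relying on linker flags to enlarge the main stack, can be sketched in Python with a worker thread. This is only an analogy to the fiber-based approach mentioned above, not Halide's implementation; the 32 MB size is borrowed from the Halide 13 entry, and the helper names are hypothetical.

```python
import sys
import threading

STACK_BYTES = 32 * 1024 * 1024  # assumed size, taken from the Halide 13 notes


def run_with_big_stack(fn, *args):
    """Run fn(*args) on a fresh thread with a large stack and return its result."""
    result = {}

    def target():
        try:
            result["value"] = fn(*args)
        except BaseException as exc:  # hand any error back to the caller
            result["error"] = exc

    old_size = threading.stack_size(STACK_BYTES)  # applies to threads created next
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(old_size)  # restore the previous setting
    worker.join()
    if "error" in result:
        raise result["error"]
    return result["value"]


def depth(n):
    """Deliberately recursive helper standing in for a deep compiler pass."""
    return 0 if n == 0 else 1 + depth(n - 1)


if __name__ == "__main__":
    sys.setrecursionlimit(20_000)  # Python's own limit is separate from stack size
    print(run_with_big_stack(depth, 10_000))
```

The caller's own stack size no longer matters, which is the property the release note relies on: the deep recursion happens entirely on the stack that the helper provisions.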

Halide 11.0.1

19 Feb 04:29

This is a small bugfix release over Halide 11.0.0.

Build

  • Fixed a build failure when Hexagon is disabled. #5745
  • Fixed a dependence on LLVM having the ARM and AArch64 backends. #5745

Halide 11.0.0

15 Feb 23:18

We are pleased to announce the release of Halide 11.0.0!

This release comes with many backend improvements and some notable deprecations. HVX 64 support has been removed, and OpenGL support has been deprecated (and has been removed from upstream).

You can download one of our binary releases here, or check one of the following package repositories:

Language and Compiler

  • Scheduling
    • The memoize directive gained a new EvictionKey parameter to schedule removal of particular entries from the cache. #5510
    • Added support for multi-dimensional vectorization #4873
  • Bounds inference
    • Left shifts could have incorrect bounds #5477
    • Analysis of comparisons (<,<=) and max/min could have incorrect bounds #5438
    • Integer division analysis was improved #5407
  • Various bugfixes
    • An integer-sign bug in lossless_cast was fixed #5459

Backends

  • ARM64 Windows is now supported, along with Direct3D 12. #5544
  • OpenGL (not OpenGL Compute) has been deprecated in this release and will be removed in Halide 12. You will see deprecation messages during your builds. #5475 #5551
    • We still welcome PRs to release/11.x from users who cannot move off the OpenGL backend.
    • Several bugs with EGL and OpenGL ES were fixed. #5730 #5619
    • Several bugs with plain OpenGL were fixed. #5545
  • CUDA
    • A bug with warp shuffles with narrow types was fixed #5624 #5669
  • Metal
    • Thread limits are now checked correctly #5588
  • Hexagon
    • Several bugs were fixed in #5570
    • Gained support for saturating vdmpy and vtmpy instructions #5424
    • Removed support for HVX_64 #5365 #3925

Build

  • Dependencies
    • Upgraded pybind11 dependency to 2.6.1 #5644
  • CMake
    • CMake rules learned about ppc64le targets #5558
    • CMake presets are available for users on 3.19+ #5508
    • add_halide_library gained a NAMESPACE argument to improve readability when using C++ name mangling. #5467
  • Bugfixes
    • Corrected Makefile warnings that MinGW is not supported. #5580
    • Incorrect system headers on FreeBSD/powerpc64 were replaced #5572
    • An error is now emitted when trying to link to static LLVM while lldWasm was linked to shared LLVM. #5472
    • CMake rules fixed for i686 systems #5675

Halide 10.0.1

15 Feb 22:23

We are pleased to announce the release of Halide 10.0.1!

The main change is that LLVM 10.0.1 is now the bundled version (it had previously been 10.0.0).

  • Fixed target detection for i686 in CMake #5675
  • Upgraded pybind11 to 2.6.1 #5644
  • Fixed missing newline bug in OpenCL backend #5277
  • Improved performance in Direct3D 12 backend #5293 #5298
  • Fixed minor bug in loop partitioning #5355
  • Fixed linking to shared LLVM from CMake #5308
  • Fixed imprecisions in bounds inference for integer div and mod #5331 #5350
  • Fixed various issues in documentation #5330

Halide 10.0.0

16 Sep 20:17

We are pleased to announce the release of Halide 10.0.0!

This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.

What happened to version 9?

For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).

Autoschedulers

  • There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
  • The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
  • The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.

Build

  • The CMake build has been rewritten. See README_cmake.md for details.
  • The minimum CMake version is now 3.16
  • The old halide.cmake module has been removed in favor of find_package(Halide).
  • We no longer support the MinGW toolchain.

Language features

  • Added the atomic scheduling directive, which gives you another way to parallelize associative reductions (e.g. histograms or summations) by emitting atomic instructions when available (and compare-and-swap loops or locks when not).
  • Support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, via combining the vectorize and atomic directives
  • Integer division or mod by zero now returns zero instead of being undefined behavior.
  • The simplifier is now formally verified.
  • You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
  • Allocation size inference is more precise in a variety of cases.
  • Various bugfixes for compute_with.
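
The division-by-zero rule in the list above can be modelled in a few lines. This is a sketch of the stated behaviour only, restricted to non-negative operands; how Halide rounds quotients with negative operands is not covered here, and the function names are hypothetical.

```python
def div_or_zero(a, b):
    """Integer quotient that is defined as 0 when the divisor is 0."""
    return 0 if b == 0 else a // b


def mod_or_zero(a, b):
    """Integer remainder that is defined as 0 when the divisor is 0."""
    return 0 if b == 0 else a % b


if __name__ == "__main__":
    for a, b in [(7, 2), (7, 0), (0, 0)]:
        print(a, b, div_or_zero(a, b), mod_or_zero(a, b))
```

Giving the zero-divisor case a defined result means a pipeline that over-computes into a region where the divisor happens to be zero produces a well-defined (if meaningless) value instead of undefined behavior.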

Backends and targets

  • Better Direct3D 12 support
  • Added support for macOS and Windows on ARM.
  • We no longer support the legacy buffer_t type.
  • Explicit support for Volta, Turing, Ampere GPUs

Halide 8.0.0

27 Aug 18:24
65c26cb

New features since last release include:

  • Generate custom PyTorch ops from Halide pipelines
  • Automatic differentiation of Halide pipelines
  • A WebAssembly backend
  • A Direct3D backend
  • An opt-in caching allocator for CUDA to reduce the amount of time spent in cuMemAlloc and cuMemFree
  • float16 and bfloat16 support
  • Faster compilation of very large pipelines
  • New ways to assert properties of arguments, including unchecked assertions, and more aggressive simplifications that exploit these
  • The ability to place Funcs in stack/heap/shared/register memory explicitly with store_in
  • Runtime configuration of Generator inputs/outputs
  • Support for DMA transfers on Hexagon
  • Generate Python extension modules from Halide pipelines
  • Lower overhead when calling realize repeatedly on small pipelines
  • Optional strict floating point semantics for single expressions or entire pipelines
  • Producer-consumer task parallelism with Func::async
  • Numerous improvements to Halide::Runtime::Buffer. Consider replacing your custom halide_buffer_t wrappers with it.
  • Many more small improvements and bug fixes (it has been a while since our last release)
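
The caching allocator mentioned in the list above follows a common pattern: keep freed blocks around and hand them back out for later requests of the same size, so the expensive underlying calls happen less often. The sketch below is a generic illustration of that pattern, not Halide's implementation. The class name and the `raw_alloc` and `raw_free` callbacks are hypothetical stand-ins for calls such as cuMemAlloc and cuMemFree.

```python
from collections import defaultdict


class CachingAllocator:
    """Reuse freed blocks of the same size instead of calling the raw allocator."""

    def __init__(self, raw_alloc, raw_free):
        self._raw_alloc = raw_alloc
        self._raw_free = raw_free
        self._cache = defaultdict(list)  # size -> list of cached free blocks

    def alloc(self, size):
        """Return a cached block of this size if one exists, else allocate."""
        if self._cache[size]:
            return self._cache[size].pop()
        return self._raw_alloc(size)

    def free(self, block, size):
        """Keep the block for reuse rather than releasing it immediately."""
        self._cache[size].append(block)

    def release_all(self):
        """Hand every cached block back to the underlying allocator."""
        for blocks in self._cache.values():
            while blocks:
                self._raw_free(blocks.pop())
```

The trade-off is memory held in the cache versus time saved, which is presumably why the feature is opt-in.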

Edit: This release was renamed to use the included LLVM version instead of the date. It was formerly named Halide 2019/08/27.

Halide 2018/02/15

15 Feb 19:42
46d8e9e

You probably want halide-linux-64-trunk, halide-mac-64-trunk or halide-win-distro-64-trunk for Linux, OS X, and Windows respectively. For Linux, pay attention to the various GCC versions and download the one that matches your compiler version. You may get linker errors if you download the wrong one.

Notable changes include:

  • Scheduling:
    • New scheduling directive: compute_with
  • Codegen:
    • Better instruction selection for Hexagon
    • Less integer math in CUDA kernels
    • Support for warp shuffle instructions on CUDA
    • Support for MSAN in Clang
    • X86 Runtime: various AVX2 improvements
  • Fixes:
    • Buffer now uses halide_device_crop API from within the Buffer class instead of just discarding any device allocation when a Buffer is cropped
    • Auto-scheduler: unbounded function bugs
    • halide_print() now defaults to output to stdout rather than stderr
    • Various fixes to corner cases of Buffer<> with const types
  • API:
    • Completely rewrote Python bindings using PyBind11 (not yet complete but much more robust and well-supported)
    • Removed long-deprecated variants of gpu_tile()
    • Added IRMutator2, deprecated IRMutator
  • Apps:
    • Replaced apps/hexagon_matmul with apps/nn_ops, which provides fast implementations of common deep learning network operations on all platforms that Halide supports
  • Generators:
    • Revised LoopLevel to allow deferred-evaluation, making it easier to compose separate pieces of Halide code (e.g. when the compute_at or store_at may not be known yet)
    • Removed Generator::ScheduleParam entirely; added support for GeneratorParam instead
    • Simplified Stubs to no longer be stateful, but just a single "generate" method
  • Build:
    • All prebuilt libHalide versions (both static and dynamic) are now built with RTTI enabled (previously they were built with RTTI disabled)
    • Much better CMake support, including 'make distrib', 'make install', and better test targets
    • Drop support for LLVM 3.9

Halide 2013/11/11

11 Nov 18:08

Trunk Halide as of 2013/11/11, precompiled against trunk LLVM and PNaCl's LLVM. Use the PNaCl version only if you want to use the Native Client targets.