
Releases: esa-tu-darmstadt/spn-compiler

v0.2.0

19 Nov 22:37

Second major release based on MLIR, now built entirely on LLVM/MLIR release 13.

Compared to v0.1.0, a number of major additions have been introduced:

Low-latency inference

Thanks to contributions from @csvtuda, SPNC now comes with its own SLP (superword-level parallelism) vectorizer, based on MLIR and specialized for SPNs. Instead of vectorizing across the samples in a batch, the SLP vectorizer tries to vectorize a single evaluation of the SPN for low-latency inference. The SLP vectorizer is active when cpuVectorize=True and batchSize=1 are chosen during compilation for the CPU. Our evaluation has shown latency improvements of up to 42x over unvectorized code and up to 7x over the LLVM SLP vectorizer.

Note: SLP vectorization is an elaborate process. While the evaluation has shown that the SPN-specific SLP vectorizer in SPNC typically compiles faster than the LLVM SLP vectorizer, and in many cases even faster than unvectorized compilation, you may encounter longer compilation times with SLP vectorization for large SPNs. In such cases, deactivate SLP vectorization by setting cpuVectorize=False.
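The following sketch shows how this might look from the Python interface. The option names cpuVectorize and batchSize are taken from the text above; the entry point (CPUCompiler.log_likelihood) and the exact way options are passed are assumptions here, so check the README for the authoritative usage:

```python
import numpy as np
from spn.structure.Base import Product, assign_ids, rebuild_scopes_bottom_up
from spn.structure.leaves.parametric.Parametric import Gaussian
from spnc.cpu import CPUCompiler  # assumed location of the CPU frontend

# A tiny two-variable SPN built with SPFlow.
spn = Product(children=[Gaussian(mean=0.0, stdev=1.0, scope=0),
                        Gaussian(mean=1.0, stdev=2.0, scope=1)])
assign_ids(spn)
rebuild_scopes_bottom_up(spn)

# cpuVectorize=True together with batchSize=1 activates the SPN-specific
# SLP vectorizer for low-latency, single-sample inference (the keyword
# spelling is hypothetical; the option names come from the release notes).
sample = np.array([[0.5, 1.0]], dtype=np.float64)
ll = CPUCompiler().log_likelihood(spn, sample, cpuVectorize=True, batchSize=1)
```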

Graph Partitioning

To avoid excessive compilation times for very large SPNs, SPNC now supports partitioning the SPN DAG into independent pieces for all targets. The size of the individual partitions (i.e., the number of operations per partition) can be controlled through maxTaskSize. According to a first evaluation, 10,000 is a sensible default for this value, but you can use this knob to trade off compilation time against the performance of the resulting code. Use -1 to disable partitioning altogether.
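The partitioning knob would presumably be passed the same way as the vectorization options; here is a sketch reusing spn and sample from above (only the option name maxTaskSize, its default of 10,000, and the -1 sentinel are from the release notes):

```python
# Partition the SPN DAG into tasks of at most 10,000 operations each
# (the evaluated default); pass maxTaskSize=-1 to disable partitioning.
ll = CPUCompiler().log_likelihood(spn, sample, maxTaskSize=10000)
```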

Supported architectures

In this release, SPNC has gained vectorization support for ARM Neon and now supports AVX, AVX2, AVX-512, and ARM Neon as target architectures for vectorization. The ARM Optimized Routines are used for fast math primitives on ARM Neon.

GPU support has also been improved: SPNC now avoids unnecessary copies between host and GPU when graph partitioning is active. SPNC also supports CUDA unified memory on devices where CPU and GPU share the same physical memory, e.g., the Nvidia Jetson family. Support for unified memory can be enabled through the CMake option CUDA_UNIFIED_MEMORY during build.
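For GPU inference, the spnc_gpu wheel is required; the sketch below assumes a CUDACompiler frontend analogous to the CPU one (the class name and module path are assumptions). Whether unified memory is used is fixed at build time via the CMake option CUDA_UNIFIED_MEMORY mentioned above:

```python
from spnc.gpu import CUDACompiler  # assumption: GPU frontend of the spnc_gpu wheel

# Reusing spn and sample from the sketches above. With graph partitioning
# active, SPNC avoids unnecessary host-to-GPU copies between partitions; on
# shared-memory devices (e.g., the Jetson family), a build configured with
# CUDA_UNIFIED_MEMORY can serve both CPU and GPU from the same allocation.
ll_gpu = CUDACompiler().log_likelihood(spn, sample)
```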

Other Improvements

A number of minor bugs were also fixed in this release. The internal representation and construction of compilation pipelines have been redesigned, so SPNC now requires significantly less memory during compilation.

Binaries

The release comes with some pre-built Python wheels to facilitate installation of SPNC:

  • xspn-0.2.0-py3-none-any.whl: The serialization library used by the compiler (can also be used standalone, see README). This wheel is platform-agnostic.
  • spnc-0.2.0-py3-none-linux_x86_64.whl: The compiler itself. This version only supports inference on CPUs and should be usable on any common Linux platform.
  • spnc_gpu-0.2.0-py3-none-linux_x86_64.whl: The compiler itself. This version supports inference on CPUs and CUDA GPUs and should be usable on any Linux platform with CUDA 11.2 and the CUDA driver installed.

For more installation options, see the Installation Manual.

Weekly Development Build

11 Jun 12:15
Pre-release (tag: weekly)

Merge branch 'release/v0.2.0'

v0.1.0

04 May 16:37

First release completely based on MLIR.

This release supports log-likelihood inference on the following architectures:

  • CPUs: All CPU architectures supported by LLVM. Vectorization is currently only available on x86.
  • GPUs: CUDA GPUs.

The currently supported SPFlow leaf nodes are Gaussian, Categorical, and Histogram.

This version also comes with a convenient Python interface to the compiler, which integrates directly with SPFlow; see the usage example in the README for how to use this interface.
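A condensed, hedged version of that usage pattern might look as follows (the SPFlow structure API is standard; the spnc entry point CPUCompiler.log_likelihood is an assumption, so refer to the README for the authoritative example):

```python
import numpy as np
from spn.structure.Base import Product, assign_ids, rebuild_scopes_bottom_up
from spn.structure.leaves.parametric.Parametric import Categorical, Gaussian
from spnc.cpu import CPUCompiler  # assumed entry point, see README

# An SPN over two of the supported leaf types: a Gaussian and a Categorical.
spn = Product(children=[Gaussian(mean=0.0, stdev=1.0, scope=0),
                        Categorical(p=[0.3, 0.7], scope=1)])
assign_ids(spn)
rebuild_scopes_bottom_up(spn)

# Batched log-likelihood inference; the second column holds category indices.
samples = np.array([[0.25, 1.0], [1.5, 0.0]], dtype=np.float64)
log_likelihoods = CPUCompiler().log_likelihood(spn, samples)
```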

The release comes with some pre-built Python wheels to facilitate installation of SPNC:

  • xspn-0.1-py3-none-any.whl: The serialization library used by the compiler (can also be used standalone, see README). This wheel is platform-agnostic.
  • spnc-0.1-py3-none-linux_x86_64.whl: The compiler itself. This version only supports inference on CPUs and should be usable on any common Linux platform.
  • spnc_gpu-0.1-py3-none-linux_x86_64.whl: The compiler itself. This version supports inference on CPUs and CUDA GPUs and should be usable on any Linux platform with CUDA 11.2 and the CUDA driver installed.

For more installation options, see the Installation Manual.

v0.0.6

21 Apr 10:48
Pre-release

Alpha-release of the new MLIR-based compiler.

v0.0.5

21 Apr 08:50
Pre-release

Alpha-release of the new MLIR-based compiler.

Release 0.0.4

12 Mar 15:13

Release of the first C++-based version of the compiler, using LLVM for code generation.

Release 0.0.3

26 Jun 14:34

This release adds the functionality to create thread-parallel OpenMP code and CUDA code for Nvidia GPUs from SPNs.

Release 0.0.2

05 Apr 15:21

This release supports the generation and compilation of serial C++ code from SPN input descriptions.