Releases: romeric/Fastor

Fastor V0.6.4

04 Jun 05:13

Fastor V0.6.4 is an incremental change over V0.6.3. This release includes bug fixes and some new features:

  1. Sleef backend for SIMD implementation of trigonometric and hyperbolic functions
  2. New tensor functions: squeeze, reshape and flatten [d8acd9f]. Refer to the Wiki page for documentation
  3. More general support for complex-valued arithmetic and complex-valued tensor algebra such as lu, solve, etc. [c03811b]
  4. Add top level CMake file for distribution of Fastor [8bc161e]. Contributed by @mablanchard
  5. Fix compile issue with type name printing [13b2a1c]. Contributed by @matthiasneuner
  6. Fix bug in determinant [e96e63f]. Contributed by @wermos
  7. Implement Singular-Value-Decomposition svd and Signed Singular-Value-Decomposition ssvd for small square matrices [6de6662]
  8. Fix multiple bugs with TensorMaps [5980b41]

Fastor V0.6.3

07 Jun 02:09

Fastor V0.6.3 is another incremental change over the V0.6 release, which introduced a significant overhaul of Fastor's internal design and exposed API. This release mainly includes internal changes:

New features and improvements

  1. Continuous integration support via Travis CI on Linux. We test against GCC-5 to GCC-latest and default Clang using both scalar and SIMD implementations
  2. Continuous integration support via Appveyor CI for MSVC builds on Windows. We test against Visual Studio 2019 in debug mode for now; our test cases build fine under release but take excessively long to run and eventually time out
  3. Unit tests are now built using CMake instead of raw Makefiles
  4. lut_inverse and ut_inverse have been renamed to tinverse, which takes an UpLoType similar to the linear algebra computation types #87
  5. Single tensor expression einsum for inner and permuted inner product of a single tensor expression #80
  6. Explicit einsum by allowing the user to specify the shape of the tensor contraction output. einsum now can permute and can deal with inner and permuted inner products of tensors and tensor expressions #91
  7. A new permute function that closely resembles NumPy's permute and implements contiguous writes (instead of contiguous reads), which results in about a 15-20% performance improvement. This function is not identical to the existing permutation function
  8. All remaining mathematical functions are now implemented - cbrt, exp2/expm1, log10/log2/log1p, asinh/acosh/atanh/atan2, erf/lgamma/tgamma/hypot, round/floor/ceil, min/max etc. Where applicable SIMD versions of these are implemented. The SIMD math layer has been cleaned up and reworked
  9. Element-wise unary boolean operators !(Expression), isinf(Expression), isnan(Expression) and isfinite(Expression) are implemented #90
  10. Element-wise binary math functions min(a,b)/max(a,b) and pow/hypot/atan2 are now available
  11. Fastor now uses alignas instead of compiler specific macros for memory alignment #98
  12. Fastor specific and user controllable macros are now moved to config.h and macros.h under config folder previously named commons #58

Bug fixes

  1. Bug fix in expression binding policy that resulted in segfaults #95
  2. Fix for assigning cv-qualified TensorMap to Tensor #94 by @feltech
  3. Detect the correct SIMD type for cv-qualified tensors #99
  4. Bug fix for nested boolean expressions such as !isfinite(Expression) or !(a>b) #93
  5. Fix overflow in boolean views #100
  6. Fix detecting the correct language standard under MSVC 7592ea7
  7. Fix regression in abstract permutation #96

Fastor V0.6.2

18 May 17:07

Fastor V0.6.2 is another incremental change over the V0.6 release, which introduced a significant overhaul of Fastor's internal design and exposed API. This release includes:

  1. SIMD support for complex numbers and complex-valued arithmetic from SSE2 all the way to AVX512. The SIMD implementation for complex numbers is written with optimisation, and specifically FMA, in mind, and it delivers performance similar to Intel's MKL JIT for complex matrix-matrix multiplication and the like. Comprehensive unit tests are added for SIMD complex-valued arithmetic
  2. conj function introduced for computing the conjugate of a complex valued tensor expression
  3. arg function introduced for computing the argument or phase angle of a complex valued tensor expression
  4. ctranspose and ctrans functions introduced for computing the conjugate transpose of a complex-valued tensor expression
  5. All boolean tensor methods such as isequal, issymmetric etc. are now implemented as free functions working on tensor expressions instead of tensors. There is no longer an underscore in the names of these functions; that is, the is_equal method of the tensor is now the free function isequal working on expressions
  6. Performance optimisations for creating tensors of tensors (such as Tensor<Tensor<double,3,3>,2,2>) or tensors of any non-primitive types (such as Tensor<std::vector<double>,2,2>). matmul and tmatmul functions have been specifically tuned to work well with such composite types
  7. Fix an issue in tmatmul that was causing compilation error on Windows with MSVC 2019

Fastor V0.6.1

07 May 21:26

Fastor V0.6.1 is an incremental change over the V0.6 release, which introduced a significant overhaul of Fastor's internal design and exposed API. This release includes:

  1. lu function introduced for LU decomposition of 2D tensors. Multiple variants of LU decomposition are available: no pivoting, partial pivoting with a permutation vector, and partial pivoting with a permutation matrix. This is perhaps the most performant implementation of LU decomposition available today for small matrices of up to 64x64. With no pivoting, performance is unbeaten for all sizes up to the stack limit; however, since the implementation is based on compile-time loop recursion for sizes up to 32x32, and beyond that on block recursion (which in turn uses block-triangular inversion), compilation can be quite time-consuming for bigger sizes
  2. ut_inverse and lut_inverse for fast triangular inversion of upper and unit lower matrices using block-wise inversion
  3. tmatmul function, equivalent to BLAS's TRMM, for triangular matrix-matrix (or matrix-vector) multiplication, which allows either or both operands to be upper/lower triangular. The function can be used to specify which matrix is lower/upper at compile time, like tmatmul<matrix_type::lower_tri,matrix_type::general>(A,B). A proper 2X speed-up over matmul when one operand is triangular, and 4X when both are triangular, can be achieved for bigger sizes
  4. det/determinant can now be computed for all sizes using the LU decomposition [default for matrix sizes bigger than 4x4].
  5. inv/inverse and solve can be performed with any variant of the LU decomposition
  6. There is now a unified interface for choosing the computation type of linear algebra functions, for instance det<DetCompType::BlockLU>(A), inv<InvCompType::SimpleLUPiv>(A) or solve<SolveCompType::BlockLUPiv> etc.
  7. tril/triu functions added for getting the lower/upper part of a 2D tensor
  8. Comprehensive unit tests and benchmarks are added and are available for these newly added (and some old) routines

Fastor V0.6

01 May 18:21

Fastor V0.6 is a major release that brings a lot of fundamental internal redesign and performance improvements. This is perhaps the biggest release since the inception of the Fastor project. The following is a list of the changes and new features in this version

  1. The whole of Fastor's expression template engine has been reworked to facilitate arbitrary restructuring of expressions. Most users will not notice this change as it pertains to internal re-architecting, but the change is quite significant. The main driver for this has been to introduce linear algebra expressions and chain them with other element-wise operations.
  2. A series of linear algebra expressions are introduced as a result, with less verbose names, and the other existing linear algebra routines are now moved to a dedicated linear algebra expression module. This lays out the basic building blocks of Fastor's tensor algebra library
  3. Multiplication operator % introduced; it evaluates lazily and takes any expression
  4. Greedy-like matmul implemented. Operations like A%B%C%D%... will be evaluated in the most efficient order
  5. inv function introduced for lazy inversion. Extremely fast matrix inversion up to stack size 256x256
  6. trans function introduced for lazy transpose. Extremely fast AVX512 8x8 double and 16x16 float transpose using explicit SIMD introduced
  7. det function introduced for lazy determinant
  8. solve function introduced for lazy solve. solve has the behaviour that if both inputs are Tensors it evaluates immediately, and if either input is an expression it delays the evaluation. solve is now also able to solve matrices up to stack size 256x256
  9. qr function introduced for QR factorisation using modified Gram-Schmidt factorisation that has the potential to be easily SIMD vectorised in the future. The scalar implementation at the moment has good performance
  10. absdet and logdet functions introduced for lazy computation of absolute and natural logarithm of a determinant
  11. determinant, matmul, transpose and most verbose linear algebra functions can now take expressions but evaluate immediately
  12. einsum, contraction, inner, outer, permutation, cross, sum and product, now all work on expressions. einsum/contraction for expressions also dispatches to the same operation minimisation algorithms that the non-expression version does hence the above set of new functions are as fast for expressions as they are for tensor types. cross function for cross product of vectors is introduced as well
  13. Most linear algebra operations like qr, det, solve take optional parameters (class enums) to request the type of computation for instance det<DetCompType::Simple>, qr<QRCompType::MGSR> etc
  14. MKL (JIT) backend introduced which can be used in the same way as libxsmm
  15. The backend _matmul routines are reworked and specifically tuned for AVX512, and _matmul_mk_smalln is cleaned up and unified for up to 5::SIMDVector::Size. Most matmul routines are now available at SSE2 level when it makes sense. matmul is now as fast as the dedicated MKL JIT API
  16. AVX512 SIMDVector for int32_t and int64_t introduced. SIMDVector for int32_t and int64_t are now activated at SSE2 level as well
  17. Most intrinsics are now activated at SSE2 level
  18. All views are now reworked so there is no longer any need for the FASTOR_USE_VECTORISED_EXPR_ASSIGN macro unless one wants to vectorise strided views
  19. Multi-dimensional TensorFixedViews introduced. This makes it possible to create arbitrary dimensional tensor views with compile time deducible sizes. This together with dynamic views complete the whole view expressions of Fastor
  20. diag function introduced for viewing the diagonal elements of 2D tensors and works just like other views in that it can appear on either side of an equation (can be assigned to)
  21. Major bug fix for in-place division of all expressions by integral numbers
  22. A lot of new features, traits and internal development tools have been added.
  23. As a result Fastor now requires a C++14 supporting compiler

The next few releases from here on will be incremental and will focus on ironing out corner cases while new features will be continuously rolled out.

Fastor V0.5.1

08 Apr 01:51

Although it carries a minor version tag, Fastor V0.5.1 includes some major changes, especially in API design, performance and stability:

  1. SIMDVector has been reworked to fix the long-standing issue of falling back to non-SIMD code for non-64-bit types. The fall-back is now always to the correct scalar type where a scalar specialisation is available, i.e. float, double, int32_t, int64_t, and to a fixed array of size 1 holding the type in other cases. The API is now a lot closer to Vc and std::experimental::simd. SIMDVector for floating-point types is now also activated at SSE2 level, allowing any compiler that automatically defines SSE2 to vectorise Fastor's code without -march=native, since all compilers these days define __SSE2__ at -O2/-O3 levels
  2. Fix a long-standing bug in network tensor contraction. Rework opmin_meta/cost models to be truly compile-time recursive in terms of depth-first search. Strided contractions for networks have been completely removed, and for pairs they are deactivated. Tensor contraction of networks now dispatches to by-pair einsum, which has many specialisations including dispatching to matmul. More than an order of magnitude performance gain in certain cases.
  3. Extremely fast matmul/gemm routines. Fastor now provides potentially the fastest gemm routine for small to medium-sized tensors of single and double precision, as far as static dispatch is concerned. Benchmarks have been added here. Many flavours of matmul implementations are now available for different sizes, with remainder handling and masked loading/storing.
  4. AVX512 support for single and double floats
  5. Better macro handling through a series of new FASTOR_... macros
  6. Accurate timeit function based on rdtsc together with memory clobber and serialisation for further accuracy
  7. Fastor is now Windows compatible. The whole test suite runs and passes on MSVC 2019
  8. Quite a few bugs and compiler warnings have been fixed along the way

Fastor V0.5

20 Mar 15:21

Fastor V0.5 is one hell of a release as it brings a lot of new features, fundamental performance improvements, improved flexibility working with Tensors and many bug fixes:

New Features

  1. Improved IO formatting. Flexible, configurable formatting for all derived tensor classes
  2. Generic matmul function for AbstractTensors and expressions
  3. Introduce a new tensor type SingleValueTensor for tensors of any size and dimension whose values are all the same. It is extremely space-efficient as it stores a single value under the hood. It provides a more optimised route for certain linear algebra functions. For instance, matmul of a Tensor and a SingleValueTensor is O(n) and transpose is O(1)
  4. New evaluation methods for all expressions teval and teval_s that provide fast evaluation of higher order tensors
  5. cast method to cast a tensor to a tensor of different data type
  6. get_mem_index and get_flat_index to generalise indexing across all tensor classes. Eval methods now use these
  7. Binary comparison operators for expressions that evaluate lazy. Also binary comparison operators for SIMDVectors
  8. Constructing column major tensors is now supported by using Tensor(external_data,ColumnMajor)
  9. tocolumnmajor and torowmajor free functions
  10. all_of, any_of and none_of free-function reducers that work on boolean expressions
  11. Fixed views now support noalias feature
  12. FASTOR_IF_CONSTEXPR macro for C++17

Performance and other key improvements

  1. Tensor class can now be treated as a compile time type as it can be initialised as constexpr by defining the macro FASTOR_ZERO_INITIALISE
  2. Higher order einsum functions now dispatch to matmul whenever possible which is much faster
  3. Much faster generic permutation, contraction and einsum algorithms, now based on recursive templates, that definitely beat the speed of hand-written C code. CONTRACT_OPT is no longer necessary
  4. A much faster loop tiling based transpose function. It is at least 2X faster than implementations in other ET libraries
  5. Introducing libxsmm backend for matmul. The switch from in-built to libxsmm routines for matmul can be configured by the user using BLAS_SWITCH_MATRIX_SIZE_S for square matrices and BLAS_SWITCH_MATRIX_SIZE_NS for non-square matrices. Default sizes are 16 and 13 respectively. libxsmm brings substantial improvement for bigger size matrices
  6. Condensed unary ops and binary ops into a single more maintainable macro
  7. FASTOR_ASSERT is now a macro to assert which optimises better at release
  8. Optimised determinant for 4x4 cases. Determinant now works on all types and not just float and double
  9. all is now an alias to fall, which means many tensor view expressions can now be dispatched to tensor fixed views. The implication is that expressions like a(all) and A(all,all) can just return the underlying tensor as opposed to creating a view with unnecessary sequences and offsets. This is much faster
  10. Specialised constructors for many view types that construct the tensor much faster
  11. Improved support for TensorMap class to behave exactly the same as Tensor class including views, block indexing and so on
  12. Improved unit-testing under many configurations (debug and release)
  13. Many Tensor-related methods and functionalities have been split into separate files that are now usable by other tensor-type classes
  14. Division of an expression by a scalar can now be dispatched to multiplication which creates the opportunity for FMA
  15. Cofactor and adjoint can now fall back to a scalar version when SIMD types are not available
  16. Documentation is now available under Wiki pages

Bug fixes

  1. Fix a bug in product method of Tensor class (99e3ff0)
  2. Fix AVX store bug in backend matmul 3k3 (8f4c6ae)
  3. Fix bug in tensor matmul for matrix-vector case (899c6c0)
  4. Fix a bug in SIMDVector under scalar mode with mixed types (f707070)
  5. Fix bugs with math functions on SIMDVector with size>256 not compiling (ca2c74d)
  6. Fix bugs with matrix-vector einsum (8241ac8, 70838d2)
  7. Fix a bug with strided_contraction when the second matrix disappears (4ff2ea0)
  8. Fix a bug in 4D tensor initializer_list constructor (901d8b1)
  9. Fixes to fully support SIMDVector fallback to scalar version
  10. and many more undocumented fixes

Key changes

  1. Complete re-architecting of the directory hierarchy of Fastor. Fastor should now be included as #include <Fastor/Fastor.h>
  2. TensorRef class has now been renamed to TensorMap
  3. Expressions now evaluate based on the type of their underlying derived classes rather than the tensor that they are getting assigned to

There are a lot more major and minor undocumented changes.

Fastor V0.4

12 Jun 19:39

This release brings new features, improvements and bug fixes to Fastor:

  1. Lots of changes to support MSVC. Thanks to @FabienPean.
  2. Permutation and einsum functions for generic tensor expressions.
  3. A TensorRef class that wraps over existing data and exposes Fastor's functionality over raw data.
  4. Some more tensor functions can work on tensor expressions.
  5. Linear algebra functions for high order tensors operating on the last two indices (similar to how NumPy operates).
  6. More variants of tensor cross product are now available for high order tensors.
  7. Bug fixes in backend trace and transpose.
  8. Bug fix in ordering of tensor networks.
  9. Bug fix in computing cost models.

and much more!

Fastor V0.3.1

02 May 01:22

Bug fix release.

Fastor V0.3

17 Apr 04:52

This release brings lots of fundamental new features, performance improvements and bug fixes to Fastor, in particular

  1. Tensor views provide the ability to index, slice and broadcast multi-dimensional tensors using ranges/sequences/other tensors much like NumPy/MATLAB arrays. See the documentation.
  2. The evaluation mechanism in Fastor so far used static_cast for chaining operations in the corresponding eval functions. This used to generate a lot of unnecessary type-conversion code. Starting from V0.3 the eval functions are well-informed, leading to faster and much cleaner code and helping the compiler optimise much more.
  3. Support for FMA. The matmul, norm and inner functions and multiple other tensor overloads now use FMA instructions when available.
  4. Support for norm, inner, sum and product functions for any type of expressions.
  5. Bug fix in generic transpose and 2D SP transpose methods.
  6. Code splitting and plugins for cleaner maintainable code base.
  7. Division instructions can safely be dispatched to multiplication while hoisting the reciprocal out of the loop for expressions of type Expr / Scalar.
  8. FASTOR_FAST_MATH and FASTOR_UNSAFE_MATH are introduced. The FASTOR_UNSAFE_MATH flag turns Expr / Scalar expressions to approximate reciprocal and multiplication intrinsics, which can harm the accuracy. FASTOR_FAST_MATH is just a place holder macro activated by default under -Ofast.
  9. Lots of new test cases introduced.
  10. New benchmark problems for views and finite difference introduced.
  11. scalar_type was not correctly implemented for expressions. Now fixed.
  12. Equal rank tensor assignment restriction is now relaxed in order for expressions and views of any rank to be assigned to expressions of a different rank, as long as their size (capacity) is equal.
  13. Many functions are decorated inline and constexpr. This helps the compiler generate very compact code and aggressively eliminate dead code.
  14. Low and high rank tensors can be created using brace initialisers.
  15. Fix the SP/DP bug in matmul.
  16. Introduce the now very recommended -DNDEBUG flags to most Makefiles.
  17. Lots of other minor improvements and bug fixes.

As a final note, while compiling views mixed with other complex expressions it is really beneficial to add inlining flags to the compiler, such as -finline-limit=n for GCC, -mllvm -inline-threshold=n for Clang and -inline-forceinline -inline-factor=n for ICC. Although overlapping assignments are provided for convenience, it helps the compiler a lot with inlining if -DFASTOR_NO_ALIAS is issued. Also, for 1D and 2D views, -DFASTOR_USE_VECTORISED_ASSIGN can cut down runtimes by a factor of 2-4 if the compiler is successful at inlining.