v3.9.0
Improvements
- Add oneAPI backend #3296
- Add support to directly access arrays on other devices #3447
- Add asynchronous reduce all functions that return an af_array #3199
- Add broadcast support #2871
- Improve OpenCL CPU JIT performance #3257 #3392
- Optimize thread/block calculations of several kernels #3144
- Add support for fast math compilation when building ArrayFire #3334 #3337
- Optimize performance of fftconvolve when using floats #3338
- Add support for CUDA 12.1 and 12.2
- Better handling of empty arrays #3398
- Better handling of memory in linear algebra functions in OpenCL #3423
- Better logging with JIT kernels #3468
- Optimize memory manager/JIT interactions for small number of buffers #3468
- Documentation improvements #3485
- Optimize reorder function #3488
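
For readers unfamiliar with broadcasting (#2871), the sketch below illustrates the general idea on array shapes only. It assumes the common rule that two sizes along a dimension are compatible when they are equal or when one of them is 1; the exact rules ArrayFire applies are defined by the PR, not by this sketch. `Dims` and `broadcast_dims` are hypothetical names and are not part of the ArrayFire API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a 4-dimensional shape; not an ArrayFire type.
using Dims = std::array<std::int64_t, 4>;

// Returns the broadcast result shape, or std::nullopt if the shapes are
// incompatible. Assumed rule: per dimension, sizes must match or one must be 1.
std::optional<Dims> broadcast_dims(const Dims& a, const Dims& b) {
    Dims out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (a[i] == b[i]) {
            out[i] = a[i];   // sizes agree, nothing to expand
        } else if (a[i] == 1) {
            out[i] = b[i];   // a is expanded along this dimension
        } else if (b[i] == 1) {
            out[i] = a[i];   // b is expanded along this dimension
        } else {
            return std::nullopt;  // mismatch that cannot be broadcast
        }
    }
    return out;
}
```

Under this assumed rule, a 3x4 shape combined with a 3x1 shape gives 3x4, while 3x4 combined with 2x4 is rejected.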
Fixes
- Improve errors when creating OpenCL contexts from devices #3257
- Improvements to vcpkg builds #3376 #3476
- Fix reduce by key when NaNs are present #3261
- Fix error in convolve where the ndims parameter was forced to be equal to 2 #3277
- Make constructors that accept dim_t explicit to avoid invalid conversions #3259
- Fix error in randu when compiling against clang 14 #3333
- Fix bug in OpenCL linear algebra functions #3398
- Fix bug with thread local variables when device was changed #3420 #3421
- Fix bug in qr related to uninitialized memory #3422
- Fix bug in shift where the array had an empty middle dimension #3488
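
As background for the explicit-constructor change (#3259), the sketch below shows the general C++ behavior involved, using hypothetical types rather than ArrayFire's own classes. `dim_like`, `ImplicitBox`, and `ExplicitBox` are illustrative names; `dim_like` is only assumed to resemble an integer type such as dim_t.

```cpp
#include <type_traits>

// Stand-in for an integer dimension type; an assumption for illustration.
using dim_like = long long;

// A single-argument constructor that is not explicit allows a bare integer
// to convert silently into the class type.
struct ImplicitBox {
    ImplicitBox(dim_like n) : size(n) {}
    dim_like size;
};

// Marking the constructor explicit disables that silent conversion.
struct ExplicitBox {
    explicit ExplicitBox(dim_like n) : size(n) {}
    dim_like size;
};

dim_like size_of(const ImplicitBox& b) { return b.size; }
dim_like size_of_explicit(const ExplicitBox& b) { return b.size; }

// size_of(5) compiles because 5 converts implicitly to ImplicitBox.
static_assert(std::is_convertible_v<dim_like, ImplicitBox>);
// size_of_explicit(5) would not compile; the caller must spell out the type.
static_assert(!std::is_convertible_v<dim_like, ExplicitBox>);
static_assert(std::is_constructible_v<ExplicitBox, dim_like>);
```

With the explicit version, a call has to be written as `size_of_explicit(ExplicitBox{5})`, so an integer passed by mistake is caught at compile time instead of being converted quietly.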
Contributions
Special thanks to our contributors:
Willy Born
Mike Mullen