Version is now pulled from git tags instead of from VERSION file
dxchange is no longer a required dependency
CMake build system to handle addition of code in two new languages: C++ and CUDA
Python bindings to C++/CUDA code still go through C interface (i.e. no direct binding)
SIRT and MLEM have been implemented on the GPU and CPU using a rotation-based algorithm
GPU support has been validated for Windows and Linux
CPU version uses OpenCV for rotation
The OpenCV build distributed via conda does not work with MinGW on Windows; use the MSVC compiler on Windows instead.
GPU version uses NPP for rotation
Benchmarking on an NVIDIA P100: ~11x slower than gridrec, but with vastly improved reconstruction quality
Benchmarking on an NVIDIA V100: per-slice speed-up over the ray-based algorithm is ~650x; e.g., a TomoBank reconstruction (2048p, 1,500 projection angles) that formerly required ~6.5 hours completes in ~40 seconds
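The rotation-based approach replaces per-ray tracing with whole-image operations: forward projection rotates the image to each angle and sums along one axis, and backprojection smears each residual row across the image and counter-rotates it. A minimal NumPy/SciPy sketch of a rotation-based SIRT loop (illustrative only; function names and the step scaling are this sketch's, not TomoPy's C++/CUDA implementation):

```python
import numpy as np
from scipy.ndimage import rotate

def forward(img, theta_deg):
    # Forward projection: rotate the image to each angle and sum along rows.
    return np.array([rotate(img, t, reshape=False, order=1).sum(axis=0)
                     for t in theta_deg])

def backproject(resid, theta_deg, shape):
    # Adjoint: smear each residual row across the image, counter-rotate, average.
    rec = np.zeros(shape)
    for t, row in zip(theta_deg, resid):
        rec += rotate(np.tile(row, (shape[0], 1)), -t, reshape=False, order=1)
    return rec / len(theta_deg)

def sirt(sino, theta_deg, n_iter=30):
    n = sino.shape[1]
    rec = np.zeros((n, n))
    for _ in range(n_iter):
        resid = sino - forward(rec, theta_deg)
        # Divide by the ray length (n pixels per ray) for a stable SIRT-style step.
        rec += backproject(resid, theta_deg, rec.shape) / n
    return rec
```

Each iteration costs one rotation per angle for the forward pass and one for the backprojection, which is why the method maps so well onto OpenCV (CPU) and NPP (GPU) rotation primitives.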
Support for Microsoft Visual C++ (MSVC) compiler
Implemented gridrec in C++ (uses std::complex) which is enabled by default on Windows
To enable the new algorithms, pass accelerated=True to tomopy.recon for SIRT and MLEM
Other options are available, but unless you explicitly understand the effects of the other parameters, use the defaults.
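A usage sketch (the synthetic data shapes and num_iter value are placeholders; the import is guarded so the snippet is a no-op where an accelerated TomoPy build is absent):

```python
import numpy as np

# Synthetic stand-in data: (projection angles, slices, detector pixels).
proj = np.random.rand(60, 4, 64).astype(np.float32)
theta = np.linspace(0, np.pi, 60, dtype=np.float32)

# accelerated=True selects the C++/CUDA implementation of SIRT/MLEM;
# leave the remaining algorithm parameters at their defaults.
recon_kwargs = dict(algorithm="sirt", accelerated=True, num_iter=100)

try:
    import tomopy
    rec = tomopy.recon(proj, theta, **recon_kwargs)
except ImportError:
    rec = None  # tomopy with the accelerated backend is not installed here
```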
Multi-GPU support is available
Automatic detection of number of available devices
Multiple threads started at Python level automatically spread out over the number of available GPUs
Secondary thread-pools created in C++ code to provide highly efficient communication with the GPU and additional parallelism on the CPU.
When running on the GPU, set the ncore parameter of tomopy.recon to the number of available GPUs.
Each "Python" thread creates a unique secondary thread-pool with a default size of 2 * number-of-cpus. This is intentional: in general, the larger the secondary thread-pool, the more effectively the CPU-GPU communication latency is hidden. Beyond about 24 threads per thread-pool, however, there is no further benefit (essentially all latency is hidden at that point).
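The thread-to-GPU spreading and default pool sizing described above can be sketched as follows (a minimal illustration with hypothetical helper names, not TomoPy's actual scheduler):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def device_for(thread_id, n_gpus):
    # Round-robin: Python-level thread i drives GPU (i mod n_gpus).
    return thread_id % n_gpus

def secondary_pool_size(n_cpus=None, cap=24):
    # Default secondary-pool size: 2 * number of CPUs, capped at ~24
    # threads, beyond which transfer latency is already fully hidden.
    n_cpus = n_cpus if n_cpus is not None else (os.cpu_count() or 1)
    return min(2 * n_cpus, cap)

# Example: 4 Python-level threads (ncore=4) spread over 2 detected GPUs.
n_gpus = 2
with ThreadPoolExecutor(max_workers=4) as pool:
    devices = list(pool.map(lambda i: device_for(i, n_gpus), range(4)))
# devices -> [0, 1, 0, 1]
```

Oversubscribing each GPU's secondary pool is the design choice that keeps the device busy while individual threads wait on host-device transfers.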