1.5.x - GPU support, rotation-based recon, MSVC support (#404)
* TiMemory + slight opt for CUDA
* Fully working CUDA SIRT
* Removed old CUDA code + cleanup
* run_compare.sh script
* MT optimizations + build fixes
* Fixed gitignore
* IMPORTANT
  - changed defaults to sirt in pyctest scripts
  - renamed _global functions to _kernel
  - approx 6x speed-up on TomoBank dataset
  - reduced memory for C++
* Fixed __global__ in header when CUDA=OFF
* Updated .travis.yml
* Fixed benchmarking/.gitignore to not ignore itself
* Enabled compilation without PTL
* Update sirt.cu
* Updates that improve the CUDA performance to > 50x speed-up
* Update sirt.cc
* Docker updates
  - .docker/Dockerfile.cuda --> Dockerfile
  - apt.sh includes clang-format
  - runtime-entrypoint.sh enters /home/tomopy directory
  - runtime-entrypoint.sh attempts install on start-up
* Update sirt.cc
* OpenCV support + CUDA mlem + modern CMake with CUDA
  - Added OpenCV support
  - Updated CMake to modern usage of CUDA as language
  - Updated environments
* Fixed MLEM (slightly broken -- memory constraints)
* MLEM impl w/o arrays + SIRT CPU updates
  - SIRT cpu has some testing code for expansion + compression
  - normalize SIRT and MLEM in Python extern.py
* NPP Affine for CUDA
  - Migrated some utils_cuda.cu to sum.cu
  - Implemented CUDA affine transform
  - Link to NPP
* Reorganized + CUDA NPP rotate + project.cc + test_nppi.cu
* SIRT CUDA rt performance improvements
* Reverted extern.py + PTL improvements
* PyCTest + Travis fixes
* Cleanup + CMake + IPP
  - Added prelim support for IPP (default = OFF)
  - Cleaned out the repository
  - CMake fixes for detecting CUDA
  - CMake fixes for PGI + OpenACC
  - envs install IPP
  - Warning fixes
* rotate project + multiple device CUDA + MT fixes
  - project with rotation is semi-working
  - CUDA should support multiple devices
  - multithreading initialization simplification
  - CUDA should run slightly faster (hopefully)
  - clang-format update to break after templates
  - pyctest_tomopy_phantom prints projection
  - cxx_mlem disabled by default
* PyBind11 fixes
* Update tomocxx.hpp
* Fixes for CUDA SIRT
* Updated env/tomopy-python27.yml to not use Intel
* Fixed SIRT C++ CPU segfault
* Updated PTL
* Fixed PGI compiler warnings
* Docker installs nsight + removed deviceToDevice cudaMemcpy
* Removed thrust header include
* Overlapping streams + NVTX updates
* Multi-GPU (potential) fixes
* NVTX_RANGE_POP updates + optimizations for streams
* Working multi-GPU version (requires TOMOPY_USE_PTL=OFF)
* Fix to TOMOPY_USE_PTL=ON + multi-GPU
  - TaskRunManager is now thread-local static instance instead of static instance
* partial reconstruction + PTL run manager fixes
* Update common.hh
* Update CPU vs. GPU task run manager
* REF: Remove unused imports
* REF: Move dxchange to lazy imports in tomo.prep.alignment
  dxchange has many dependencies, but is rarely used. In order to remove it from the conda requirements, it is now optional. In this case, it cannot be replaced with tifffile because it provides additional functionality such as not overwriting existing files.
* REF: Replace dxchange with tifffile in recon.rotation
  In this case, the functionality of dxchange is easily replaced with a direct call to tifffile. This removes dxchange as a hard dependency of tomopy.
* BLD: Remove dxchange from requirements.txt and meta.yaml
  Also reorder the requirements and update with tifffile and pywavelets because those are directly imported. Pywavelets is also required by scikit-image, but it should be listed here too.
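The lazy-import change for dxchange described above can be pictured with a minimal sketch. This is not the code from the commit; the helper name `lazy_import` and the error message are assumptions made for illustration, and only the standard-library `importlib` is used.

```python
import importlib


def lazy_import(name):
    """Import an optional dependency only when it is first needed.

    Hypothetical helper (not part of tomopy): defers the import so the
    package can be installed and imported without the optional module.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Re-raise with a hint so the user knows the module is optional.
        raise ImportError(
            "'{}' is an optional dependency; install it to use this "
            "feature.".format(name)
        ) from err


# The optional module is looked up at call time, not at package import time.
json_mod = lazy_import("json")  # stand-in for an optional package
```

A function that needs the optional package would call `lazy_import("dxchange")` inside its body, so users who never call that function never need the package installed.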
* SLURM files + extra messages in pyctest_tomopy_rec.py
* Fixed warnings + pinned memory
* Launch and synch optimizations
* Launch and sync optimizations
* Memory fixes and sync optimizations
* Memory optimizations
* Sync optimizations
* Fix to TOMOPY_PYTHON_THREADS check
* Thread ID info
* PTL update + better PTL parallelism
* PTL affinity configure via env (PTL_CPU_AFFINITY)
* Updated gpu template functions to enforce async more explicitly
* Docker updates
* SLURM updates
* GetEnv + CXX PTL + format
* SLURM updates
* PTL CPU affinity updates + lower memory overhead in SIRT CUDA
* Removed Intel packages
* BLD: More efficient way to exclude pyc files from install
* BLD: More efficient way to add files in tomopy/ to install
* BLD: Remove tests from installed packages
* BLD: Don't copy compiled library to source tree
  The compiled library should only end up in the install directory and in the build directory. Devs who want the library copied to the source tree should use `python setup.py develop` or `pip install -e .` In that case, the install directory is the source directory.
* BLD: Remove source files from final installation
  End users do not need copies of CMake files or the C source files. If they want these things, they can download the complete source from GitHub.
* BLD: List installed packages manually in setup.py
* BLD: Use CMake to set version in __init__.py
* Docker updates + PTL updates + formatting
* Fix to benchmarking phantom test construction
* Update to coverage script
* Fixed pyctest nosetest
* Updated benchmarking/__init__.py
* Updated benchmarking/__init__.py
* Docker + CMake fixes
* Fix to directories in pyctest_tomopy_phantom.py
* Fixes to pyctest phantom
* Folder restructuring
* ART on GPU + rotate change + DeviceOption + MANIFEST
  - Fixed python starting extra threads
  - moved utils_cuda.h to .hh
  - Used GpuOption scheme to control CPU vs. GPU
  - Unified CXX selection
  - Fixed finding OpenCV
* Fixed missing TOMOPY_USE_OPENCV
  - Fixed error about NPP when not using NVCC
* cuda_mult_kernel + gpu final rotation fixes + clang-format fix
* Low freq fix + CPU template rotates + GPU int rotates
  - Disabled TOMOPY_CXX_GRIDREC by default
* Iteration info for C + ART
* Partial recon + correct GPU + SIRT solution + cleanup
* Removed debug exception + nosetest set environ
* Updated CI
* BLD: Replace VERSION with setuptools_scm
  Instead of manually managing the VERSION file, setuptools_scm will automatically create a version number based on git tags. CMake interrogates git separately and does not include the git hash in the version number because it only allows numbers. I chose setuptools_scm instead of versioneer because it doesn't require adding any additional files to the repo. Instead, all of the logic is contained within the scm package, a dependency that setup.py installs automatically at install time or that can be pre-installed in the environment.
* REF: Reorganize files so source doesn't overshadow installed
  By moving the python module to /src, devs have the choice of testing against the source code or the installed code by running the tests from either / or inside /tests.
* REF: Move tests down one directory because of __init__.py
  test is actually a python module because it has an __init__.py. This means that the modules in src are also imported and the source will always overshadow the installed tomopy. Moving the tests module down one directory solves this problem.
* REF: Adjust CMake to new file structure
  Appended /src to file paths and put if(NOT SKBUILD) around copy operations that are not necessary if building without an IDE.
* REF: Change directories for coverage and tests
* DOC: Correct spelling languae -> language
* Updated SLURM scripts + fixed OpenCV includes
* GPU MLEM + template execute
  - sync_freq
  - TOMOPY_USE_OPENMP=OFF by default
  - removed old gpu mlem implementation
* Fixes to GPU MLEM and SIRT. Excellent recon!
* CPU MLEM and SIRT + GetEnv choices + execute fixes
  - Both MLEM and SIRT have working rotate versions for GPU and CPU now
  - TOMOPY_INTER is restricted to choices (NN, LINEAR, CUBIC)
* Fixed NaN at high iterations (SIRT, MLEM)
* Update _forward_args_t to use std::move
* Introduced invoker template and binding to fix expansion issues
  - execute template function was failing with some compilers
* Minor cleanup changes + changes to pack expansion
* Travis update + Linux conda compilers
  - Installing GCC for Linux in conda environments
* Removed unnecessary calc +
  - fnx not needed
  - fixed tuple warnings in morph.py
  - generate_compare.sh support for mlem
* Memory reduction + Open{MP,ACC} removal
* Optimizations for sirt/mlem (reducing kernel launches)
* CMake cleanup + no env compilers
* Removed PyBind11 support
* cxx_extern.h + reduce scikit install + removed allocator
* Increase max jobs on Appveyor
  Increasing the maximum number of jobs allows pull request validation to occur faster because multiple builds can run in parallel.
* CPU thread-local tasking run manager + python thread locking
  - cpu_data uses per-python thread mutexes to eliminate unnecessary locking
  - remove lambda execution in SIRT
* Fixed util/dtype.py typecodes, algorithm.py tuple index in message
  - envs/tomopy-python35.yml uses older pyctest
* TST: Use setup.cfg not .coveragerc to specify covered package
  Using this method instead of relative paths means that python-coverage will be able to find the installed tomopy and we don't need to use python setup.py develop inside the pyctest_tomopy.py
* Separate out computing sum_dist
  - sum_dist is now computed independently
* SLURM updates
* Update utils_cuda.cu
* Update utils_cuda.cu
* Update utils_cuda.cu
* CUDA_*_SIZE -> TOMOPY_*_SIZE + env for block/grid dim3
  - SLURM updates (significant)
* Immediate sum_dist calculation
* Warm-up kernel launch
* Removed async for cuda_compute_sum_dist
* Update utils_cuda.cu
* Update utils_cuda.hh
* destroy_stream syncs + minor improvements
* Fixes to SLURM env-common-settings.sh
* Move cache reset up higher in SIRT and MLEM
* Update env-common-settings.sh
* Update env-common-settings.sh
* Nearest-neighbor interpolation is default
* Disable "TOMOPY_USE_C_ALGORITHMS" from affecting project
* BUG: Resized shape must be iterable
* BUG: util.dtype not compatible with numpy 1.16.1
  numpy/numpy#12769 breaks compatibility with TomoPy because np.ctypeslib._typecodes no longer exists. This patch uses public functions instead in a way that is backward compatible. Closes #392
* BLD: Tell NVCC which host compiler to use (#7)
  NVCC should be told to use the same host compiler that CMake has identified as the CXX compiler. Otherwise, unexpected behavior may occur.
* REF: Replace recon.algorithm switch with getattr()
  We can remove this long switch statement with getattr() because each of the functions that util.extern implements is an attribute of the module. getattr() is a lower maintenance option because we no longer have to add and remove options from this switch.
* REF: Make recon.algorithms.allowed_kwargs global
  This makes the list of implemented functions public, which is good for benchmarking because we can ask tomopy what options are available instead of reading the docs.
* Potential optimization in summation for SIRT
* Update sirt.cu
* Update sirt.cu
* Update sirt.cu
* Update sirt.cu
* DOC: Update badges in README
  Badges on the README are pointing to the wrong repositories. They should be pointing to the conda-forge anaconda channel and the tomopy/tomopy coveralls instead of dgursoy repos.
* BLD: Add setuptools_scm_git_archive
  Without this setuptools_scm extension, you cannot build from a git archive such as the tarball that is downloadable from GitHub. This is because there is no repository to scrape the version number from.
* Removed thrust + update PTL + PTL simplified interface
* Update source/gpu/gpu.cu
* Updated PTL
* Update sirt.cu
  - testing CUDA graph
* Update PTL
* Updates to SIRT graph exec (not working)
* Fixes to CUDA graph
* Massive cleanup + reorganization
* Update data.hh
* Update CUDA compute_projection for SIRT and MLEM
* Update execute
* Update common.hh
  - execute update
* Updated execute (and usage) to not loop over slices
* Cleanup + CPU rotation updates
  - Removed OpenMP and OpenACC from build system
  - Removed unnecessary TOMOPY_USE_GPU
  - Added common.cc
  - Removed duplicate macros
  - Added some CUDA queries to C++ when CUDA not available
  - Removed test/test_nppi.cu and test/test_opencv.cc
  - Added some docstrings
* OpenCV header fix
* clang-tidy + removed docker + removed slurm + PTL updates
* Resetting the device at the end of GPU algorithms
* CUDA_ARCH changes
* Update Options.cmake
  - disable clang-tidy by default
* Removed profile/run scripts from benchmarking
* Updated output_dir for pyctest_tomopy_rec.py
* GpuData (cache) safety
* Update .travis.yml
* Sync guards
* Fixes to CUDA_ARCH
* Update PTL
  - fixed customized CFLAGS and CXXFLAGS to_list instead of string
* OpenCV requirement + removed cooperative_groups header include (unused) for CUDA 8 or earlier
* REF: Arrange directories like jrmadsen/gpu
* REF: Move python source into source folder
* FIXME: Make separate tests for each back-end
  There are no tests for the new back-end, and the old back-ends need to be selected using an environment variable? The old implementation should be the default.
* MAI: Remove unused files
  VERSION has been replaced by setuptools_scm. requirements has been moved to envs/. The conda meta.yaml is now stored on the conda-forge/tomopy-feedstock because we don't build it ourselves anymore.
* Restore benchmarking from jrmadsen/tomopy-gpu
* REF: Moved tomopy.misc.benchmarks in the root benchmarks module
* MAI: Remove VERSION from manifest
* BLD: Clean up envs and CI yamls
  Added two environments for windows and removed logic comments because those don't work outside of recipes.
* BLD: Remove coveralls from Travis python-37 build
* BUG: Don't import the submodules
  We cannot import the submodules for benchmarking because that requires TomoPy, and TomoPy may not be installed.
* BLD: Reorganize CI build to match anaconda recommendations
* BLD: Use defaults:libopencv not conda-forge:opencv
  The default channel opencv is split into two subpackages: python-opencv and libopencv. The conda-forge package is not split. We only need the C/C++ libraries to build and run against.
* BUG: Don't update conda on Appveyor [skip travis]
  There's some bug (appveyor/ci#2270) where the conda environment is disrupted if you update.
* BUG: Use git clone instead of tarball [skip travis]
  setuptools_scm needs a git hash to function. The shallow clone option for appveyor downloads a tarball without repo information.
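The getattr() refactor described above (replacing a long switch with attribute lookup on the module that implements the algorithms) might look roughly like the sketch below. The `extern` namespace, its `c_` prefix, and the function `get_func` are stand-ins invented for this example; they are not taken from the tomopy source.

```python
import types

# Hypothetical stand-in for the module whose attributes are the
# algorithm implementations (names are assumptions for illustration).
extern = types.SimpleNamespace(
    c_sirt=lambda data: ("sirt", data),
    c_mlem=lambda data: ("mlem", data),
)


def get_func(algorithm):
    """Resolve an algorithm name to its implementation via getattr().

    Adding a new algorithm to `extern` makes it available here with no
    further edits, which is the maintenance benefit over a switch.
    """
    try:
        return getattr(extern, "c_" + algorithm)
    except AttributeError:
        raise ValueError("unknown algorithm: {!r}".format(algorithm)) from None


result = get_func("sirt")([1, 2, 3])
```

Catching `AttributeError` and raising `ValueError` keeps the failure mode of an unknown name explicit, which a bare getattr() call would otherwise obscure.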
* BUG: Win compiler missing M_PI definition
* BLD: Add git to build requirements
  Also coveralls is not compatible with py 3.7
* DOC: Add more docstrings
* BUG: Replace setenv with cv::setNumThreads
  setenv is not part of the ISO C standard, thus it will not compile on Windows. Here we are replacing these setenv calls with the OpenCV setNumThreads function call to accomplish the same task.
* BLD: Pull Windows build updates for PTL
* CMake cleanup + update PTL + OSX envs + fix benchmark
  - moved tomopy-python*.yml to linux-*.yml
  - add CMAKE_OSX_DEPLOYMENT_TARGET to setup.py
  - moved util.py to utilities/__init__.py because 'import util' was importing 'timemory.util'
* Simplified thread-pool and initialization sets for algorithms
  - enable/disable tasking option
* Moved benchmarking/utilities to source/tomopy/misc/benchmark.py
  - Moving to this location is breaking pyctest
* Update phantom.py
* Update phantom.py
* Update __init__.py
* Reverted PTL to master branch because of strange static cleanup behavior
* BUG: Fix default value for --exclude-phantoms
  The default option for --exclude-phantoms in pyctest_tomopy.py should be the empty list `[]` instead of `None` because this parameter is iterated over.
* BLD: Add coverage tests back to Travis CI
  pyctest.pyctest.run() will always return `None` regardless of whether the build or other tests failed. Thus, instead of the Travis tests failing when the build fails, we need to check whether we can run tests. Also, because we are using `conda` to install all the dependencies, we can use the `minimal` Travis image.
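The --exclude-phantoms fix above comes down to giving an option that is later iterated over a default of `[]` rather than `None`. The following is a minimal sketch with `argparse`; the parser layout, the `selected` helper, and the phantom names are assumptions for illustration, not the contents of pyctest_tomopy.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # default=[] keeps the value usable in membership tests and loops
    # even when the flag is not given; default=None would raise TypeError.
    parser.add_argument("--exclude-phantoms", nargs="*", default=[],
                        help="phantoms to skip")
    return parser


def selected(phantoms, args):
    """Return the phantoms that were not excluded on the command line."""
    return [p for p in phantoms if p not in args.exclude_phantoms]


# Illustrative names only.
all_phantoms = ["baboon", "shepp2d"]
no_flag = selected(all_phantoms, build_parser().parse_args([]))
with_flag = selected(
    all_phantoms, build_parser().parse_args(["--exclude-phantoms", "baboon"])
)
```

With `default=None`, the expression `p not in args.exclude_phantoms` fails as soon as the flag is omitted, which is the failure the commit message describes.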
* Fixes GPU sirt and mlem + phantom/rec scripts use png
  - Updated PTL to implicit-manager-interface branch
  - CUDA error checks
* Windows fixes (uses MSVC compiler now)
  - due to issues with MinGW + OpenCV, Windows builds now utilize MSVC
  - Windows does not support OpenMP SIMD so TOMOPY_USE_OPENMP is disregarded
  - Windows uses C++ version of gridrec (std::complex)
  - Removed MinGW from envs/win-{36,37}.yml
  - Removed tbb-devel from envs/win-{36,37}.yml
  - Added vs2015_win-64 to envs/win-{36,37}.yml
  - Added cv::setNumThreads(1)
* Removed timemory from envs/win-37.yml
* CMake coverage fixes + suppress setup.py warnings + PTL update for sequential tids
* Migration of settings to Python interface
  - added ['accelerated', 'pool_size', 'interpolation', 'device', 'grid_size', 'block_size'] to mlem and sirt
  - created RuntimeOptions class
* Updated kwargs for accelerated algorithms
* GPU documentation
* Build opts to force flags/libs + PTL shared lib support + PTL bug fix
  - PTL fixes a bug that would occasionally cause a segfault as it destroyed the thread-pool
  - tomopy can build PTL as a shared library
  - Added TOMOPY_USER_FLAGS and TOMOPY_USER_LIBRARIES CMake Options
* Safer thread-pool cleanup + fix to strange behavior in NPP rotating integers
* Update macros.hh
  - fix to dummy thread-pool when tasking is disabled
* BUG: Add double braces for C++11 compatibility
  The conda-forge gxx compiler is missing a patch which allows the initialization of std::array without double braces. Read more about this problem here:
  https://en.cppreference.com/w/cpp/container/array
  https://stackoverflow.com/a/11400125/4459405
* Update PTL
* Updates fixing a sporadic bug deleting thread-local thread-pool
  - updated PTL to a new revision that resolves the occasional data race when the threads in the thread-pool exit the execute_thread function after the ThreadPool instance was destroyed. The error arose because those threads were trying to unlock a mutex that was created by the ThreadPool instance that was already destroyed
  - removed .dockerignore
* Disable linux-{27,36,37}.yml from using OpenBLAS. May also be needed on macOS
* Specify scipy<1.3 for envs/linux-{36,37}.yml until scipy-feedstock is fixed