Releases: cudpp/cudpp
2.2
- Added cudppSuffixArray, a parallel skew algorithm for computing suffix arrays
- Replaced cudppStringSort with cudppSuffixArray in the burrowsWheelerTransform step of cudppCompress for better performance
- Fixed bugs in cudppMoveToFrontTransform where previously only inputs with values smaller than 15 worked
- Fixed bugs so that cudppCompress can compress text containing any unsigned char value in the range [1...255]
- Changed test files for cudppCompress and cudppMoveToFrontTransform to target the new BWT method
- Added a -skiplargetests option for MTF tests to avoid launch-timeout errors
- Fixed bugs to make cudppStringSort compatible with GPUs of compute capability below 2.0
- Makefile fixes for compiling with clang on OS X
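Several 2.2 changes concern the Burrows-Wheeler transform (BWT) and Move-to-Front (MTF) stages of cudppCompress. The C++ below is a minimal sequential sketch of how a BWT can be read off a suffix array, plus an MTF that handles the full 256-value byte alphabet. It is not CUDPP's GPU implementation: a naive suffix sort stands in for the parallel skew algorithm, and the names naiveSuffixArray, bwtFromSuffixArray, and moveToFront are hypothetical. The sketch assumes input bytes lie in [1, 255], matching the range above, so that byte 0 can act as a unique end-of-string sentinel; whether CUDPP uses the same sentinel convention is not stated here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort suffix start positions lexicographically.
// std::string::compare orders bytes as unsigned char values.
// (A stand-in for a parallel skew/DC3 algorithm; O(n^2 log n) worst case.)
std::vector<std::size_t> naiveSuffixArray(const std::string& s) {
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// BWT via suffix array. Assumption: every byte of `text` is in [1, 255],
// so appending '\0' gives a unique sentinel smaller than all other bytes.
std::string bwtFromSuffixArray(const std::string& text) {
    const std::string s = text + '\0';
    const std::vector<std::size_t> sa = naiveSuffixArray(s);
    std::string out(s.size(), '\0');
    for (std::size_t i = 0; i < sa.size(); ++i) {
        // Emit the byte preceding suffix sa[i], wrapping around for sa[i] == 0.
        out[i] = s[(sa[i] + s.size() - 1) % s.size()];
    }
    return out;
}

// Move-to-front over the full byte alphabet (all 256 values, not just small ones).
std::vector<std::uint8_t> moveToFront(const std::string& in) {
    std::array<std::uint8_t, 256> table;
    for (int i = 0; i < 256; ++i) table[i] = static_cast<std::uint8_t>(i);
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (char c : in) {
        const auto sym = static_cast<std::uint8_t>(c);
        std::size_t pos = 0;
        while (table[pos] != sym) ++pos;  // current position of sym
        out.push_back(static_cast<std::uint8_t>(pos));
        // Shift entries [0, pos) back by one slot, then put sym at the front.
        std::copy_backward(table.begin(), table.begin() + pos, table.begin() + pos + 1);
        table[0] = sym;
    }
    return out;
}
```

For example, bwtFromSuffixArray("banana") returns the seven bytes a, n, n, b, \0, a, a. A suffix array fits the BWT well because, with a unique smallest sentinel, sorting suffixes gives the same order as sorting all rotations.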
2.1
- Added cudppCompress lossless data compression, which implements the Move-to-Front transform, the Burrows-Wheeler transform, and Huffman encoding
- Added cudppMoveToFrontTransform and cudppBurrowsWheelerTransform
- Added cudppListRank parallel list ranking
- Renamed cudppSort to cudppRadixSort
- Added cudppMergeSort parallel merge sort
- Added cudppStringSort parallel string sort
- Moved source code to GitHub: http://cudpp.github.io
- Moved documentation pages from cudpp.h to README.md
- Added CUDPP_GENCODE_* CMake options to make CUDA target architecture compilation more flexible
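List ranking computes each node's position in a linked list stored as an array of successor indices. The C++ below is a minimal sequential simulation of Wyllie's pointer-jumping algorithm, a classic parallel approach in which every node adds its successor's distance and then jumps to its successor's successor, finishing in O(log n) rounds. It is an illustration only: cudppListRank may use a different algorithm, and the name listRank, the kNil end-of-list marker, and the rank convention (0-based position from the head) are assumptions of this sketch, which also assumes all nodes form a single list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker for "no successor" (the tail of the list).
constexpr int kNil = -1;

// Wyllie's pointer jumping, simulated sequentially. Each round reads only the
// previous round's arrays, mimicking one synchronous parallel step.
// Returns each node's 0-based position counted from the head.
std::vector<int> listRank(const std::vector<int>& next) {
    const std::size_t n = next.size();
    std::vector<int> succ(next);
    std::vector<int> dist(n);  // dist[i]: number of links from i to succ[i]
    for (std::size_t i = 0; i < n; ++i) dist[i] = (succ[i] == kNil) ? 0 : 1;

    bool changed = true;
    while (changed) {  // O(log n) rounds
        changed = false;
        std::vector<int> newSucc(succ), newDist(dist);
        for (std::size_t i = 0; i < n; ++i) {
            if (succ[i] != kNil) {
                newDist[i] = dist[i] + dist[succ[i]];
                newSucc[i] = succ[succ[i]];
                changed = true;
            }
        }
        succ.swap(newSucc);
        dist.swap(newDist);
    }
    // dist[i] now counts the nodes after i; convert to position from the head.
    std::vector<int> rank(n);
    for (std::size_t i = 0; i < n; ++i) rank[i] = static_cast<int>(n) - 1 - dist[i];
    return rank;
}
```

For the list 0 → 2 → 3 → 1, encoded as next = {2, kNil, 3, 1}, the result is {0, 3, 1, 2}.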
2.0
- New thread-safe public interface: requires creating a CUDPP instance with cudppCreate and passing it to all functions
- Added cudppReduce parallel reductions
- Added cudppTridiagonal parallel tridiagonal linear system solver
- Added cudppHashTable parallel hash table data structure
- Added 64-bit type support (double, long long, and unsigned long long), implemented in cudppReduce, cudppScan, [cudppMultiScan](http://cudpp.github.io/cudpp/2.0/group__public_interface.html#gad9655b51dba16bc43b8adee4507dc1d0), cudppSegmentedScan, cudppCompact, cudppRadixSort, and cudppTridiagonal
- Fixed various bugs in cudppSegmentedScan
- Replaced the radix sort implementation with thrust::sort() for its performance advantages and simplicity. Sort performance regresses for smaller arrays; this will be addressed in the next release.
- Reverse sorting is now supported (use the CUDPP_OPTION_BACKWARD option when creating the sort plan)
- cudppRadixSort now supports char, uchar, int, uint, float, double, longlong, and ulonglong keys.
- Removed all support for device emulation
- Removed all dependencies on CUTIL; removed the common/ subdirectory and added a minimal set of app utilities in the apps/common/ subdirectory
- Improved coverage of cudpp_testrig
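To show what a tridiagonal solver computes, here is a minimal sequential sketch of the Thomas algorithm in C++, which solves a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i in O(n) time. This is not how cudppTridiagonal works on the GPU; GPU tridiagonal solvers generally rely on parallel methods such as cyclic reduction instead. The name thomasSolve and its argument layout are hypothetical, and the absence of pivoting assumes a diagonally dominant system.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solves a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.
// a[0] and c[n-1] are ignored. Assumes diagonal dominance (no pivoting).
std::vector<double> thomasSolve(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<double>& c,
                                const std::vector<double>& d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    std::vector<double> cp(n), dp(n), x(n);
    // Forward sweep: eliminate the sub-diagonal, normalizing each row.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = (i + 1 < n) ? c[i] / m : 0.0;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    // Back substitution from the last row upward.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = dp[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

For example, with a = {0, 1, 1}, b = {4, 4, 4}, c = {1, 1, 0}, and d = {6, 12, 14}, the solution is x = {1, 2, 3} up to floating-point rounding.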
1.1.1
29 April 2010
- Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20) architecture GPUs (proper use of the "volatile" keyword)
- Some initial small optimizations for radix sort and scan on Fermi (sm_20) architecture
- Fix emulation mode radix sort of very small arrays
- Fix radix sort on 64-bit OSes by using `__launch_bounds__` in CUDA 3.0
- Minor efficiency improvement to radix sort test in cudpp_testrig
- Fixed incorrect identity for min operator
- Fixes for Unix and Mac OS X Snow Leopard builds
- Fixes for 64-bit Windows builds
- Bibliography updates
- Minor documentation fixes
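The identity of a scan operator is the value e with op(e, x) == x for every x, and an exclusive scan writes it into the first output slot; for min, that is the largest representable value of the type. The release note does not say what the incorrect value was, so the C++ below is only a minimal sequential sketch of an exclusive min-scan using that identity. exclusiveMinScan is a hypothetical name, not part of CUDPP's API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Exclusive min-scan: out[i] = min(in[0], ..., in[i-1]); out[0] is the identity.
template <typename T>
std::vector<T> exclusiveMinScan(const std::vector<T>& in) {
    // Identity for min: the largest representable value, since min(max, x) == x.
    T running = std::numeric_limits<T>::max();
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        if (in[i] < running) running = in[i];
    }
    return out;
}
```

For int input {5, 3, 7, 1} this yields {INT_MAX, 5, 3, 3}. For floating-point types, positive infinity would also be a valid identity; the sketch uses std::numeric_limits<T>::max() for every type.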
1.1
1 July 2009
- Switched to pure BSD license.
- Added a new radix sort implementation under cudppSort() (based on the Satish et al. IPDPS '09 paper). All previous sorts have been removed.
- Added cudppRand() pseudorandom number generation (based on the Tzeng and Wei I3D '08 paper).
- Added support for backward segmented scan.
- Fixed satGL example to run in a native window on OS X, rather than an X11 window.
- Removed Visual Studio 7.1 (2003) project files. CUDA 2.1 and later dropped support for VS7.1.
- Miscellaneous bug fixes.
- In docs, added a list of publications that use CUDPP, in both text and BibTeX citation formats.
- In docs, updated the list of publications describing algorithms included in CUDPP.
- Miscellaneous documentation improvements.
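A segmented scan runs an independent scan within each segment of the input, and a backward scan accumulates from the end of each segment toward its start. The C++ below is a minimal sequential sketch of an inclusive backward segmented sum-scan. It assumes segments are marked by head flags (1 on the first element of each segment) and an inclusive result; CUDPP's own conventions may differ, and backwardSegmentedScan is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive backward segmented sum-scan (sequential reference).
// Assumed convention: flags[i] != 0 marks the first element of a segment.
// Within each segment, out[i] = in[i] + in[i+1] + ... + (last element of the segment).
std::vector<int> backwardSegmentedScan(const std::vector<int>& in,
                                       const std::vector<int>& flags) {
    assert(in.size() == flags.size());
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = in.size(); i-- > 0;) {
        running += in[i];
        out[i] = running;
        // Element i starts its segment, so element i-1 belongs to the previous
        // segment: restart the running sum.
        if (flags[i]) running = 0;
    }
    return out;
}
```

For example, input {1, 2, 3, 4, 5} with flags {1, 0, 1, 0, 0} forms the segments [1, 2] and [3, 4, 5], giving {3, 2, 12, 9, 5}.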