CUDA Implementations of Dynamic Time Warping and SoftDTW loss function for time series machine learning. Course project for CSS535 HPC

alexkyllo/cuTimeWarp

cuTimeWarp

CUDA C++ implementations of Dynamic Time Warping and SoftDTW loss function for time series machine learning.

Based on algorithms described in:

Building

This project uses a Makefile to coordinate separate compilation of CUDA kernels and C++ code and is tested on Ubuntu Linux. Typing make will list the available commands:

$ make

Available rules:

build               Build binaries
clean               Delete binaries
fmt                 Format the code with clang-format
plot                Run python script to generate plots
report              Compile the PDF report
run                 Run experiments
run_multi           Run multi-distance experiments
test                Build and run unit tests

To compile the kernels and the test programs, use the make build command.

All C++ / CUDA source code is found in the src/ folder.

Library Dependencies

In addition to the CUDA runtime and cuBLAS (tested with CUDA 11.2), the programs link to BLAS for the CPU implementations, so a BLAS implementation such as OpenBLAS must be available on the machine.

Running

The three programs to use for running comparative performance experiments are:

  • bin/soft_dtw_perf_cpu for timing CPU performance
  • bin/soft_dtw_perf_multi for timing GPU performance
  • bin/soft_dtw_perf_tiled for timing the tiled kernel on GPU (for time series longer than 1024)

Each program accepts as arguments either the name of a file containing space-delimited data (see data/ECG200/ECG200_ALL.txt) or the word random followed by a time series length and count. It computes the Soft-DTW dissimilarity between all pairs of time series in the batch and prints one line per kernel with four columns:

  • Kernel function name
  • The input time series length (number of columns per row)
  • The input time series count (number of rows)
  • The execution time in microseconds
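For orientation, the quantity being timed can be sketched on the CPU as follows. This is a minimal illustration of the standard Soft-DTW recurrence for two univariate series, not the code in src/. The names softmin3 and soft_dtw, the gamma smoothing parameter, and the squared-difference cost are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Smoothed minimum of three values: -gamma * log(sum_k exp(-v_k / gamma)).
// The largest exponent is factored out so that exp() cannot overflow.
double softmin3(double a, double b, double c, double gamma)
{
    const double na = -a / gamma, nb = -b / gamma, nc = -c / gamma;
    const double m = std::max({na, nb, nc});
    const double s = std::exp(na - m) + std::exp(nb - m) + std::exp(nc - m);
    return -gamma * (std::log(s) + m);
}

// Soft-DTW dissimilarity between two univariate series (hypothetical helper).
// R is an (m+1) x (n+1) row-major table with R[0][0] = 0 and an infinite border.
double soft_dtw(const std::vector<double>& x, const std::vector<double>& y,
                double gamma)
{
    assert(gamma > 0.0 && !x.empty() && !y.empty());
    const std::size_t m = x.size(), n = y.size(), w = n + 1;
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> R((m + 1) * w, inf);
    R[0] = 0.0;
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const double diff = x[i - 1] - y[j - 1];
            // Local cost plus the soft minimum of the three predecessor cells.
            R[i * w + j] = diff * diff + softmin3(R[(i - 1) * w + (j - 1)],
                                                  R[(i - 1) * w + j],
                                                  R[i * w + (j - 1)], gamma);
        }
    }
    return R[m * w + n];
}
```

As gamma approaches zero the soft minimum approaches the hard minimum, so the result approaches the classic DTW cost. For larger gamma the value is never greater than the DTW cost and can be negative.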

Example:

$ ./bin/soft_dtw_perf_multi
Usage: ./bin/soft_dtw_perf_multi [INPUT_FILENAME] | random [length] [count]

$  ./bin/soft_dtw_perf_multi ./data/ECG200/ECG200_ALL.txt
Data file ./data/ECG200/ECG200_ALL.txt contains 200 time series of length 96
sq_euclid_dist_multi 96 200 515037
softdtw_cuda_naive_multi 96 200 264987
softdtw_cuda_naive_multi_bw_80 96 200 235089
softdtw_cuda_naive_multi_bw_60 96 200 168621
softdtw_cuda_naive_multi_bw_40 96 200 83501
softdtw_cuda_naive_multi_bw_20 96 200 51338
softdtw_cuda_stencil_multi 96 200 100990
softdtw_cuda_stencil_multi_80 96 200 100408
softdtw_cuda_stencil_multi_60 96 200 100844
softdtw_cuda_stencil_multi_40 96 200 101215
softdtw_cuda_stencil_multi_40 96 200 100436
softdtw_cuda_stencil_multi_20 96 200 100647
convert_diagonal_multi 96 200 332664
softdtw_cuda_diagonal_multi 96 200 149158

$ ./bin/soft_dtw_perf_multi random 100 100
sq_euclid_dist_multi 100 100 335883
softdtw_cuda_naive_multi 100 100 61576
softdtw_cuda_naive_multi_bw_80 100 100 52272
softdtw_cuda_naive_multi_bw_60 100 100 32211
softdtw_cuda_naive_multi_bw_40 100 100 18919
softdtw_cuda_naive_multi_bw_20 100 100 18725
softdtw_cuda_stencil_multi 100 100 26558
softdtw_cuda_stencil_multi_80 100 100 25803
softdtw_cuda_stencil_multi_60 100 100 31000
softdtw_cuda_stencil_multi_40 100 100 26120
softdtw_cuda_stencil_multi_40 100 100 25804
softdtw_cuda_stencil_multi_20 100 100 30992
convert_diagonal_multi 100 100 87427
softdtw_cuda_diagonal_multi 100 100 43893
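Because every result line has the same four space-separated columns, the output is easy to post-process. The following sketch parses one line and compares two timings. TimingRow, parse_timing_row and speedup are hypothetical names for this example and are not part of the repository.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One result line: kernel name, series length, series count, time in microseconds.
struct TimingRow {
    std::string kernel;
    long length = 0;
    long count = 0;
    long long microseconds = 0;
};

// Parses a line such as "softdtw_cuda_naive_multi 96 200 264987".
// Returns false for lines that do not start with the four expected columns,
// for example the "Data file ..." banner or the usage message.
bool parse_timing_row(const std::string& line, TimingRow& row)
{
    std::istringstream in(line);
    return static_cast<bool>(in >> row.kernel >> row.length >> row.count >>
                             row.microseconds);
}

// Ratio of the baseline time to the candidate time (above 1 means faster).
double speedup(const TimingRow& baseline, const TimingRow& candidate)
{
    assert(candidate.microseconds > 0);
    return static_cast<double>(baseline.microseconds) /
           static_cast<double>(candidate.microseconds);
}
```

Applied to the ECG200 run above, comparing softdtw_cuda_naive_multi (264987) with softdtw_cuda_naive_multi_bw_20 (51338) gives a ratio of roughly 5.2.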

TODO List

  • Implement naive DTW on CPU
  • Implement soft DTW on CPU
  • Choose benchmarking datasets
  • Implement pairwise squared Euclidean distance on CPU
  • Implement soft DTW gradient on CPU
  • Implement soft DTW barycenter estimation on CPU
  • Implement naive soft DTW in CUDA
  • Implement pairwise squared Euclidean distance in CUDA
  • Implement soft DTW gradient in CUDA
  • Implement soft DTW barycenter estimation in CUDA
  • Tiling
  • Shared memory stencil
  • Sakoe-Chiba bands
  • Contiguous diagonal-major array storage layout
  • Run benchmark experiments
  • Analysis of experiment results
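As a reminder of what the Sakoe-Chiba band item refers to, the sketch below restricts a classic (hard-minimum) DTW to cells within a fixed distance of the main diagonal. It is a CPU illustration only. The function name dtw_banded and the use of an absolute bandwidth in cells are assumptions; how the repository's kernels interpret their bandwidth settings is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Classic DTW with squared-difference cost, limited to cells with
// |i - j| <= bandwidth. Cells outside the band keep an infinite cost, so no
// warping path can pass through them. If the two lengths differ by more than
// the bandwidth, the final cell is unreachable and the result is infinity.
double dtw_banded(const std::vector<double>& x, const std::vector<double>& y,
                  std::size_t bandwidth)
{
    assert(!x.empty() && !y.empty());
    const std::size_t m = x.size(), n = y.size(), w = n + 1;
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> R((m + 1) * w, inf);
    R[0] = 0.0;
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t offset = i > j ? i - j : j - i;
            if (offset > bandwidth)
                continue;  // outside the band: skip the work entirely
            const double diff = x[i - 1] - y[j - 1];
            const double best = std::min({R[(i - 1) * w + (j - 1)],
                                          R[(i - 1) * w + j],
                                          R[i * w + (j - 1)]});
            R[i * w + j] = diff * diff + best;
        }
    }
    return R[m * w + n];
}
```

A narrower band skips more cells, which reduces work but can exclude the best alignment and therefore raise the reported cost.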
