NATSA: A Near-Data Processing Accelerator for Time Series Analysis

NATSA is the first Near-Data Processing Accelerator for Time Series Analysis. NATSA enables high-performance and energy-efficient time series analysis for a wide range of applications, by minimizing the overheads of data movement. This can enable efficient time series analysis on large-scale systems as well as embedded and mobile devices, where power consumption is a critical constraint (e.g., heartbeat analysis on a mobile medical device to predict a heart attack).

💡Watch our talk about NATSA! Slides: (pptx) (pdf)

Time Series Analysis

Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, and more. Matrix profile, the state-of-the-art algorithm to perform time series analysis, computes the most similar subsequence for a given query subsequence within a sliced time series. Matrix profile has low arithmetic intensity, but it typically operates on large amounts of time series data. In current computing systems, this data needs to be moved between the off-chip memory units and the on-chip computation units for performing matrix profile. This causes a major performance bottleneck as data movement is extremely costly in terms of both execution time and energy.
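To make the computation concrete, here is a minimal brute-force sketch of a matrix profile in Python. It is not the SCRIMP code from this repository and not the NATSA design. It only illustrates the idea of finding, for every subsequence, its most similar non-overlapping subsequence. The function names (`znorm`, `matrix_profile`), the z-normalized Euclidean distance, and the default exclusion zone of a quarter of the window are assumptions chosen for illustration.

```python
import math


def znorm(window):
    """Return the z-normalized copy of a window (zero mean, unit std)."""
    m = len(window)
    mean = sum(window) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
    if std == 0.0:
        # Constant window: map to all zeros to avoid dividing by zero.
        return [0.0] * m
    return [(x - mean) / std for x in window]


def matrix_profile(series, m, exclusion=None):
    """Naive O(n^2 * m) matrix profile (illustrative only).

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor, index[i] is where that neighbor starts.
    """
    n_sub = len(series) - m + 1
    if exclusion is None:
        exclusion = m // 4  # assumed exclusion zone to skip trivial matches
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        # Distances are symmetric, so each pair is visited once.
        for j in range(i + exclusion + 1, n_sub):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index


if __name__ == "__main__":
    # The pattern 0, 1, 3, 1 appears at positions 0 and 9.
    data = [0, 1, 3, 1, 0, 5, 2, 7, 4, 0, 1, 3, 1, 0]
    profile, index = matrix_profile(data, 4)
    print(index[0], profile[0])
```

Every pair of subsequences requires reading both windows but performs only a few arithmetic operations per element, which is why the workload is dominated by memory traffic rather than computation.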

Key Idea and Benefits of NATSA

The key idea of NATSA is to exploit modern 3D-stacked High Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile computation near memory, where time series data resides. NATSA provides three key benefits: 1) quickly computing the matrix profile for a wide range of applications by building specialized energy-efficient floating-point arithmetic processing units close to HBM, 2) improving the energy efficiency and execution time by reducing the need for data movement over slow and energy-hungry buses between the computation units and the memory units, and 3) analyzing time series data at scale by exploiting low-latency, high-bandwidth, and energy-efficient memory access provided by HBM. Our experimental evaluation shows that NATSA improves performance by up to 14.2× (9.9× on average) and reduces energy by up to 27.2× (19.4× on average), over the state-of-the-art multi-core implementation. NATSA also improves performance by 6.3× and reduces energy by 10.2× over a general-purpose NDP platform with 64 in-order cores.

Repository Structure and Usage

⚠️ This repository contains the code used in NATSA's performance and energy evaluation for our ICCD 2020 paper, which relies on architectural simulation frameworks. If you are interested in performing motif/discord discovery in time series using Matrix Profile, please visit https://www.cs.ucr.edu/~eamonn/MatrixProfile.html, where you will find ready-to-use implementations for CPU and GPU.

The repository structure and the most important folders and files are outlined below.

.
+-- README.md
+-- aladdin/
|   +-- config_files/
|   +-- scrimp_src/
+-- gpu/
+-- images/
+-- mcpat/
+-- timeseries/
+-- xeonphi/
+-- zsimramulator/
|   +-- config_files/
|   |   +-- scrimp/
|   +-- scrimp_src/

In the "aladdin" directory, you will find the configuration file needed to evaluate NATSA's performance, area, and energy using the gem5+Aladdin simulator, together with the SCRIMP source code tuned for our proposed architecture.
In the "gpu" directory, you will find the GPU implementation of SCRIMP that we used in our evaluation for performance and energy comparison. Please tune the Makefile according to your GPU.
In the "mcpat" directory, you will find the two config files used for the area and energy evaluation of the general-purpose cores.
In the "timeseries" directory, you will find sample datasets to evaluate the performance of NATSA, the general-purpose cores, and the commodity implementations.
In the "xeonphi" directory, you will find the source code of the Xeon Phi implementation of SCRIMP, which takes advantage of the HBM available in that architecture.
In the "zsimramulator" directory, you will find the config files required to evaluate the performance of the general-purpose cores in the ZSim+Ramulator simulation environment. Additionally, you will find SCRIMP source code ready to use with that environment.

Prerequisites

NATSA's evaluation requires the following simulation frameworks. Please refer to their corresponding documentation to install them and their dependencies.

ramulator-pim: https://github.com/CMU-SAFARI/ramulator-pim
McPAT: https://github.com/HewlettPackard/mcpat
gem5-aladdin: https://github.com/harvard-acc/gem5-aladdin

Additionally, the Xeon Phi code requires a supported processor (e.g., Intel Xeon Phi 7210) and the GPU code requires a CUDA-capable GPU.

General-purpose Core Simulation

The general-purpose core performance can be simulated using the ramulator-pim environment and the files provided in the zsimramulator folder. Please refer to the ramulator-pim documentation to set up the environment. The SCRIMP source code, ready to be used with ramulator-pim, is under the zsimramulator/scrimp_src folder and has to be compiled before running the simulator. Once the desired configuration file has been adjusted with the user's paths, the simulator can be started as follows:

./build/opt/zsim config_files/scrimp/262144/2048/arm/hbm/64/scrimp_arm_hbm.cfg

This example evaluates SCRIMP in an ARM-like 64-core configuration with HBM, using a time series of 262144 elements and a window size of 2048.
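The configuration path in the example appears to encode the run parameters as directory levels (series length, window size, core type, memory type, core count). The following Python sketch builds such a path from its parameters. The helper names `scrimp_config_path` and `num_subsequences` are hypothetical, and the layout is inferred from the single example above, so other parameter combinations may not exist in the repository.

```python
def scrimp_config_path(series_len, window, core, memory, n_cores):
    """Build a config path following the layout of the example above.

    Assumption: the directory levels are series length, window size,
    core type, memory type, and core count, in that order.
    """
    return (
        f"config_files/scrimp/{series_len}/{window}/"
        f"{core}/{memory}/{n_cores}/scrimp_{core}_{memory}.cfg"
    )


def num_subsequences(series_len, window):
    """Number of sliding windows of the given size in the series."""
    return series_len - window + 1


if __name__ == "__main__":
    path = scrimp_config_path(262144, 2048, "arm", "hbm", 64)
    print("./build/opt/zsim", path)
    print("subsequences:", num_subsequences(262144, 2048))
```

Check that the generated path exists under zsimramulator/config_files before passing it to the simulator.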

The general-purpose core energy and area can be estimated using McPAT and the config files provided in the mcpat folder. Please refer to the McPAT documentation to set up the environment. The following example estimates the energy and area for the ARM-like configuration:

./mcpat -infile ARM_64.xml

NATSA Simulation

The performance, area, and energy of NATSA can be simulated using the gem5-aladdin simulation framework and the files provided in the aladdin folder. Please refer to the gem5-aladdin documentation to set up the environment. The SCRIMP source file, optimized for NATSA, is located under aladdin/scrimp_src and can be tuned according to the desired time series parameters.

To run it, launch gem5 with the default HBM memory model and pass the .cfg file to Aladdin.

Xeon Phi and GPU Execution

The code for the Xeon Phi executions is located under the xeonphi folder. Build it with make and run it with the desired time series (example time series are located under the timeseries folder). This is an example execution for a time series of 524288 elements, a window size of 4096, and 256 threads:

./scrimp_xeonphi 4096 randomSerie524288.txt 4096 256

The code for the GPU executions is located under the gpu folder and requires a CUDA-capable GPU to run. Once compiled using the provided Makefile, it can be executed as follows:

./SCRIMP 4096 randomSerie524288.txt out.txt

Other Resources

Slides used in our ICCD 2020 presentation: (pptx) (pdf)
Talk video (10 minutes)

Getting Help

If you have any suggestions for improvement, please contact ivanferveg [at] gmail.com
If you encounter bugs or have further questions or requests, you can raise an issue on the issue page.

Cite NATSA

Please cite the following paper if you find NATSA useful:

Ivan Fernandez, Ricardo Quislant, Christina Giannoula, Mohammed Alser, Juan Gómez-Luna, Eladio Gutiérrez, Oscar Plata, and Onur Mutlu. "NATSA: A Near-Data Processing Accelerator for Time Series Analysis" Proceedings of the 38th IEEE International Conference on Computer Design (ICCD), Virtual, October 2020.

The BibTeX entry for the citation is below.

@inproceedings{fernandez2020natsa,
  title={NATSA: A Near-Data Processing Accelerator for Time Series Analysis},
  author={Fernandez, Ivan and Quislant, Ricardo and Giannoula, Christina and Alser, Mohammed and G{\'o}mez-Luna, Juan and Guti{\'e}rrez, Eladio and Plata, Oscar and Mutlu, Onur},
  booktitle={2020 IEEE 38th International Conference on Computer Design (ICCD)},
  pages={120--129},
  year={2020},
  organization={IEEE}
}
