ParTI!

(A Parallel Tensor Infrastructure!)

The Parallel Tensor Infrastructure (ParTI!) supports fast, essential sparse tensor operations and tensor decompositions on multicore CPU and GPU architectures. These basic tensor operations are critical to the overall performance of tensor analysis algorithms (such as tensor decomposition). ParTI! was formerly known as SpTOL.

News (June 2019): For the reordering code published in Efficient and Effective Sparse Tensor Reordering, please check v1.1.0.

Notice: For the code used in Optimizing sparse tensor times matrix on GPUs, please go to branch JPDC.

Supported sparse tensor operations:

  • Scalar-tensor mul/div [CPU]
  • Kronecker product [CPU]
  • Khatri-Rao product [CPU]
  • Sparse tensor matricization [CPU]
  • Element-wise tensor add/sub/mul/div [CPU, Multicore, GPU]
  • Sparse tensor-times-dense matrix (SpTTM) [CPU, Multicore, GPU]
  • Sparse matricized tensor times Khatri-Rao product (SpMTTKRP) [CPU, Multicore, GPU]
  • Sparse CANDECOMP/PARAFAC decomposition
  • Sparse Tucker decomposition (refer to branch JPDC)

Supported sparse tensor formats:

  • Coordinate (COO) format
  • Hierarchical coordinate (HiCOO) format [Paper]

Build requirements:

  • C Compiler (GCC or ICC or Clang)

  • CMake (>v3.2)

  • CUDA SDK [Required for GPU algorithms]

  • OpenBLAS (or an alternative BLAS and LAPACK library) [Required for tensor decomposition]

  • MAGMA [Optional]

Build:

  1. Create a configuration file with `cp build.config build-sample.config` and change the settings appropriately (see the sketch after these steps)

  2. Type ./build.sh

  3. Check build for the resulting library

  4. Check build/tests for test programs covering basic functionality

  5. Check build/examples for example programs, including MTTKRP, TTM, and CP decomposition
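
An illustrative end-to-end build (a minimal sketch; the config settings, such as CUDA, OpenMP, and BLAS paths, depend on your machine):

  cp build.config build-sample.config    # copy the build settings, then edit them
  ./build.sh                             # configure with CMake and compile
  ls build                               # the resulting library
  ls build/tests build/examples          # test and example programs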

Build MATLAB interface (Not ready for new functions):

  1. cd matlab

  2. export LD_LIBRARY_PATH=../build:$LD_LIBRARY_PATH

  3. Type make to build all functions into a MEX library.

  4. matlab

    1. In matlab environment, type addpath(pwd)

    2. Play with the ParTI! MATLAB interface.

Build docs:

  1. Install Doxygen

  2. Go to docs

  3. Type make

Run examples:

Please refer to GettingStarted.md for more general cases. Only some major functions are shown below.

MTTKRP:

  1. COO-MTTKRP (CPU, Multicore)

    • Usage: ./build/examples/mttkrp [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (default -1: loop all modes, or specify a mode, e.g., 0 or 1 or 2 for third-order tensors.)
      • -s sortcase, --sortcase=SORTCASE (0: default; 1, 2, 3, 4: different tensor sortings)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (Only for sortcase=3)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (Only for sortcase=3)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (the number of matrix columns, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • -u use_reduce, --ur=use_reduce (use privatization or not)
      • --help
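    • An example (example.tns is a placeholder input; flag values are illustrative): ./build/examples/mttkrp -i example.tns -m 0 -r 16 -d -1 -t 8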
  2. COO-MTTKRP (GPU)

    • Usage: ./build/examples/mttkrp_gpu [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (specify a mode, e.g., 0 or 1 or 2 for third-order tensors. Default:0.)
      • -s sortcase, --sortcase=SORTCASE (0: default; 1, 2, 3, 4: different tensor sortings)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (Only for sortcase=3)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (Only for sortcase=3)
      • -d CUDA_DEV_ID, --cuda-dev-id=CUDA_DEV_ID (>=0: GPU device id)
      • -r RANK (the number of matrix columns, 16:default)
      • GPU options:
      • -p IMPL_NUM, --impl-num=IMPL_NUM (11, 12, 15, where 15 should be the best case)
      • --help
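    • An example (example.tns is a placeholder input): ./build/examples/mttkrp_gpu -i example.tns -m 0 -r 16 -d 0 -p 15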
  3. HiCOO-MTTKRP (CPU, Multicore)

    • Usage: ./build/examples/mttkrp_hicoo [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (default -1: loop all modes, or specify a mode, e.g., 0 or 1 or 2 for third-order tensors.)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (required)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (required)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (the number of matrix columns, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • --help
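    • An example (example.tns is a placeholder input; the block and kernel sizes, given in bits, are illustrative): ./build/examples/mttkrp_hicoo -i example.tns -m 0 -r 16 -b 7 -k 20 -d -1 -t 8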

CPD:

  1. COO-CPD (CPU, Multicore)

    • Usage: ./build/examples/cpd [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (CPD rank, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • -u use_reduce, --ur=use_reduce (use privatization or not)
      • --help
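    • An example (example.tns is a placeholder input): ./build/examples/cpd -i example.tns -r 16 -d -1 -t 8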
  2. COO-CPD (GPU)

    • Usage: ./build/examples/cpd_gpu [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -d CUDA_DEV_ID, --cuda-dev-id=CUDA_DEV_ID (>=0:GPU device id)
      • -r RANK (CPD rank, 16:default)
      • GPU options:
      • -p IMPL_NUM, --impl-num=IMPL_NUM (11, 12, 15, where 15 should be the best case)
      • --help
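    • An example (example.tns is a placeholder input): ./build/examples/cpd_gpu -i example.tns -r 16 -d 0 -p 15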
  3. HiCOO-CPD (CPU, Multicore)

    • Usage: ./build/examples/cpd_hicoo [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (required)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (required)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (CPD rank, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • --help
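    • An example (example.tns is a placeholder input; the block and kernel sizes, given in bits, are illustrative): ./build/examples/cpd_hicoo -i example.tns -r 16 -b 7 -k 20 -d -1 -t 8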

TTM:

  1. COO-TTM (CPU, Multicore, GPU)

    • Usage: ./build/examples/ttm tsr mode impl_num smem_size [cuda_dev_id, R, output]
    • tsr: input sparse tensor
    • mode: specify tensor mode, e.g. (0, or 1, or 2) for third-order tensors
    • impl_num: 11, 12, 13, 14, 15, where either 14 or 15 should be the best case
    • smem_size: shared memory size in bytes (0, or 16000, or 32000, or 48000)
    • cuda_dev_id: -2, -1, or 0, 1, ... -2: sequential code; -1: OpenMP code; 0 or another non-negative integer: GPU device id. [Optional, -2 by default]
    • R: rank number (matrix column size), an integer. [Optional, 16 by default]
    • output: the file name for output. [Optional]
    • An example: ./build/examples/ttm example.tns 0 15 16000 0 16 result.txt
  2. SCOO-TTM (CPU, GPU)

    • Usage: ./build/examples/sttm tsr U Y mode [cuda_dev_id]
    • tsr: input semi-sparse tensor
    • U: input dense matrix
    • Y: output semi-sparse tensor
    • mode: specify tensor mode, e.g. (0, or 1, or 2) for third-order tensors
    • cuda_dev_id: -1, or 0, 1, ... -1: sequential code; 0 or another non-negative integer: GPU device id. [Optional, -1 by default]
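    • An example (all file names are placeholders): ./build/examples/sttm X.tns U.mtx Y.tns 0 0, which multiplies the semi-sparse tensor X.tns by the dense matrix U.mtx along mode 0 on GPU 0 and writes the result to Y.tns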

Tucker Decomposition

  1. COO-Tucker (CPU, GPU)

    The code is in the jpdc branch, which is a completely different and incompatible codebase written in C++.

    • Usage: ./build/examples/tucker --dev device [options] tsr R1 R2 ... dimorder1 dimorder2 ...
    • device: CPU core ID or GPU ID, obtained with ./build/examples/detect_devices (multicore CPU is currently not implemented)
    • tsr: input sparse tensor
    • R1, R2, ...: the shape of the expected output core tensor
    • dimorder1, dimorder2, ...: the order of the TTM chain operation
    • Options:
    • -o, --output: output the core tensor into a text file
    • -d, --dense-format: print the result to screen in dense format, instead of sparse format
    • -l, --limit: limit how much of the result is printed to the screen
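    • An example (example.tns is a placeholder third-order input; the core shape and dimension order are illustrative): ./build/examples/tucker --dev 0 example.tns 16 16 16 0 1 2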


The algorithms and details are described in the following publications.

Publications

  • Efficient and effective sparse tensor reordering. Jiajia Li, Bora Uçar, Ümit Çatalyürek, Kevin Barker, Richard Vuduc. In Proceedings of the ACM International Conference on Supercomputing (ICS), June 2019. doi:10.1145/3330345.3330366.

  • HiCOO: Hierarchical Storage of Sparse Tensors. Jiajia Li, Jimeng Sun, Richard Vuduc. ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC). 2018. [pdf] [bib] Winner, Best Student Paper

  • Scalable Tensor Decompositions in High Performance Computing Environments. Jiajia Li. PhD Dissertation. Georgia Institute of Technology, Atlanta, GA, USA. July 2018. [pdf] [bib] SMARTech @ GT

  • Optimizing Sparse Tensor Times Matrix on GPUs. Yuchen Ma, Jiajia Li, Xiaolong Wu, Chenggang Yan, Jimeng Sun, Richard Vuduc. Journal of Parallel and Distributed Computing (Special Issue on Systems for Learning, Inferencing, and Discovering). [pdf] [bib]

  • Optimizing Sparse Tensor Times Matrix on multi-core and many-core architectures. Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc. The sixth Workshop on Irregular Applications: Architectures and Algorithms (IA^3), co-located with SC’16. 2016. [pdf]

  • ParTI!: a Parallel Tensor Infrastructure for Data Analysis. Jiajia Li, Yuchen Ma, Chenggang Yan, Jimeng Sun, Richard Vuduc. Tensor-Learn Workshop @ NIPS'16. [pdf]

Citation

@misc{parti,
  author = "Jiajia Li and Yuchen Ma and Richard Vuduc",
  title  = "{ParTI!}: A Parallel Tensor Infrastructure for multicore CPUs and GPUs",
  month  = "Oct",
  year   = "2018",
  url    = "http://parti-project.org",
  note   = "Last updated: Jan 2020"
}

Primary Developers

License

ParTI! is free software: you can redistribute it and/or modify it under the terms of the BSD 3-Clause License.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1533768. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. For NSF required reporting elements, please see this page.