Sparse Boolean Linear Algebra Libraries Benchmark

Performance evaluation of sparse linear algebra libraries for CPU and GPU targets. This evaluation measures the execution time of the most common and popular sparse matrix operations, x (multiplication) and + (element-wise addition), over the Boolean semiring.

The purpose of the evaluation is to show the performance gain of Boolean-optimized operations over their general-purpose counterparts. The comparison is not entirely fair, but dedicated Boolean-optimized libraries for the GPU have not been introduced yet.

Note: some of the tested libraries do not provide extra optimizations for Boolean-valued operations. In these cases, the operations are imitated with the provided general-purpose primitives, where non-zero (floating-point type) values are interpreted as true.
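
The following sketch illustrates what "imitated" means here, using SciPy on the CPU purely as an example; the benchmarked GPU libraries expose analogous float-valued primitives:

import numpy as np
from scipy.sparse import csr_matrix

# Boolean matrices stored as float CSR, the way general-purpose
# libraries expect them: any non-zero value encodes "true".
A = csr_matrix(np.array([[1.0, 0.0],
                         [1.0, 1.0]]))
B = csr_matrix(np.array([[0.0, 1.0],
                         [1.0, 0.0]]))

C = A @ B  # numeric sparse matrix product (x)
D = A + B  # numeric element-wise addition (+)

# The numeric results may hold arbitrary non-zero values, so the
# Boolean result is recovered by re-interpreting every stored
# entry as "true" (here normalized back to 1.0).
C.data[:] = 1.0
D.data[:] = 1.0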

Tested libraries and frameworks

Library name   Target platform   Technology    Operation x   Operation +
cuBool         GPU               Nvidia Cuda   yes           yes
clBool         GPU               OpenCL        yes           yes
CUSP           GPU               Nvidia Cuda   yes           yes
cuSPARSE       GPU               Nvidia Cuda   yes           yes
clSPARSE       GPU               OpenCL        yes           no
SuiteSparse    CPU               CPU           yes           yes

Getting started

Requirements

  • GCC Compiler
  • CMake 3.15 (or greater)
  • Cuda Toolkit
  • OpenCL Dev-Package
  • SuiteSparse

Get source code

Get the source code and go into the project directory.

$ git clone https://github.com/EgorOrachyov/SpBench
$ cd SpBench
$ git submodule update --init --recursive

Download data

The dataset for the benchmarks is available on Google Drive by the access link. The dataset is a collection of several square sparse matrices in the .mtx format. These matrices represent various network graphs, for which benchmarking the x and + operations makes sense.
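
For reference, the coordinate .mtx (MatrixMarket) format looks as follows; this is a hand-made 4 x 4 example with 5 non-zeros, not a matrix from the dataset (the pattern variant is shown; real-valued variants carry an extra value column after the 1-based row and column indices):

%%MatrixMarket matrix coordinate pattern general
4 4 5
1 2
2 1
2 3
3 4
4 4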

The dataset is stored as two .zip archives, dataset0.zip and dataset1.zip. Download these archives and place them in the dataset folder in the repository root. This folder must be created manually.

After that, the paths to the archives must be SpBench/dataset/dataset0.zip and SpBench/dataset/dataset1.zip respectively. These archives will be automatically extracted into the build directory during the build process.

Build

Create a build directory and go into it. The example below demonstrates how to build the benchmarks in release mode.

$ mkdir build-release
$ cd build-release

Configure the build and run the actual compilation process.

$ cmake .. -DCMAKE_BUILD_TYPE=Release
$ cmake --build . --target all -j `nproc`

After compilation, the benchmark executables will be stored in the build directory. The naming follows the pattern {tool-name}_{operation name}.

Generate Data

For benchmarking the operation R = A + A^2, we need to generate the A^2 matrix for each matrix entry in the dataset. The A^2 matrices are generated by a cuBool-based application. In order to run the generation, execute the following command inside the build directory:

$ ./cubool_prepare_data data/config_gen_m2.txt

This will generate the required A^2 matrices and store them in the same data folders as the original A matrices, with the extension suffix .mtx2 instead of .mtx. These matrices are automatically loaded inside the benchmarks by the MatrixLoader2 class, which appends 2 to the file name of the loaded target matrix A and loads the corresponding A^2 matrix.
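
The naming convention itself is simple; the sketch below illustrates it in Python (MatrixLoader2 is part of this repository's C++ code, so the function name here is purely illustrative):

def a_squared_path(path):
    # MatrixLoader2 derives the A^2 file name by appending '2'
    # to the .mtx extension of the target matrix A.
    assert path.endswith(".mtx")
    return path + "2"  # e.g. data/wing.mtx -> data/wing.mtx2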

Benchmark execution

In order to run the benchmarks for all tested targets, execute the following commands inside the build directory:

$ bash run_all.sh
$ bash summarize.sh

This will output a Summary.txt file with the benchmark stats.

Memory profiling

In order to get the peak GPU memory usage, we first need to record the GPU memory usage with a 1 ms time step. To do this, run the following command in a separate terminal:

$ nvidia-smi --query-gpu=timestamp,pstate,memory.total,memory.used --format=csv -lms 1 -f Profiling.csv

This will run the nvidia-smi built-in memory profiler, which samples the memory usage with a 1 ms time step and saves this information into the Profiling.csv table.
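
The rows of Profiling.csv follow the standard nvidia-smi CSV layout and look roughly like this (values are illustrative):

timestamp, pstate, memory.total [MiB], memory.used [MiB]
2021/03/01 12:00:00.123, P2, 8119 MiB, 1024 MiB
2021/03/01 12:00:00.124, P2, 8119 MiB, 1096 MiB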

In the main terminal, execute the following command to run the actual library benchmarks (since we want to obtain their peak memory usage):

$ bash run_profiling.sh

When the last library finishes its benchmark, go to the separate terminal and explicitly terminate the memory capturing process with the Ctrl + C combination.

After that, two files will be located in your build directory: Profiling.csv with the memory usage and Profiling-Time.txt with the time stamps (begin and end of each experiment). In order to extract the actual peak memory usage, run the Python script scripts/extract_mem_profile.py: copy it into the build directory and run it.
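
Conceptually, the script scans the samples that fall into each experiment's begin/end window and takes the maximum; a minimal sketch of that idea follows (the exact layout of Profiling-Time.txt is defined by the benchmarks, so treat the parsing details below as assumptions, not as the actual script):

import csv
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S.%f"  # nvidia-smi timestamp format

def load_samples(path="Profiling.csv"):
    # Return (timestamp, used MiB) pairs from the profiler table.
    samples = []
    with open(path) as f:
        for row in csv.DictReader(f, skipinitialspace=True):
            t = datetime.strptime(row["timestamp"], FMT)
            used = int(row["memory.used [MiB]"].split()[0])
            samples.append((t, used))
    return samples

def peak_usage(samples, begin, end):
    # Peak memory.used inside one experiment's time window,
    # where begin/end come from Profiling-Time.txt.
    return max(used for t, used in samples if begin <= t <= end)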

When the script finishes its job, it will output a Profiling-Stats.txt file with the peak memory usage for each library benchmark and for each matrix entry in the dataset.

Dataset

The matrix data is selected from the SuiteSparse Matrix Collection (formerly the University of Florida Sparse Matrix Collection).

M  Matrix name               # Rows     Nnz M      Nnz/row  Max Nnz/row  Nnz M^2     Nnz M + M^2
0  DIMACS10/wing             62,032     243,088    3.9      4            714,200     917,178
1  DIMACS10/luxembourg_osm   114,599    239,332    2.0      6            393,261     632,185
2  SNAP/amazon0312           400,727    3,200,440  7.9      10           14,390,544  14,968,909
3  LAW/amazon-2008           735,323    5,158,388  7.0      10           25,366,745  26,402,678
4  SNAP/web-Google           916,428    5,105,039  5.5      456          29,710,164  30,811,855
5  SNAP/roadNet-PA           1,090,920  3,083,796  2.8      9            7,238,920   9,931,528
6  SNAP/roadNet-TX           1,393,383  3,843,320  2.7      12           8,903,897   12,264,987
7  DIMACS10/belgium_osm      1,441,295  3,099,940  2.1      10           5,323,073   8,408,599
8  SNAP/roadNet-CA           1,971,281  5,533,214  2.8      12           12,908,450  17,743,342
9  DIMACS10/netherlands_osm  2,216,688  4,882,476  2.2      7            8,755,758   13,626,132
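
The statistics in this table can be reproduced directly from the .mtx files; for instance, the following SciPy sketch computes the same columns for one matrix (slow for the larger graphs, but straightforward):

import scipy.io

def matrix_stats(path):
    M = scipy.io.mmread(path).tocsr()
    M.data[:] = 1.0                       # Boolean: non-zero means true
    nnz_per_row = M.indptr[1:] - M.indptr[:-1]
    M2 = M @ M                            # A^2 over the Boolean structure
    return {
        "# Rows": M.shape[0],
        "Nnz M": M.nnz,
        "Nnz/row": round(M.nnz / M.shape[0], 1),
        "Max Nnz/row": int(nnz_per_row.max()),
        "Nnz M^2": M2.nnz,
        "Nnz M + M^2": (M + M2).nnz,
    }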

Results

For the performance evaluation, a PC with Ubuntu 20.04 was used: an Intel Core i7-6700 3.40 GHz CPU, 64 GB DDR4 RAM, and a GeForce GTX 1070 GPU with 8 GB VRAM. Only the execution time of the operations themselves was measured. The data were assumed to be already loaded into VRAM or RAM, respectively, in the format required by the tested framework.
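
In sketch form, each benchmark therefore times only the operation call itself, along the lines of the loop below (illustrative only; the actual benchmarks implement this in C++ against their respective libraries, the warm-up and iteration counts are assumptions, and GPU runs synchronize the device before stopping the timer):

import time

def measure(op, warmups=2, iters=10):
    # Inputs are already prepared in the format the library expects;
    # only the x or + call itself is timed.
    for _ in range(warmups):   # warm-up: caches, allocators, JIT
        op()
    total = 0.0
    for _ in range(iters):
        t0 = time.perf_counter()
        op()
        total += time.perf_counter() - t0
    return total / iters * 1000.0  # mean time in milliseconds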

Matrix-matrix multiplication evaluation results.
Time in milliseconds, Memory in megabytes.

      cuBool       clBool       CUSP          cuSPARSE       clSPARSE      SuiteSparse
M     Time   Mem   Time   Mem   Time    Mem   Time     Mem   Time    Mem   Time    Mem
0      1.9    93    1.9    89     5.2   125     20.1   155     4.9   105     7.9    22
1      2.4    91    2.0    89     3.7   111      1.7   151     6.9    97     3.1   169
2     23.2   165   55.5   163   108.5   897    412.8   301    52.2   459   257.6   283
3     33.3   225   82.1   221   172.0  1409    184.8   407    77.4   701   369.5   319
4     41.8   241  127.6   239   246.2  1717   4761.3   439   207.5  1085   673.3   318
5     18.1   157   14.2   153    42.1   481     37.5   247    56.6   283    66.6   294
6     22.6   167   16.9   165    53.1   581     46.7   271    70.4   329    80.7   328
7     23.2   151   16.9   159    32.9   397     26.7   235    68.2   259    56.9   302
8     32.0   199   23.4   211    74.4   771     65.8   325    98.2   433   114.5   344
9     35.3   191   24.9   189    51.0   585     51.4   291   102.8   361    90.9   311

Element-wise matrix-matrix addition evaluation results.
Time in milliseconds, Memory in megabytes.

      cuBool       clBool       CUSP         cuSPARSE     clSPARSE     SuiteSparse
M     Time   Mem   Time   Mem   Time   Mem   Time   Mem   Time   Mem   Time   Mem
0      1.1    95    1.9   105    1.4   105    2.4   163     -     -     2.3   176
1      1.7    95    1.6   109    1.0    97    0.8   151     -     -     1.6   174
2     11.4   221   23.8   543   16.2   455   24.3   405     -     -    37.2   297
3     17.5   323   35.4   877   29.5   723   27.2   595     -     -    64.8   319
4     24.8   355   43.1   989   31.9   815   89.0   659     -     -    77.5   318
5     16.9   189   12.5   359   11.2   329   11.6   317     -     -    36.6   287
6     19.6   209   15.4   429   14.5   385   16.9   357     -     -    45.3   319
7     19.5   179   10.5   321   10.2   303   10.5   297     -     -    28.5   302
8     30.5   259   22.4   579   19.4   513   20.2   447     -     -    65.2   331
9     30.1   233   18.6   457   14.8   423   18.3   385     -     -    50.2   311

Known issues

  • The clSPARSE-based matrix-matrix multiplication benchmark crashes in release mode with the error: terminate called after throwing an instance of 'std::bad_cast'; what(): std::bad_cast

License

This project is licensed under the MIT license. The full license text can be found in the license file. For detailed license information about the dependencies, refer to the official web pages of the third-party projects.