
Loop Fission Benchmarks

This repository contains the source code for benchmarking the loop fission algorithm presented in "Distributing and Parallelizing Non-canonical Loops".

This is a curated benchmark suite: the included benchmarks were selected specifically because they offer opportunities for the loop fission transformation. For each benchmark, we measure the difference in clock time after loop fission and parallelization with OpenMP.

We compare the original baseline programs to those generated by our loop fission technique, and also to an alternative loop fission technique implemented in the ROSE compiler's LoopProcessor tool.

Getting started

  • For a guided walk-through and reproduction of the paper's experiments, we recommend accessing the artifact published on Zenodo and following its readme instructions.
  • To run experiments on native hardware, or to extend this source code, clone this repository.

Files and organization

Benchmark directories

Directory   Description
original    baseline, unmodified programs
fission     transformed and parallelized programs, using our method
alt         transformed and parallelized programs, using the alternative method

A more detailed explanation of benchmark transformations


In this repository, the transformations have already been applied in the appropriate directories, fission and alt, and we measure the differences between those transformed versions and original. This section briefly explains how the transformations were generated in the first place.

Fission benchmarks were transformed manually, following our loop fission algorithm, and parallelized by hand; these transformations are not mechanically reproducible. We take the manual approach because automatic parallelization of while loops is not supported by any tool we are aware of. Programs containing only for loops could be parallelized automatically, as the alternative method below shows.
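To illustrate the kind of transformation involved (a generic sketch, not one of the benchmark kernels and not the paper's algorithm), loop fission splits a loop whose statements are independent into separate loops, each of which can then carry its own OpenMP pragma:

    #define N 1000

    /* Illustrative example, not taken from the benchmarks. */

    /* Before fission: one loop body updates two independent arrays. */
    void before(double a[N], double b[N], const double c[N]) {
        for (int i = 0; i < N; i++) {
            a[i] += c[i];
            b[i] *= c[i];
        }
    }

    /* After fission: each statement gets its own loop, and each loop
     * can be parallelized independently with OpenMP. */
    void after(double a[N], double b[N], const double c[N]) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] += c[i];
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            b[i] *= c[i];
    }

Such code is compiled with OpenMP enabled, e.g. gcc -fopenmp.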

Alt benchmarks are the result of automatic transformation using the ROSE compiler, which also parallelizes them automatically. The remap benchmark fails during transformation, so we measure no difference from the original version. Because the tool transforms all code, including the timing code of the benchmark template, the last step is to restore the original benchmark template. Detailed steps for re-generating the alt benchmarks are documented separately.

Benchmark descriptions

All programs are written in C. We include programs with both for and while loops.
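For instance (an illustrative sketch with a hypothetical SENTINEL value, not taken from the benchmarks), a for loop whose trip count follows from its bounds is canonical, whereas a while loop whose exit depends on the data is non-canonical, which is what defeats automatic parallelizers:

    #define SENTINEL -1.0  /* hypothetical terminator value */

    /* Canonical for loop: the trip count is known from the bounds. */
    double sum_for(const double *data, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }

    /* Non-canonical while loop: the exit condition depends on the data,
     * so the trip count is unknown at loop entry. */
    double sum_while(const double *data) {
        double sum = 0.0;
        int i = 0;
        while (data[i] != SENTINEL) {
            sum += data[i];
            i++;
        }
        return sum;
    }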

Benchmark   Description                                       Origin
3mm         Three matrix multiplications                      PolyBench/C
bicg        BiCG sub kernel of BiCGStab linear solver         PolyBench/C
colormap    TIFF image conversion of photometric palette      MiBench
conjgrad    Conjugate gradient routine                        NAS-CG
cp50        Ghostscript/CP50 color print routine              MiBench
deriche     Edge detection filter                             PolyBench/C
fdtd-2d     2-D finite-difference time-domain kernel          PolyBench/C
gemm        Matrix multiply C = alpha.A.B + beta.C            PolyBench/C
gesummv     Scalar, vector and matrix multiplication          PolyBench/C
mvt         Matrix-vector product and transpose               PolyBench/C
remap       4D matrix memory remapping                        NAS-UA
tblshift    TIFF PixarLog compression main table bit shift    MiBench

Other directories and files

  • headers/ header files for benchmark programs.

  • utilities/ supporting files, e.g. the timing script, obtained from the PolyBench/C benchmark suite, version 4.2.

  • plot.py is used for generating tables and plots from results.

  • run.sh is a wrapper for the timing script; it enables timing whole benchmark directories.

  • ref_eval reference results: the evaluation on which the paper's results are based.

Reproducing results

System requirements: a Linux/OSX host, a C compiler with OpenMP support, Make, and Python 3.8+ for plotting. The system should have multiple cores to exercise parallelism; 4 cores are sufficient for these experiments.

Benchmarking proceeds in two phases: first capture the timing results, then generate plots and tables from those results. Both phases are detailed below.

Running the benchmarks

✳️ The system should include a C compiler that supports OpenMP pragmas (GCC by default).

To specify an alternative compiler, append CC=[compiler_name] to the make command.
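For example, to use Clang for the small evaluation described below:

make small CC=clang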

Small evaluation — time a subset of the benchmarks — ⏲️ ~ 10 min.

make small

Run all benchmarks — time the execution of all benchmarks — ⏲️ ~ 2-4 h, depending on OS.

make all

Run a specific benchmark — ⏲️ 1-60 min, varies by benchmark and OS.

make [benchmark_name]
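For example, to time only the bicg benchmark:

make bicg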

Custom execution — call the run.sh script directly (see the argument table and example below):

./run.sh [arguments]

Available arguments for timing

ARGUMENT   DESCRIPTION: options                                          DEFAULT
-c         system compiler to use                                        gcc
-d         directory to time: original, fission, alt                     original
-o         optimization level: O0, O1, O2, O3, ...                       O0
-v         max. variance (%) allowed when timing results: > 0.0          5.0
-s         data size: MINI, SMALL, MEDIUM, LARGE, EXTRALARGE, STANDARD   STANDARD
-p         only a specific benchmark: 3mm, bicg, deriche, ...            not set

If necessary, change permissions: chmod u+r+x ./run.sh.
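For example, assuming the flags combine as listed above, to time only the fission version of bicg at optimization level O3 with the LARGE data size:

./run.sh -d fission -o O3 -s LARGE -p bicg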

Locating and interpreting results

The results can be found in the eval/results directory, categorized by source directory name, optimization level, and data size. Two files are generated for each category, for each run:

  1. [args]_model.txt - machine and processor snapshot (metadata)

  2. [args].txt - the actual timing results.

Data labels, in order:

  • program: name of evaluated program
  • variance (%): variance of recorded execution times
  • time (s): average runtime (clock time), in seconds
  • UTC timestamp: time at completion, in seconds since epoch

The data labels are the same for every run; they are not included in the timing results file.

The timing procedure is as follows:

  • perform 5 executions per program
  • take the average of the 3 median runs (the min and max times are excluded)
  • the variance setting controls the %-difference allowed between the 3 median runs
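The selection logic can be sketched as follows; this is an illustrative reimplementation in C, not the actual timing script, and it assumes the variance is measured as the spread of the three median times relative to their average:

    #include <stdio.h>
    #include <stdlib.h>

    /* Ascending comparison of doubles, for qsort. */
    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Illustrative selection logic (the actual timing script may differ):
     * given 5 measured times, drop the min and max, average the middle 3,
     * and warn if the spread of those 3 exceeds max_variance percent. */
    static double average_runtime(double times[5], double max_variance) {
        qsort(times, 5, sizeof times[0], cmp_double);
        double avg = (times[1] + times[2] + times[3]) / 3.0;
        double spread_pct = (times[3] - times[1]) / avg * 100.0;
        if (spread_pct > max_variance)
            fprintf(stderr, "variance %.2f%% exceeds %.2f%%: re-run\n",
                    spread_pct, max_variance);
        return avg;
    }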

Generating plots and tables

After capturing results, use the plotting script to generate tables or graphs. This step requires Python version 3.8 or higher.

  1. Install dependencies

    python3 -m pip install -q -r requirements.txt
    
  2. Generate tables or plots

    make plots
    

To customize plot options, call plot.py directly with the selected arguments.

Available arguments for plotting

python3 plot.py --help
ARGUMENT       DESCRIPTION: options                                            DEFAULT
--data         data choice: time, speedup                                      time
--input        path to results (input) directory                               eval/results
--out          path to output directory                                        eval/plots
--fmt          output format: tex, md, plot                                    md
--digits       number of digits for tabular values: 0...15                     6
--ss           source directory for calculating speedup                        original
--st           target directory for calculating speedup (all when not set)     not set
--millis       display table of times in milliseconds (otherwise in seconds)   not set
--show         show the generated plot or table                                not set
--dir_filter   include directories (comma-separated): original, fission, alt   not set
--prog_filter  include benchmarks (comma-separated): 3mm, bicg, deriche, ...   not set
--help         show help message and exit                                      not set

For example, to generate and display a text-based speedup table, run:

python3 plot.py -d speedup -f md --digits 2  --show
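Similarly, assuming the flags combine as documented above, a LaTeX speedup table comparing the fission versions against the originals could be produced with:

python3 plot.py --data speedup --ss original --st fission --fmt tex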