ARM SVE Benchmarks

This repository gathers some small kernels to benchmark the performance of hand-written SVE code compared to compiler-generated one. Implemented kernels are:

Initialization (store);
Copy (load, store);
Reduction (load, add);
Dot product (load, load, mul, add);
DAXPY (load, load, load, mul, add, store);
Vector sum (load, load, load, add, store);
Vector scale (load, load, load, mul, store).

Usage

Note: the provided Makefile uses the armclang compiler, however both clang and gcc have been tested and can be used as well. Keep in mind that the architecture specific flags (AFLAGS) might need to be changed depending on the chosen compiler. See the comparison between compiler flags across architectures for more information.

To build the benchmarks:

make build

You can then execute one of the benchmarks presented above and specify the vectors' size (in bytes), number of iterations and error tolerance through the provided option flags.

Example (reduction benchmark with 64KiB vectors, 100k iterations and an error tolerance of $10^{-14}$:

target/arm_bench -k reduc -s 8192 -i 100000 -e 1e-14

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
include		include
src		src
.clang-format		.clang-format
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

include

include

src

src

.clang-format

.clang-format

.gitignore

.gitignore

LICENSE

LICENSE

Makefile

Makefile

README.md

README.md

Repository files navigation

ARM SVE Benchmarks

Usage

About

Languages

License

dssgabriel/arm-sve-benchmarks

Folders and files

Latest commit

History

Repository files navigation

ARM SVE Benchmarks

Usage

About

Topics

Resources

License

Stars

Watchers

Forks

Languages