GPUHZGSVD

The Hari–Zimmermann generalized SVD for CUDA.

A part of the supplementary material for the paper doi:10.1177/1094342020972772 (arXiv:1909.00101 [math.NA]).

Building

Prerequisites

A reasonably recent (e.g., 10.1.243) full CUDA installation on a 64-bit Linux (e.g., CentOS 7.7, optionally with devtoolset-8) is required.

For the Level 3 (multi-GPU) version an MPI installation on Linux built with the CUDA support (e.g., Open MPI) is required.

Then, clone and build JACSD repository, with the same parent directory as this one. In fact, only the jstrat library (i.e., libjstrat.a) is needed to be built there.

Make options

To build the test executable in double precision, do the following:

cd src
./mk.sh D SM OPT CVG

or, for double complex,

cd src
./mk.sh Z SM OPT CVG

where SM is the target GPU architecture (e.g., for a Maxwell card it might be 52, for a Volta one 70, etc.), OPT is the optimization level (3 should be fine), and CVG is the algorithm requested (0, 1, 2, 3, 4, 5, 6, or 7).

It is also possible to append clean to the invocation above, to remove the executable, or such cleanup can be done manually.

For the Level 3 (multi-GPU) version, the prefix (e.g., /usr/local) of your MPI distribution has to be provided:

cd src
./mk.sh Z SM OPT CVG MPI=prefix

Please, adjust the compiling and linking flags in the makefile(s) for your particular MPI distribution, since the flags provided therein have been tailored for Open MPI!

Execution

Command line

To run the executable, say, e.g.

/path/to/HZ0.exe DEV SNP0 SNP1 ALG MF MG N FN

where DEV is the CUDA device number, SNP0 is the inner and SNP1 outer strategy ID (2 for cycwor or 4 for mmstep), ALG is 0 for full block or 8 for block-oriented, MF and MG are the number of rows of the first and the second matrix, respectively, N is the number of columns, and FN is the file name prefix (without an extension) containing the input data.

The Level 3 (multi-GPU) executables require a similar invocation:

/path/to/MHZ0.exe SNP0 SNP1 SNP2 ALG MF MG N FN

where SNP2 is the outermost strategy ID (3 for cycwor or 5 for mmstep; notice the increments), while the executable itself has to be run with at least two processes using mpiexec or a similar MPI job launcher.

Data format

Data should be contained in FN.Y and FN.W binary, Fortran-array-order files, where the first one stores the matrix F and the second one the matrix G, and both matrices are either double or double complex and are expected to have MF (first matrix) or MG (second matrix) rows and N columns.

The output comprises FN.YU, FN.WV, FN.Z, for the double or double complex matrices U (MF x N), V (MG x N), and Z (N x N); and FN.SY, FN.SW, FN.SS, for the double vectors \Sigma_F, \Sigma_G, and \Sigma, respectively, where all vectors are of length N.

See also FLAPWxHZ repository for more explanation.

This work has been supported in part by Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).

Name		Name	Last commit message	Last commit date
Latest commit History 193 Commits
ZHZ-0		ZHZ-0
src		src
var		var
.gitattributes		.gitattributes
.gitignore		.gitignore
GPUHZGSVD.pdf		GPUHZGSVD.pdf
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZHZ-0

ZHZ-0

src

src

var

var

.gitattributes

.gitattributes

.gitignore

.gitignore

GPUHZGSVD.pdf

GPUHZGSVD.pdf

LICENSE

LICENSE

README.md

README.md

Repository files navigation

GPUHZGSVD

Building

Prerequisites

Make options

Execution

Command line

Data format

About

Releases

Languages

License

venovako/GPUHZGSVD

Folders and files

Latest commit

History

Repository files navigation

GPUHZGSVD

Building

Prerequisites

Make options

Execution

Command line

Data format

About

Topics

Resources

License

Stars

Watchers

Forks

Languages