Parallel matrix multiplication

Given two $n\times n$ square matrices $A, B \in\mathcal{M}_{n}(\mathbb{R})$ with real entries, it is known that the matrix product $C=AB\in\mathcal{M}_n(\mathbb{R})$ is defined as

$$ [AB]_{i, j}=\sum_{r=1}^na_{i,r}b_{r, j} $$

and a naive implementation of this operation in C (where multidimensional arrays are stored in row-major order) is the following:

```c
/* C must be zero-initialised before accumulating into it */
for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    for (k = 0; k < n; k++)
      C[i][j] += A[i][k] * B[k][j];
```

which is an example of a serial code.

Here, my goal is to implement a parallel code in C to perform such an operation given any number $m$ of computational units.

Important note: the code has been written to compile and run on the Marconi100 cluster at CINECA. To compile and run it, the Spectrum_MPI, cuda and openblas modules must be loaded.

Compilation

To compile, use the command

make [version]

where [version] can be blank, dgemm or cuda. This produces the [version]multiplication.x executable (its name depends on which version has been compiled).

Execution

The executable generates two $n\times n$ matrices with random entries and multiplies them. To run it, for example, with 3 processes on $16\times 16$ matrices, one can either use mpirun -np 3 ./[version]multiplication.x 16 or use

make [version]run prc=3 dim=16

where [version] can be blank, dgemm or cuda. This builds the [version]multiplication.x executable and runs it.

Test

To test, pass the debug=yes flag to the Makefile:

make [version] debug=yes

and then run the [version]debug_multiplication.x executable using mpirun. The following command is also supported:

make [version]run debug=yes

where [version] can be blank, dgemm or cuda. This compiles the executable (if necessary) and runs it immediately afterwards.

To do list

These are the things done or to be done:

- Implement a working code using only MPI
  - Implement a working code when $n$ is a multiple of $m$
  - Add some testing
  - Implement a working code for generic $n$
  - Measure performance
- Include a version using cblas_dgemm instead of the serial multiplication done by each MPI process
  - Make it work
  - Make some plots to compare performance with the serial version
- Port to GPU
  - Include a version using cublasDgemm instead of the serial multiplication done by each MPI process
  - Make some plots to compare performance with the serial and cblas_dgemm versions

WalterNadalin/ParallelMatrixMultiplication

Folders and files

Latest commit

History

Repository files navigation

Parallel matrix multiplication

Compilation

Execution

Test

To do list

About

Topics

Resources

Stars

Watchers

Forks

Languages