jacobi - a benchmark by solving 2D laplace equation with jacobi iterative method.
GPU or Xeon Phi can be used.

how to run:

# for intel (intel compiler and intelmpi are required)  
$ make mic  

# for GPU (PGI compiler and its MPI are required)  
$ make gpu  

$ cat fort.18  
10000       ! iter_max  
4096 4096   ! m, n  
1 2         ! np_m, np_n  
$ vi fort.18 <-- adjust the paramters  

$ mpirun -np $NP ./jacobi_mpi.intel   # for Xeon phi  
or  
$ mpirun -np $NP ./jacobi_acc_mpi.pgi # for GPU
# where $NP = np_m * np_n

# if you do not have GPU nor Xeon Phi,
$ mpirun -np $NP ./jacobi_mpi.intel.nooffload
or
$ mpirun -np $NP ./jacobi_mpi.pgi

# to view the result
$ gnuplot # splot "jacobi.dat" w l

performance comparison:

iter_max=10000
m=4096, n=4096

with Xeon E5-2680(8 cores/socket, 2 sockets) + Xeon Phi 5110P x2, 2 nodes (np_m=1, np_n=4):

only CPU[s]: 66.7373
CPU + MIC[s]: 29.1181

with Xeon E5-2680 v4(14 cores/socket, 2 sockets) + NVIDIA Tesla P100 x4, 2 nodes (np_m=1, np_n=8):

only CPU[s]: 94.7290
CPU + GPU[s]: 5.29180

the following is an example of calculation result.

m=1024 n=1024
with GPU:

$ mpirun -x LD_LIBRARY_PATH -np 8 -npernode 4 -bind-to socket ./jacobi_acc_mpi.pgi
 iter_max:        10000
 m:         4096 n:         4096
 np:            8
 np_m:            1 np_n:            8
    0  0.499878
 1000  0.000357
 2000  0.000178
 3000  0.000118
 4000  0.000088
 5000  0.000071
 6000  0.000059
 7000  0.000050
 8000  0.000044
 9000  0.000039
10000  0.000035
Total CPU time:    5.29180
Device init   :    1.91826
Setup problem :   0.841110E-01
Data copyin   :   0.291271E-01
Computation   :    3.13129
Data copyout  :   0.277512E-01
Output        :   0.101262

with Xeon Phi:

$ srun --mpi=pmi2 -N2 -n4 -c8 --cpu_bind=cores -t0:05:00 ./jacobi_mpi.intel
 iter_max:       10000
 m:        1024 n:        1024
 np:           4
 np_m:           1 np_n:           4
 
Number of threads = 236
    0  0.499512
 1000  0.000347
 2000  0.000171
 3000  0.000113
 4000  0.000084
 5000  0.000066
 6000  0.000055
 7000  0.000047
 8000  0.000041
 9000  0.000036
10000  0.000032
Total CPU time:    6.98289
Device init   :   0.400000E-05
Setup problem :   0.640400E-02
Data copyin   :   0.208880E-01
Computation   :    6.82102
Data copyout  :   0.292500E-02
Output        :   0.131650

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Makefile		Makefile
README.md		README.md
TODO		TODO
fort.18		fort.18
jacobi.f90		jacobi.f90
jacobi.gpu.gif		jacobi.gpu.gif
jacobi.mic.gif		jacobi.mic.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Makefile

Makefile

README.md

README.md

TODO

TODO

fort.18

fort.18

jacobi.f90

jacobi.f90

jacobi.gpu.gif

jacobi.gpu.gif

jacobi.mic.gif

jacobi.mic.gif

Repository files navigation

jacobi - a benchmark by solving 2D laplace equation with jacobi iterative method.
GPU or Xeon Phi can be used.

performance comparison:

with Xeon E5-2680(8 cores/socket, 2 sockets) + Xeon Phi 5110P x2, 2 nodes (np_m=1, np_n=4):

with Xeon E5-2680 v4(14 cores/socket, 2 sockets) + NVIDIA Tesla P100 x4, 2 nodes (np_m=1, np_n=8):

the following is an example of calculation result.

About

Releases

Packages

Languages

dc-fukuoka/jacobi

Folders and files

Latest commit

History

Repository files navigation

jacobi - a benchmark by solving 2D laplace equation with jacobi iterative method. GPU or Xeon Phi can be used.

performance comparison:

with Xeon E5-2680(8 cores/socket, 2 sockets) + Xeon Phi 5110P x2, 2 nodes (np_m=1, np_n=4):

with Xeon E5-2680 v4(14 cores/socket, 2 sockets) + NVIDIA Tesla P100 x4, 2 nodes (np_m=1, np_n=8):

the following is an example of calculation result.

About

Topics

Resources

Stars

Watchers

Forks

Languages

jacobi - a benchmark by solving 2D laplace equation with jacobi iterative method.
GPU or Xeon Phi can be used.