Project author: dc-fukuoka

Project description:
Mandelbrot set by MPI/OpenMP/OpenACC.
Language: Fortran
Clone URL: git://github.com/dc-fukuoka/mandelbrot.git
Created: 2018-03-13T05:24:22Z
Project page: https://github.com/dc-fukuoka/mandelbrot

Mandelbrot set by MPI/OpenMP/OpenACC.

requirements for the Python scripts:

  • numpy
  • scipy
  • matplotlib
  • f90nml

how to run (a sketch of the compute kernel follows this list):

  • for OpenMP version
    1. $ export FC=ifort
    2. $ make a.out
    3. $ vi fort.11 # adjust the parameters
    4. $ ./a.out
       maximum iteration: 200
       imax: 301 jmax: 251
       time[s]: 1.210000000000000E-002
    5. $ ./draw.py
  • for MPI version
    1. $ export MPIFC=mpiifort
    2. $ make a.out.mpi
    3. $ vi fort.11 # adjust the parameters
    4. $ mpirun -np $NP ./a.out.mpi # $NP must equal np_i*np_j in fort.11 (see the decomposition sketch at the end of this README)
       maximum iteration: 200
       imax: 301 jmax: 251
       time[s]: 7.543087005615234E-003
    5. $ ./draw_mpi.py
  • for OpenACC version
    1. $ export MPIFC=mpif90
    2. $ make a.out.mpi.acc
    3. $ vi fort.11 # adjust the parameters
    4. $ mpirun -np $NP ./a.out.mpi.acc # $NP must equal np_i*np_j in fort.11
       maximum iteration: 200
       imax: 301 jmax: 251
       time[s]: 7.543087005615234E-003
    5. $ ./draw_mpi.py
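All three builds share the same escape-time computation; only the parallelization differs. The program below is not the repository's source, just a minimal, self-contained sketch of that kernel with an OpenMP parallel loop over the pixel grid, assuming the usual iteration z = z**2 + c with tol acting as the divergence threshold; the namelist parameters are hard-coded here for brevity, whereas the real code reads them from fort.11.

    ! mandelbrot_sketch.f90 -- illustrative only, not the repository's source
    ! Escape-time iteration over an imax x jmax grid, parallelized with OpenMP.
    program mandelbrot_sketch
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      integer, parameter :: iter_max = 200
      real(dp), parameter :: x_min = -2.25_dp, x_max = 0.75_dp
      real(dp), parameter :: y_min = -1.25_dp, y_max = 1.25_dp
      real(dp), parameter :: dx = 2.0e-4_dp, dy = 2.0e-4_dp
      real(dp), parameter :: tol = 1.0e2_dp      ! assumed divergence threshold
      integer :: imax, jmax, i, j, k
      integer, allocatable :: iters(:,:)
      complex(dp) :: c, z

      imax = nint((x_max - x_min)/dx) + 1        ! 15001 for these parameters
      jmax = nint((y_max - y_min)/dy) + 1        ! 12501 for these parameters
      allocate(iters(imax, jmax))

    !$omp parallel do private(i, k, c, z) schedule(dynamic)
      do j = 1, jmax
         do i = 1, imax
            c = cmplx(x_min + (i-1)*dx, y_min + (j-1)*dy, kind=dp)
            z = (0.0_dp, 0.0_dp)
            do k = 1, iter_max
               z = z*z + c                       ! z_{n+1} = z_n**2 + c
               if (abs(z) > tol) exit            ! diverged: point is outside the set
            end do
            iters(i, j) = min(k, iter_max)       ! escape count (capped for interior points)
         end do
      end do
    !$omp end parallel do

      print *, 'maximum iteration:', iter_max
      print *, 'imax:', imax, 'jmax:', jmax
    end program mandelbrot_sketch

Compiled with ifort -qopenmp (or gfortran -fopenmp), OMP_NUM_THREADS then controls the thread count, as in the performance runs below.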

to view the graph

$ display mandelbrot.png
or
$ display mandelbrot_mpi.png

or

  1. $ gnuplot
  2. gnuplot> set pm3d map
  3. gnuplot> splot "fort.100"

The MPI version cannot be plotted with gnuplot because its output is written in binary; use draw_mpi.py instead.
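For context, the two output paths differ only in how the grid is written out. The sketch below is hypothetical (the file name mandelbrot.bin, unit numbers, and record layout are assumptions, not taken from the project's source): a formatted fort.100 of x, y, value triples with a blank record between scanlines is the layout that "set pm3d map; splot" accepts, while an unformatted dump like the second block is what a binary-only output amounts to, hence the dedicated draw_mpi.py reader.

    ! output_sketch.f90 -- illustrative only; names and layout are assumptions
    program output_sketch
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      integer, parameter :: imax = 4, jmax = 3                  ! tiny stand-in grid
      real(dp), parameter :: x_min = -2.25_dp, y_min = -1.25_dp
      real(dp), parameter :: dx = 2.0e-4_dp, dy = 2.0e-4_dp
      integer :: i, j, iters(imax, jmax)

      iters = 1                                                 ! placeholder data

      ! text output gnuplot can splot: x y value, blank record between scanlines
      open(100, file='fort.100', form='formatted', status='replace')
      do j = 1, jmax
         do i = 1, imax
            write(100, '(3es16.8)') x_min + (i-1)*dx, y_min + (j-1)*dy, real(iters(i,j), dp)
         end do
         write(100, *)
      end do
      close(100)

      ! unformatted (binary) dump: compact, but unreadable for gnuplot's splot
      open(200, file='mandelbrot.bin', form='unformatted', access='stream', status='replace')
      write(200) imax, jmax, iters
      close(200)
    end program output_sketch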

  • zoomed calculation
    &ms
     iter_max = 200
     dx = 5.0d-4
     dy = 5.0d-4
     x_min = -1.2d0
     x_max = -1.1d0
     y_min = 0.2d0
     y_max = 0.3d0
     tol = 1.0d2
    /
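The parameter files above are Fortran namelists (group ms), which is also why f90nml appears in the Python requirements: the drawing scripts can parse the same fort.11. A minimal sketch of how such a file is read on the Fortran side (the variable list mirrors the file shown above; per the run notes, the MPI build also takes np_i and np_j from fort.11, omitted here):

    ! read_params_sketch.f90 -- illustrative only
    program read_params_sketch
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      integer  :: iter_max
      real(dp) :: dx, dy, x_min, x_max, y_min, y_max, tol
      namelist /ms/ iter_max, dx, dy, x_min, x_max, y_min, y_max, tol

      ! unit 11 defaults to the file name fort.11, hence the file name in this README
      open(11, file='fort.11', status='old')
      read(11, nml=ms)
      close(11)

      print *, 'maximum iteration:', iter_max
    end program read_params_sketch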

performance comparison

CPU: Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 8 cores/socket, 2 sockets, 8 nodes
interconnect: 4x FDR InfiniBand, fat tree
compiler: Intel compiler 2018u0
MPI: Intel MPI 2018u0

  • input:
    &ms
     iter_max = 200
     dx = 2.0d-4
     dy = 2.0d-4
     x_min = -2.25d0
     x_max = 0.75d0
     y_min = -1.25d0
     y_max = 1.25d0
     tol = 1.0d2
    /
  • serial (1 core)
    $ OMP_NUM_THREADS=1 KMP_AFFINITY=compact srun --mpi=pmi2 -N1 -n1 -c1 --cpu_bind=cores -m block:block ./a.out
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 104.264600000000
  • OpenMP (1 node, 16 cores)
    $ OMP_NUM_THREADS=16 KMP_AFFINITY=compact srun -n1 -c16 --cpu_bind=cores -m block:block ./a.out
    srun: Warning: can't run 1 processes on 8 nodes, setting nnodes to 1
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 15.1660000000000
  • flat MPI (1 node, 16 cores, np_i=4, np_j=4, 1 thread/process)
    $ I_MPI_EXTRA_FILESYSTEM=1 I_MPI_EXTRA_FILESYSTEM_LIST=lustre OMP_NUM_THREADS=1 KMP_AFFINITY=compact srun --mpi=pmi2 -N1 -n16 -c1 --cpu_bind=cores -m block:block ./a.out.mpi
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 23.8055260181427
  • hybrid (1 node, 16 cores, np_i=1, np_j=2, 8 threads/process)
    $ I_MPI_EXTRA_FILESYSTEM=1 I_MPI_EXTRA_FILESYSTEM_LIST=lustre OMP_NUM_THREADS=8 KMP_AFFINITY=compact srun --mpi=pmi2 -N1 -n2 -c8 --cpu_bind=cores -m block:block ./a.out.mpi
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 15.0692129135132
  • flat MPI (8 nodes, 128 cores, np_i=8, np_j=16, 1 thread/process)
    $ I_MPI_EXTRA_FILESYSTEM=1 I_MPI_EXTRA_FILESYSTEM_LIST=lustre OMP_NUM_THREADS=1 KMP_AFFINITY=compact srun --mpi=pmi2 -N8 -n128 -c1 --cpu_bind=cores -m block:block ./a.out.mpi
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 3.35678577423096
  • hybrid (8 nodes, 128 cores, np_i=4, np_j=4, 8 threads/process)
    $ I_MPI_EXTRA_FILESYSTEM=1 I_MPI_EXTRA_FILESYSTEM_LIST=lustre OMP_NUM_THREADS=8 KMP_AFFINITY=compact srun --mpi=pmi2 -N8 -n16 -c8 --cpu_bind=cores -m block:block ./a.out.mpi
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 3.36353015899658
  • OpenACC (4 nodes, 16 processes, np_i=4, np_j=4, 1 GPU/process), GPU: Tesla P100 x4/node
    $ mpirun -x PATH -x LD_LIBRARY_PATH -np 16 -npernode 4 ./a.out.mpi.acc
    maximum iteration: 200
    imax: 15001 jmax: 12501
    time[s]: 9.6851900219917297E-002
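
For context on the np_i/np_j settings above: requiring the process count to equal np_i*np_j suggests a 2D block decomposition, with each rank owning roughly an (imax/np_i) by (jmax/np_j) tile of the grid. The sketch below shows one way to compute such block ranges; the rank-to-(pi, pj) mapping and the helper block_range are illustrative assumptions, not the project's actual code.

    ! decomp_sketch.f90 -- illustrative only; build with mpif90 or mpiifort
    program decomp_sketch
      use mpi
      implicit none
      integer :: ierr, rank, nranks
      integer :: np_i, np_j, pi, pj
      integer :: imax, jmax, is, ie, js, je

      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world, rank, ierr)
      call mpi_comm_size(mpi_comm_world, nranks, ierr)

      np_i = 4; np_j = 4                  ! would come from fort.11
      imax = 15001; jmax = 12501          ! grid size from the input above
      if (nranks /= np_i*np_j) call mpi_abort(mpi_comm_world, 1, ierr)

      pi = mod(rank, np_i)                ! assumed rank -> process-grid mapping
      pj = rank / np_i

      call block_range(imax, np_i, pi, is, ie)
      call block_range(jmax, np_j, pj, js, je)
      print '(a,i4,4(a,i6))', 'rank', rank, ': i=', is, '..', ie, '  j=', js, '..', je

      call mpi_finalize(ierr)

    contains

      ! split n grid points into nproc blocks; block p (0-based) gets [lo, hi],
      ! handing the remainder out one point at a time to the leading blocks
      subroutine block_range(n, nproc, p, lo, hi)
        integer, intent(in)  :: n, nproc, p
        integer, intent(out) :: lo, hi
        integer :: base, rem
        base = n / nproc
        rem  = mod(n, nproc)
        lo = p*base + min(p, rem) + 1
        hi = lo + base - 1
        if (p < rem) hi = hi + 1
      end subroutine block_range

    end program decomp_sketch

Run with mpirun -np 16 (matching np_i*np_j = 16), each rank prints its tile; for the 15001 x 12501 grid above, rank 0 would get i = 1..3751 and j = 1..3126.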