/home/runner/simulations/TestJob01_temp_2/output-0000/TEST/sim/MemSpeed
Preparing:
+ set -e
+ cd output-0000-active
+ echo Checking:
Checking:
+ pwd
/home/runner/simulations/TestJob01_temp_2/output-0000/TEST/sim/MemSpeed/output-0000-active
+ hostname
fv-az91-181
+ date
Wed Dec 22 20:13:44 UTC 2021
+ echo Environment:
Environment:
+ export CACTUS_NUM_PROCS=2
+ export CACTUS_NUM_THREADS=1
+ export GMON_OUT_PREFIX=gmon.out
+ export OMP_NUM_THREADS=1
+ env
+ sort
+ echo Starting:
Starting:
+ date +%s
+ export CACTUS_STARTTIME=1640204024
+ [ 2 = 1 ]
+ mpirun -np 2 /home/runner/simulations/TestJob01_temp_2/SIMFACTORY/exe/cactus_sim -L 3 /home/runner/simulations/TestJob01_temp_2/output-0000/arrangements/CactusUtils/MemSpeed/test/memspeed.par
INFO (Cactus): Increased logging level from 0 to 3
--------------------------------------------------------------------------------

       10                                  
  1   0101       ************************  
  01  1010 10      The Cactus Code V4.11.0    
 1010 1101 011      www.cactuscode.org     
  1001 100101    ************************  
    00010101                               
     100011     (c) Copyright The Authors  
      0100      GNU Licensed. No Warranty  
      0101                                 
--------------------------------------------------------------------------------

Cactus version:    4.11.0
Compile date:      Dec 22 2021 (19:48:03)
Run date:          Dec 22 2021 (20:13:45+0000)
Run host:          fv-az91-181.1u4dveb1r3ruphbftqvtwhitpc.gx.internal.cloudapp.net (pid=106572)
Working directory: /home/runner/simulations/TestJob01_temp_2/output-0000/TEST/sim/MemSpeed
Executable:        /home/runner/simulations/TestJob01_temp_2/SIMFACTORY/exe/cactus_sim
Parameter file:    /home/runner/simulations/TestJob01_temp_2/output-0000/arrangements/CactusUtils/MemSpeed/test/memspeed.par
--------------------------------------------------------------------------------

Activating thorn Cactus...Success -> active implementation Cactus
Activation requested for 
--->hwloc MemSpeed SystemTopology CartGrid3D CoordBase IOASCII IOUtil PUGH PUGHSlab<---
Thorn hwloc requests automatic activation of zlib
Thorn MemSpeed requests automatic activation of Vectors
Thorn MemSpeed requests automatic activation of MPI
Activating thorn CartGrid3D...Success -> active implementation grid
Activating thorn CoordBase...Success -> active implementation CoordBase
Activating thorn hwloc...Success -> active implementation hwloc
Activating thorn IOASCII...Success -> active implementation IOASCII
Activating thorn IOUtil...Success -> active implementation IO
Activating thorn MemSpeed...Success -> active implementation MemSpeed
Activating thorn MPI...Success -> active implementation MPI
Activating thorn PUGH...Success -> active implementation Driver
Activating thorn PUGHSlab...Success -> active implementation Hyperslab
Activating thorn SystemTopology...Success -> active implementation SystemTopology
Activating thorn Vectors...Success -> active implementation Vectors
Activating thorn zlib...Success -> active implementation zlib
--------------------------------------------------------------------------------
  if (recover initial data)
    Recover parameters
  endif

  Startup routines
    [CCTK_STARTUP]
      CartGrid3D::SymmetryStartup: Register GH Extension for GridSymmetry
      CoordBase::CoordBase_Startup: Register a GH extension to store the coordinate system handles
      GROUP hwloc_startup: hwloc startup group
        hwloc::hwloc_version: Output hwloc version
      SystemTopology::ST_system_topology: Output and/or modify system topology and hardware locality
      PUGH::Driver_Startup: Startup routine
      PUGH::PUGH_RegisterPUGHP2LRoutines: Register Physical to Logical process mapping routines
      PUGH::PUGH_RegisterPUGHTopologyRoutines: Register topology generation routines
      IOUtil::IOUtil_Startup: Startup routine
      Vectors::Vectors_Startup: Print startup message
      IOASCII::IOASCII_Startup: Startup routine

  Startup routines which need an existing grid hierarchy
    [CCTK_WRAGH]
      CartGrid3D::RegisterCartGrid3DCoords: [meta] Register coordinates for the Cartesian grid
      MemSpeed::MemSpeed_MeasureSpeed: [meta] Measure CPU, memory, cache speeds
  Parameter checking routines
    [CCTK_PARAMCHECK]
      CartGrid3D::ParamCheck_CartGrid3D: Check coordinates for CartGrid3D
      Vectors::Vectors_Test: Run correctness tests.

  Initialisation
    if (NOT (recover initial data AND recovery_mode is 'strict'))
      [CCTK_PREREGRIDINITIAL]
      Set up grid hierarchy
      [CCTK_POSTREGRIDINITIAL]
        CartGrid3D::SpatialCoordinates: Set Coordinates after regridding
      [CCTK_BASEGRID]
        CartGrid3D::SpatialSpacings: Set up ranges for spatial 3D Cartesian coordinates (on all grids)
        CartGrid3D::SpatialCoordinates: Set up spatial 3D Cartesian coordinates on the GH
        IOASCII::IOASCII_Choose1D: Choose 1D output lines
        IOASCII::IOASCII_Choose2D: Choose 2D output planes
        PUGH::PUGH_Report: Report on PUGH set up
      [CCTK_INITIAL]
      [CCTK_POSTINITIAL]
      Initialise finer grids recursively
      Restrict from finer grids
      [CCTK_POSTRESTRICTINITIAL]
      [CCTK_POSTPOSTINITIAL]
      [CCTK_POSTSTEP]
    endif
    if (recover initial data)
      [CCTK_BASEGRID]
        CartGrid3D::SpatialSpacings: Set up ranges for spatial 3D Cartesian coordinates (on all grids)
        CartGrid3D::SpatialCoordinates: Set up spatial 3D Cartesian coordinates on the GH
        IOASCII::IOASCII_Choose1D: Choose 1D output lines
        IOASCII::IOASCII_Choose2D: Choose 2D output planes
        PUGH::PUGH_Report: Report on PUGH set up
      [CCTK_RECOVER_VARIABLES]
      [CCTK_POST_RECOVER_VARIABLES]
    endif
    if (checkpoint initial data)
      [CCTK_CPINITIAL]
    endif
    if (analysis)
      [CCTK_ANALYSIS]
  endif
  Output grid variables

  do loop over timesteps
    [CCTK_PREREGRID]
    Change grid hierarchy
    [CCTK_POSTREGRID]
      CartGrid3D::SpatialCoordinates: Set Coordinates after regridding
    Rotate timelevels
    iteration = iteration+1
    t = t+dt
    [CCTK_PRESTEP]
    [CCTK_EVOL]
    Evolve finer grids recursively
    Restrict from finer grids
    [CCTK_POSTRESTRICT]
    [CCTK_POSTSTEP]
    if (checkpoint)
      [CCTK_CHECKPOINT]
    endif
    if (analysis)
      [CCTK_ANALYSIS]
    endif
    Output grid variables
    enddo

  Termination routines
    [CCTK_TERMINATE]
      PUGH::Driver_Terminate: Termination routine

  Shutdown routines
    [CCTK_SHUTDOWN]

  Routines run after changing the grid hierarchy:
    [CCTK_POSTREGRID]
      CartGrid3D::SpatialCoordinates: Set Coordinates after regridding
--------------------------------------------------------------------------------
INFO (hwloc): library version 2.1.0, API version 0x20100
INFO (SystemTopology): MPI process-to-host mapping:
This is MPI process 0 of 2
MPI hosts:
  0: fv-az91-181
This MPI process runs on host 0 of 1
On this host, this is MPI process 0 of 2
INFO (SystemTopology): Topology support:
Discovery support:
  discovery->pu                            : yes
CPU binding support:
  cpubind->set_thisproc_cpubind            : yes
  cpubind->get_thisproc_cpubind            : yes
  cpubind->set_proc_cpubind                : yes
  cpubind->get_proc_cpubind                : yes
  cpubind->set_thisthread_cpubind          : yes
  cpubind->get_thisthread_cpubind          : yes
  cpubind->set_thread_cpubind              : yes
  cpubind->get_thread_cpubind              : yes
  cpubind->get_thisproc_last_cpu_location  : yes
  cpubind->get_proc_last_cpu_location      : yes
  cpubind->get_thisthread_last_cpu_location: yes
Memory binding support:
  membind->set_thisproc_membind            : no
  membind->get_thisproc_membind            : no
  membind->set_proc_membind                : no
  membind->get_proc_membind                : no
  membind->set_thisthread_membind          : yes
  membind->get_thisthread_membind          : yes
  membind->set_area_membind                : yes
  membind->get_area_membind                : yes
  membind->alloc_membind                   : yes
  membind->firsttouch_membind              : yes
  membind->bind_membind                    : yes
  membind->interleave_membind              : yes
  membind->nexttouch_membind               : no
  membind->migrate_membind                 : yes
INFO (SystemTopology): Hardware objects in this node:
Machine L#0: (P#0, total=7118760KB, DMIProductName="Virtual Machine", DMIProductVersion=7.0, DMIProductUUID=57ef1471-8be7-6345-a468-7660edd82031, DMIBoardVendor="Microsoft Corporation", DMIBoardName="Virtual Machine", DMIBoardVersion=7.0, DMIChassisVendor="Microsoft Corporation", DMIChassisType=3, DMIChassisVersion=7.0, DMIChassisAssetTag=7783-7084-3265-9085-8269-3286-77, DMIBIOSVendor="American Megatrends Inc.", DMIBIOSVersion="090008 ", DMIBIOSDate=12/07/2018, DMISysVendor="Microsoft Corporation", Backend=Linux, LinuxCgroup=/, OSName=Linux, OSRelease=5.11.0-1022-azure, OSVersion="#23~20.04.1-Ubuntu SMP Fri Nov 19 10:20:52 UTC 2021", HostName=fv-az91-181, Architecture=x86_64, hwlocVersion=2.1.0, ProcessName=cactus_sim)
  Package L#0: (P#0, total=7118760KB, CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=106, CPUModel="Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz", CPUStepping=6)
    L3Cache L#0: (P#-1, size=49152KB, linesize=64, ways=12, Inclusive=0)
      L2Cache L#0: (P#-1, size=1280KB, linesize=64, ways=20, Inclusive=0)
        L1dCache L#0: (P#-1, size=48KB, linesize=64, ways=12, Inclusive=0)
          Core L#0: (P#0)
            PU L#0: (P#0)
      L2Cache L#1: (P#-1, size=1280KB, linesize=64, ways=20, Inclusive=0)
        L1dCache L#1: (P#-1, size=48KB, linesize=64, ways=12, Inclusive=0)
          Core L#1: (P#1)
            PU L#1: (P#1)
INFO (SystemTopology): Thread CPU bindings:
  MPI process 0 on host 0 (process 0 of 2 on this host)
    OpenMP thread 0: PU set L#{0} P#{0}
  MPI process 1 on host 0 (process 1 of 2 on this host)
    OpenMP thread 0: PU set L#{1} P#{1}
INFO (SystemTopology): Setting thread CPU bindings:
INFO (SystemTopology): Thread CPU bindings:
  MPI process 0 on host 0 (process 0 of 2 on this host)
    OpenMP thread 0: PU set L#{0} P#{0}
  MPI process 1 on host 0 (process 1 of 2 on this host)
    OpenMP thread 0: PU set L#{1} P#{1}
INFO (SystemTopology): Extracting CPU/cache/memory properties:
  There are 1 PUs per core (aka hardware SMT threads)
  There are 1 threads per core (aka SMT threads used)
  Cache (unknown name) has type "data" depth 1
    size 49152 linesize 64 associativity 12 stride 4096, for 1 PUs
  Cache (unknown name) has type "unified" depth 2
    size 1310720 linesize 64 associativity 20 stride 65536, for 1 PUs
  Cache (unknown name) has type "unified" depth 3
    size 50331648 linesize 64 associativity 12 stride 4194304, for 2 PUs
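
The working-set sizes used by the MemSpeed benchmarks further down (36864, 983040, and 37748736 bytes) are exactly three quarters of the L1d, L2, and L3 sizes reported here. The 3/4 ratio is inferred from these numbers rather than taken from MemSpeed documentation; a quick Python check:

    # Buffer sizes used in the measurements below vs. cache sizes reported above (bytes, from this log).
    caches  = {"L1d": 49152, "L2": 1310720, "L3": 50331648}
    buffers = {"L1d": 36864, "L2": 983040,  "L3": 37748736}
    for level in caches:
        print(level, buffers[level] / caches[level])   # prints 0.75 for each level
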
INFO (PUGH): Using physical to logical mappings: direct
INFO (PUGH): Using topology generator: automatic
INFO (Vectors): Using vector size 2 for architecture SSE2 (64-bit precision)
--------------------------------------------------------------------------------
Driver provided by PUGH
--------------------------------------------------------------------------------

INFO (PUGH): Not setting up a topology for 1 dimensions
INFO (PUGH): Not setting up a topology for 2 dimensions
INFO (PUGH): Setting up a topology for 3 dimensions
INFO (IOASCII): I/O Method 'IOASCII_1D' registered: output of 1D lines of grid functions/arrays to ASCII files
INFO (IOASCII): Periodic 1D output every 1 iterations
INFO (IOASCII): Periodic 1D output requested for 'GRID::r'
INFO (IOASCII): I/O Method 'IOASCII_2D' registered: output of 2D planes of grid functions/arrays to ASCII files
INFO (IOASCII): Periodic 2D output turned off
INFO (IOASCII): I/O Method 'IOASCII_3D' registered: output of 3D grid functions/arrays to ASCII files
INFO (IOASCII): Periodic 3D output turned off
INFO (MemSpeed): Measuring CPU, cache, memory, and communication speeds:
  Single-core measurements (using 1 MPI processes with 1 OpenMP threads each):
    CPU frequency:
      iterations=1000000... time=0.0019154 sec
      iterations=10000000... time=0.0171021 sec
      iterations=100000000... time=0.176168 sec
      iterations=600000000... time=1.04634 sec
      iterations=600000000... time=1.07232 sec
      result: -46.1875 GHz
    CPU floating point performance:
      iterations=1000000... time=0.00361319 sec
      iterations=10000000... time=0.036115 sec
      iterations=100000000... time=0.361451 sec
      iterations=300000000... time=1.08394 sec
      result: 8.85654 Gflop/sec
    CPU integer performance:
      iterations=1000000... time=0.0020099 sec
      iterations=10000000... time=0.0199312 sec
      iterations=100000000... time=0.199324 sec
      iterations=600000000... time=1.1955 sec
      result: 8.03012 Giop/sec
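
The Gflop/sec and Giop/sec results are consistent with a fixed amount of work per loop iteration: inverting the figures above gives almost exactly 32 floating-point operations and 16 integer operations per iteration. These per-iteration counts are inferred from the printed numbers, not taken from the MemSpeed source; a small check:

    # Back out operations per loop iteration from the results reported above.
    flops_per_iter = 8.85654e9 * 1.08394 / 300_000_000   # ~32.0
    iops_per_iter  = 8.03012e9 * 1.19550 / 600_000_000   # ~16.0
    print(round(flops_per_iter), round(iops_per_iter))   # 32 16
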
    Read latency of D1 cache (for 1 PUs) (using 1*36864 bytes):
      iterations=1000... time=0.0002137 sec
      iterations=10000... time=0.0020127 sec
      iterations=100000... time=0.0201074 sec
      iterations=1000000... time=0.201177 sec
      iterations=5000000... time=1.0103 sec
      result: 2.02059 nsec
    Read latency of L2 cache (for 1 PUs) (using 1*983040 bytes):
      iterations=1000... time=0.000586499 sec
      iterations=10000... time=0.00574259 sec
      iterations=100000... time=0.060436 sec
      iterations=1000000... time=0.575375 sec
      iterations=2000000... time=1.14616 sec
      result: 5.73081 nsec
    Read latency of L3 cache (for 2 PUs) (using 1*37748736 bytes):
      [skipped -- avoiding large-memory benchmarks]
    Read bandwidth of D1 cache (for 1 PUs) (using 1*36864 bytes):
      iterations=1... time=1e-06 sec
      iterations=10... time=5.9e-06 sec
      iterations=100... time=5.44e-05 sec
      iterations=1000... time=0.000540699 sec
      iterations=10000... time=0.00541629 sec
      iterations=100000... time=0.0541552 sec
      iterations=1000000... time=0.541873 sec
      iterations=2000000... time=1.08375 sec
      result: 68.0305 GByte/sec
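
Each bandwidth result equals iterations times buffer size divided by elapsed time. A small check against the D1 figure just above and the L2 figure that follows, using only numbers printed in this log:

    # Bandwidth = iterations * buffer_bytes / time (GByte = 1e9 bytes here).
    d1 = 2_000_000 * 36864  / 1.08375   # ~6.80e10 B/s -> 68.03 GByte/sec
    l2 =    50_000 * 983040 / 1.06415   # ~4.62e10 B/s -> 46.19 GByte/sec
    print(f"{d1/1e9:.2f} GByte/sec, {l2/1e9:.2f} GByte/sec")
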
    Read bandwidth of L2 cache (for 1 PUs) (using 1*983040 bytes):
      iterations=1... time=2.5099e-05 sec
      iterations=10... time=0.00022 sec
      iterations=100... time=0.0021282 sec
      iterations=1000... time=0.0213029 sec
      iterations=10000... time=0.212685 sec
      iterations=50000... time=1.06415 sec
      result: 46.1891 GByte/sec
    Read bandwidth of L3 cache (for 2 PUs) (using 1*37748736 bytes):
      [skipped -- avoiding large-memory benchmarks]
    Write latency of D1 cache (for 1 PUs) (using 1*36864 bytes):
      iterations=1000... time=4e-06 sec
      iterations=10000... time=3.28e-05 sec
      iterations=100000... time=0.000321599 sec
      iterations=1000000... time=0.00322139 sec
      iterations=10000000... time=0.0337965 sec
      iterations=100000000... time=0.3227 sec
      iterations=300000000... time=0.966207 sec
      iterations=600000000... time=1.9348 sec
      result: 0.403082 nsec
    Write latency of L2 cache (for 1 PUs) (using 1*983040 bytes):
      iterations=1000... time=1.83e-05 sec
      iterations=10000... time=0.000182499 sec
      iterations=100000... time=0.0018187 sec
      iterations=1000000... time=0.0180861 sec
      iterations=10000000... time=0.180989 sec
      iterations=60000000... time=1.08338 sec
      result: 2.25704 nsec
    Write latency of L3 cache (for 2 PUs) (using 1*37748736 bytes):
      [skipped -- avoiding large-memory benchmarks]
    Write bandwidth of D1 cache (for 1 PUs) (using 1*36864 bytes):
      iterations=1... time=7e-07 sec
      iterations=10... time=2.9e-06 sec
      iterations=100... time=2.46e-05 sec
      iterations=1000... time=0.0002421 sec
      iterations=10000... time=0.00241899 sec
      iterations=100000... time=0.0242606 sec
      iterations=1000000... time=0.242545 sec
      iterations=5000000... time=1.21427 sec
      result: 151.795 GByte/sec
    Write bandwidth of L2 cache (for 1 PUs) (using 1*983040 bytes):
      iterations=1... time=2.81e-05 sec
      iterations=10... time=0.000205499 sec
      iterations=100... time=0.0020268 sec
      iterations=1000... time=0.0202096 sec
      iterations=10000... time=0.202249 sec
      iterations=50000... time=1.01197 sec
      result: 48.5705 GByte/sec
    Write bandwidth of L3 cache (for 2 PUs) (using 1*37748736 bytes):
      [skipped -- avoiding large-memory benchmarks]
    Stencil code performance of D1 cache (for 1 PUs) (using 1*13^3 grid points, 1*35152 bytes):
      iterations=1... time=1.48e-05 sec
      iterations=10... time=0.000145 sec
      iterations=100... time=0.0014098 sec
      iterations=1000... time=0.0140776 sec
      iterations=10000... time=0.139716 sec
      iterations=80000... time=1.10934 sec
      result: 0.158436 Gupdates/sec
    Stencil code performance of L2 cache (for 1 PUs) (using 1*39^3 grid points, 1*949104 bytes):
      iterations=1... time=0.0001081 sec
      iterations=10... time=0.0010683 sec
      iterations=100... time=0.0101828 sec
      iterations=1000... time=0.101658 sec
      iterations=10000... time=1.01952 sec
      result: 0.581832 Gupdates/sec
    Stencil code performance of L3 cache (for 2 PUs) (using 1*133^3 grid points, 1*37642192 bytes):
      iterations=1... time=0.0491767 sec
      iterations=10... time=0.496862 sec
      iterations=20... time=0.989301 sec
      iterations=40... time=1.98256 sec
      result: 0.0474666 Gupdates/sec
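
Each Gupdates/sec result equals iterations times grid points divided by elapsed time. Checking the three single-core stencil figures above (all values taken from this log):

    # Gupdates/sec = iterations * grid_points / time for the stencil runs above.
    for iters, n, t in [(80_000, 13, 1.10934), (10_000, 39, 1.01952), (40, 133, 1.98256)]:
        print(f"{iters * n**3 / t / 1e9:.6f} Gupdates/sec")
    # prints ~0.158436, ~0.581832, ~0.047467
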
  Single-node measurements (using 2 MPI processes with 1 OpenMP threads each):
    CPU frequency:
      iterations=1000000... time=0.00169405 sec
      iterations=10000000... time=0.0169848 sec
      iterations=100000000... time=0.170527 sec
      iterations=600000000... time=1.02122 sec
      iterations=600000000... time=1.04081 sec
      result: -61.2631 GHz
    CPU floating point performance:
      iterations=1000000... time=0.00361484 sec
      iterations=10000000... time=0.0361645 sec
      iterations=100000000... time=0.361301 sec
      iterations=300000000... time=1.10348 sec
      result: 8.69977 Gflop/sec
    CPU integer performance:
      iterations=1000000... time=0.00199615 sec
      iterations=10000000... time=0.0199469 sec
      iterations=100000000... time=0.199441 sec
      iterations=600000000... time=1.19707 sec
      result: 8.0196 Giop/sec
    Read latency of D1 cache (for 1 PUs) (using 2*36864 bytes):
      iterations=1000... time=0.00020235 sec
      iterations=10000... time=0.00200845 sec
      iterations=100000... time=0.0201109 sec
      iterations=1000000... time=0.201363 sec
      iterations=5000000... time=1.00676 sec
      result: 2.01351 nsec
    Read latency of L2 cache (for 1 PUs) (using 2*983040 bytes):
      iterations=1000... time=0.000586749 sec
      iterations=10000... time=0.00586849 sec
      iterations=100000... time=0.0575879 sec
      iterations=1000000... time=0.578667 sec
      iterations=2000000... time=1.15831 sec
      result: 5.79156 nsec
    Read latency of L3 cache (for 2 PUs) (using 2*37748736 bytes):
      [skipped -- too much memory requested]
    Read bandwidth of D1 cache (for 1 PUs) (using 2*36864 bytes):
      iterations=1... time=9e-07 sec
      iterations=10... time=5.8e-06 sec
      iterations=100... time=5.4299e-05 sec
      iterations=1000... time=0.000548599 sec
      iterations=10000... time=0.00541219 sec
      iterations=100000... time=0.0542301 sec
      iterations=1000000... time=0.542883 sec
      iterations=2000000... time=1.08617 sec
      result: 67.8792 GByte/sec
    Read bandwidth of L2 cache (for 1 PUs) (using 2*983040 bytes):
      iterations=1... time=2.51e-05 sec
      iterations=10... time=0.0002211 sec
      iterations=100... time=0.00217555 sec
      iterations=1000... time=0.0217542 sec
      iterations=10000... time=0.218068 sec
      iterations=50000... time=1.08714 sec
      result: 45.212 GByte/sec
    Read bandwidth of L3 cache (for 2 PUs) (using 2*37748736 bytes):
      [skipped -- too much memory requested]
    Write latency of D1 cache (for 1 PUs) (using 2*36864 bytes):
      iterations=1000... time=3.85e-06 sec
      iterations=10000... time=3.27e-05 sec
      iterations=100000... time=0.000321849 sec
      iterations=1000000... time=0.00331414 sec
      iterations=10000000... time=0.0322804 sec
      iterations=100000000... time=0.322509 sec
      iterations=300000000... time=0.970755 sec
      iterations=600000000... time=1.93634 sec
      result: 0.403405 nsec
    Write latency of L2 cache (for 1 PUs) (using 2*983040 bytes):
      iterations=1000... time=1.735e-05 sec
      iterations=10000... time=0.000172099 sec
      iterations=100000... time=0.00178385 sec
      iterations=1000000... time=0.0181191 sec
      iterations=10000000... time=0.180701 sec
      iterations=60000000... time=1.08175 sec
      result: 2.25364 nsec
    Write latency of L3 cache (for 2 PUs) (using 2*37748736 bytes):
      [skipped -- too much memory requested]
    Write bandwidth of D1 cache (for 1 PUs) (using 2*36864 bytes):
      iterations=1... time=6e-07 sec
      iterations=10... time=2.8e-06 sec
      iterations=100... time=2.485e-05 sec
      iterations=1000... time=0.000246299 sec
      iterations=10000... time=0.0024519 sec
      iterations=100000... time=0.0246223 sec
      iterations=1000000... time=0.245767 sec
      iterations=4000000... time=0.985883 sec
      iterations=8000000... time=1.96739 sec
      result: 149.9 GByte/sec
    Write bandwidth of L2 cache (for 1 PUs) (using 2*983040 bytes):
      iterations=1... time=2.905e-05 sec
      iterations=10... time=0.000211549 sec
      iterations=100... time=0.00209205 sec
      iterations=1000... time=0.0209133 sec
      iterations=10000... time=0.209549 sec
      iterations=50000... time=1.03872 sec
      result: 47.32 GByte/sec
    Write bandwidth of L3 cache (for 2 PUs) (using 2*37748736 bytes):
      [skipped -- too much memory requested]
    Stencil code performance of D1 cache (for 1 PUs) (using 2*10^3 grid points, 2*16000 bytes):
      iterations=1... time=1.2e-05 sec
      iterations=10... time=0.0001151 sec
      iterations=100... time=0.00117935 sec
      iterations=1000... time=0.0117228 sec
      iterations=10000... time=0.118081 sec
      iterations=90000... time=1.07744 sec
      result: 0.0835312 Gupdates/sec
    Stencil code performance of L2 cache (for 1 PUs) (using 2*31^3 grid points, 2*476656 bytes):
      iterations=1... time=0.000576749 sec
      iterations=10... time=0.00261444 sec
      iterations=100... time=0.0261206 sec
      iterations=1000... time=0.26167 sec
      iterations=4000... time=1.04947 sec
      result: 0.113547 Gupdates/sec
    Stencil code performance of L3 cache (for 2 PUs) (using 1*133^3 grid points, 1*37642192 bytes):
      [skipped -- too many MPI processes]
  Single-node measurements:
    MPI latency: 1000 nsec
    MPI bandwidth: 5.9142 GByte/sec
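
The MPI latency and bandwidth above are two-process, single-node figures. A common way to obtain such numbers is a ping-pong test; the sketch below illustrates the technique only and is not the MemSpeed thorn's C implementation. It assumes mpi4py and NumPy, which are not part of this run, and would be launched with mpirun -np 2 like the executable above.

    # Minimal ping-pong sketch: small messages estimate latency, large ones bandwidth.
    # Illustration only; MemSpeed's own measurement is implemented in C and may differ.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    peer = 1 - rank                                   # exactly two ranks, as in this run

    def pingpong(nbytes, reps):
        """Return seconds per one-way message of size nbytes, averaged over reps."""
        buf = np.zeros(max(nbytes, 1), dtype=np.uint8)
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(reps):
            if rank == 0:
                comm.Send([buf, MPI.BYTE], dest=peer)
                comm.Recv([buf, MPI.BYTE], source=peer)
            else:
                comm.Recv([buf, MPI.BYTE], source=peer)
                comm.Send([buf, MPI.BYTE], dest=peer)
        return (MPI.Wtime() - t0) / (2 * reps)

    lat = pingpong(1, 10_000)                         # tiny message -> latency
    bw_time = pingpong(8 * 1024**2, 100)              # 8 MiB message -> bandwidth
    if rank == 0:
        print(f"latency ~ {lat * 1e9:.0f} nsec, "
              f"bandwidth ~ {8 * 1024**2 / bw_time / 1e9:.2f} GByte/sec")
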
INFO (Vectors): Testing vectorisation... [errors may result in segfaults]
INFO (Vectors): 375/375 tests passed 
INFO (CartGrid3D): Grid Spacings:
INFO (CartGrid3D): dx=>1.1111111e-01  dy=>1.1111111e-01  dz=>1.1111111e-01
INFO (CartGrid3D): Computational Coordinates:
INFO (CartGrid3D): x=>[-0.500, 0.500]  y=>[-0.500, 0.500]  z=>[-0.500, 0.500]
INFO (CartGrid3D): Indices of Physical Coordinates:
INFO (CartGrid3D): x=>[0,9]  y=>[0,9]  z=>[0,9]
INFO (PUGH): MPI Evolution on 2 processors
INFO (PUGH): 3-dimensional grid functions
INFO (PUGH):   Size: 10 10 10
INFO (PUGH):   Processor topology: 2 x 1 x 1
INFO (PUGH):   Local load: 600   [6 x 10 x 10]
INFO (PUGH):   Maximum load skew: 0.000000
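
The PUGH report is internally consistent: the 10x10x10 grid split over a 2x1x1 process topology gives each process 5 owned points in x, and the reported local extent of 6x10x10 = 600 matches one extra ghost/overlap point per process in x. The ghost width of 1 is inferred from these numbers, not printed in this log.

    # Check the PUGH decomposition above (ghost width 1 is an inference, not logged).
    nx, ny, nz, nprocs, ghost = 10, 10, 10, 2, 1
    local_x = nx // nprocs + ghost       # 6
    print(local_x * ny * nz)             # 600, matching "Local load: 600 [6 x 10 x 10]"
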
--------------------------------------------------------------------------------
Done.
+ echo Stopping:
Stopping:
+ date
Wed Dec 22 20:14:33 UTC 2021
+ echo Done.
Done.
  Elapsed time: 48.4 s
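
As a final consistency check, the CACTUS_STARTTIME exported near the top of this log corresponds to the printed start date, and the 49 s between the two date stamps brackets the reported elapsed time of 48.4 s:

    # Verify CACTUS_STARTTIME=1640204024 against the timestamps printed in this log.
    from datetime import datetime, timezone
    start = datetime.fromtimestamp(1640204024, tz=timezone.utc)
    stop  = datetime(2021, 12, 22, 20, 14, 33, tzinfo=timezone.utc)
    print(start.isoformat())                  # 2021-12-22T20:13:44+00:00
    print((stop - start).total_seconds())     # 49.0, consistent with "Elapsed time: 48.4 s"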
