failure in running .test_sw4.py at Test 9 #211

Open
batkillerz opened this issue Apr 19, 2024 · 0 comments

batkillerz commented Apr 19, 2024

Hi,

I built sw4 successfully using make:
[sw4 ASCII-art banner printed at the end of the build]

But when I run ./test_sw4.py -u 0 -d debug_mp/ -v, it fails at test 10 as follows:

Test # 9 Input file: tw-att-2.in PASSED
Starting test # 10 in directory: attenuation with input file: tw-topo-att-1.in
Running sw4 from directory: /home/batkillerz/sw4_base/src/sw4-3.0/pytest/attenuation
run_cmd= ['mpirun', '-np', '1', '/home/batkillerz/sw4_base/src/sw4-3.0/debug_mp//sw4', '/home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in']
ERROR: Test tw-topo-att-1.in : sw4 returned non-zero exit status= 1 aborting test
run_cmd= ['mpirun', '-np', '1', '/home/batkillerz/sw4_base/src/sw4-3.0/debug_mp//sw4', '/home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in']
DID YOU USE THE CORRECT SW4 EXECUTABLE? (SPECIFY DIRECTORY WITH -d OPTION)
test_sw4 was unsuccessful
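
The run_cmd printed above can be rebuilt and rerun outside the test script, which makes it easier to keep the full output. The following is only a minimal sketch: `build_run_cmd` and `run_single_threaded` are hypothetical helper names, not part of test_sw4.py, and the paths are copied from the output above. Forcing `OMP_NUM_THREADS=1` is an experiment to see whether the result changes without OpenMP threading, not a diagnosis.

```python
import os
import subprocess

# Paths copied from the run_cmd line in the test output above.
SW4_DIR = "/home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/"
INPUT_FILE = ("/home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/"
              "attenuation/tw-topo-att-1.in")


def build_run_cmd(num_procs, sw4_dir, input_file):
    """Return the mpirun argument list for one sw4 run (hypothetical helper)."""
    # os.path.join avoids the doubled slash seen in the printed run_cmd.
    return ["mpirun", "-np", str(num_procs),
            os.path.join(sw4_dir, "sw4"), input_file]


def run_single_threaded(cmd, log_path):
    """Run cmd with OMP_NUM_THREADS=1; write stdout and stderr to log_path."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=env, stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode


# Example (needs mpirun and the sw4 binary, so it is left commented out):
# status = run_single_threaded(build_run_cmd(1, SW4_DIR, INPUT_FILE),
#                              "tw-topo-att-1.log")
```

Passing the command as a list, as the test script's run_cmd does, avoids any shell quoting issues with the long paths.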

When I run the last command by hand, but with 4 processes instead of 1 ("mpirun -np 4 /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4 /home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in"), I get the following error:

        sw4 version 3.0

This program comes with ABSOLUTELY NO WARRANTY; released under GPL.
This is free software, and you are welcome to redistribute
it under certain conditions, see LICENSE.txt for more details

Compiled on: Fri Apr 19 04:49:41 PM +08 2024
By user: batkillerz
Machine: homelab
Compiler: /storage/software/openmpi/5.0.1/bin/mpicxx
3rd party include dir: /include, and library dir: /lib

Input file: /home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in
Default Supergrid thickness has been tuned; # grid points = 1 grid sizes
Default Supergrid damping coefficient has been tuned; damping coefficient = 0.00000000e+00

  • Processing the grid command...
  • Setting h to 1.25600000e-01 from x/(nx-1) (x=6.28000000e+00, nx=51)
  • Setting ny to 51 to be consistent with h=1.25600000e-01
  • Setting nz to 51 to be consistent with h=1.25600000e-01
    cleanupRefinementLevels: topo_zmax = 3.00000000e+00
    Cartesian refinement levels (z=):
    3.00000000e+00
    Curvilinear refinement levels (z=):
    0.00000000e+00
    Grid distributed on 4 processors
    Finest grid size 55 x 55
    Processor array 2 x 2
    Number of curvilinear grids = 1
    Number of Cartesian grids = 1
    Total number of grids = 2
    Extent of the computational domain xmax=6.28000000e+00 ymax=6.28000000e+00 zmax=6.26560000e+00
    Cartesian refinement levels after correction:
    Grid=0 z-min=3.00000000e+00
    Corrected global_zmax = 6.26560000e+00

Rank=0, Grid #1 (curvilinear), iInterior=[1,26], jInterior=[1,26]
Rank=0, Grid #0 (Cartesian), iInterior=[1,26], jInterior=[1,26], kInterior=[1,27]
inside allocateCurvilinearArrays

***Topography grid: min z = -5.967331e-01, max z = -2.932192e-58, top Cartesian z = 3.000000e+00

Global grid sizes (without ghost points)
Grid h Nx Ny Nz Points Type
0 0.1256 51 51 27 70227 Cartesian
1 0.1256 51 51 27 70227 Curvilinear
Total number of grid points (without ghost points): 140454

Default Supergrid damping coefficient has been tuned; damping coefficient = 0.00000000e+00
Default Supergrid thickness has been tuned; # grid points = 1 grid sizes

Execution time, reading input file 6.77432830e-02 seconds
Assuming a SERIAL file system.
Detected at least one boundary with supergrid conditions

Making Directory: tw-topo-att-1/

... Done!

Geographic and Cartesian coordinates of the corners of the computational grid:
0: Lon= -1.180000e+02, Lat=3.700000e+01, x=0.000000e+00, y=0.000000e+00
1: Lon= -1.180000e+02, Lat=3.700006e+01, x=6.280000e+00, y=0.000000e+00
2: Lon= -1.179999e+02, Lat=3.700006e+01, x=6.280000e+00, y=6.280000e+00
3: Lon= -1.179999e+02, Lat=3.700000e+01, x=0.000000e+00, y=6.280000e+00


ASSIGNING TWILIGHT MATERIALS


   ----------- Material properties ranges ---------------
   1.00118341e+00 kg/m^3 <=  Density <= 2.99885859e+00 kg/m^3
   1.63353903e+00 m/s    <=  Vp      <= 2.82632270e+00 m/s
   1.00033876e+00 m/s    <=  Vs      <= 1.73075388e+00 m/s
   1.52767088e+00        <=  Vp/Vs   <= 1.73199227e+00
   2.00118341e+00 Pa     <=  mu      <= 3.99885859e+00 Pa
   1.00157479e+00 Pa     <=  lambda  <= 2.99848185e+00 Pa
   ------------------------------------------------------

***** PPW = minVs/h/maxFrequency ********
g=0, h=1.256000e-01, minVs/h=7.96448 (Cartesian)
g=1, h=1.256000e-01, minVs/h=7.96466 (curvilinear)

*** Attenuation parameters calculated for 1 mechanisms,
max freq=2.000000e+00 [Hz], min_freq=2.000000e-02 [Hz], velo_freq=1.000000e+00 [Hz]

Assigned material properties
*** computing the time step ***
[homelab:576512] *** Process received signal ***
[homelab:576512] Signal: Segmentation fault (11)
[homelab:576512] Signal code: Address not mapped (1)
[homelab:576512] Failing at address: 0x7ffc1189c000
[homelab:576511] *** Process received signal ***
[homelab:576511] Signal: Segmentation fault (11)
[homelab:576511] Signal code: Address not mapped (1)
[homelab:576511] Failing at address: 0x7ffedd464000
***Message from routine DSPEV in library SLATEC.
***Potentially recoverable error, Prog aborted, Traceback requested

  • On entry to DSPEV parameter number 3 had an illegal value.
  • Error number = 3

***End of message

***Job abort due to unrecovered error.

***Message from routine DSPEV in library SLATEC.
***Potentially recoverable error, Prog aborted, Traceback requested

  • On entry to DSPEV parameter number 3 had an illegal value.
    ***Message from routine DSPEV in library SLATEC.
    ***Potentially recoverable error, Prog aborted, Traceback requested
  • On entry to DSPEV parameter number 3 had an illegal value.
  • Error number = 3

***End of message

***Message from routine DSPEV in library SLATEC.
***Potentially recoverable error, Prog aborted, Traceback requested

  • On entry to DSPEV parameter number 3 had an illegal value.
  • Error number = 3

***End of message

***Job abort due to unrecovered error.

      Error message summary

Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1

***Message from routine DSPEV in library SLATEC.
***Potentially recoverable error, Prog aborted, Traceback requested

  • On entry to DSPEV parameter number 3 had an illegal value.
  • Error number = 3

***End of message

***Job abort due to unrecovered error.

      Error message summary

Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1

      Error message summary

Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1

***Job abort due to unrecovered error.

  • Error number = 3

***End of message

      Error message summary

Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1

***Job abort due to unrecovered error.

      Error message summary

Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
[homelab:576512] [ 0] /lib64/libc.so.6(+0x54db0)[0x7f108f454db0]
[homelab:576512] [ 1] /lib64/liblapack.so.3(dlansp_+0x2d5)[0x7f10905c1ab5]
[homelab:576512] [ 2] [homelab:576511] [ 0] /lib64/libc.so.6(+0x54db0)[0x7f99d5454db0]
[homelab:576511] [ 1] /lib64/liblapack.so.3(dlansp_+0x2d5)[0x7f99d65c1ab5]
[homelab:576511] [ 2] /lib64/liblapack.so.3(dspev_+0x15b)[0x7f99d661140b]
[homelab:576511] [ 3] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4f627c]
[homelab:576511] [ 4] /lib64/libgomp.so.1(GOMP_parallel+0x46)[0x7f99d62f2576]
[homelab:576511] [ 5] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4ee103]
[homelab:576511] [ 6] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4e89d6]
[homelab:576511] [ 7] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x407532]
/lib64/liblapack.so.3(dspev_+0x15b)[0x7f109061140b]
[homelab:576512] [ 3] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4f627c]
[homelab:576512] [ 4] /lib64/libgomp.so.1(GOMP_parallel+0x46)[0x7f1090b72576]
[homelab:576512] [ 5] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4ee103]
[homelab:576512] [ 6] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4e89d6]
[homelab:576512] [ 7] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x407532]

[homelab:576512] [homelab:576511] [ 8] /lib64/libc.so.6(+0x3feb0)[0x7f99d543feb0]

prterun has exited due to process rank 3 with PID 576513 on node homelab exiting
improperly. There are three reasons this could occur:

  1. this process did not call "init" before exiting, but others in the
    job did. This can cause a job to hang indefinitely while it waits for
    all processes to call "init". By rule, if one process calls "init",
    then ALL processes must call "init" prior to termination.

  2. this process called "init", but exited without calling "finalize".
    By rule, all processes that call "init" MUST call "finalize" prior to
    exiting or it will be considered an "abnormal termination"

  3. this process called "MPI_Abort" or "prte_abort" and the mca
    parameter prte_create_session_dirs is set to false. In this case, the
    run-time cannot detect that the abort call was an abnormal
    termination. Hence, the only error message you will receive is this
    one.

This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).

You can avoid this message by specifying -quiet on the prterun command
line.
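
The sw4 frames in the backtrace above appear only as raw addresses such as `sw4[0x4f627c]`. As a rough sketch, they could be collected from a saved log and handed to binutils' `addr2line` to get function names and source lines. This assumes the sw4 binary was built with debug symbols and is not position-independent, which the low addresses suggest but which I have not verified. `sw4_frame_addresses` and `addr2line_cmd` are hypothetical helper names.

```python
import re

# Matches Open MPI backtrace frames that point into the sw4 binary itself,
# e.g. "[homelab:576511] [ 3] /path/to/sw4[0x4f627c]".
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S*/sw4)\[(0x[0-9a-fA-F]+)\]")


def sw4_frame_addresses(log_text):
    """Return the unique sw4 frame addresses in order of first appearance."""
    seen = []
    for _frame_no, _binary, addr in FRAME_RE.findall(log_text):
        if addr not in seen:
            seen.append(addr)
    return seen


def addr2line_cmd(binary, addresses):
    """Build an addr2line command: -f prints function names, -C demangles."""
    return ["addr2line", "-f", "-C", "-e", binary] + list(addresses)


# Example, with "sw4.log" standing in for a saved copy of the output above:
# with open("sw4.log") as f:
#     addrs = sw4_frame_addresses(f.read())
# print(" ".join(addr2line_cmd("debug_mp/sw4", addrs)))
```

Both ranks report the same four sw4 addresses, so the helper deduplicates them before building the command.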

Any idea how I can fix this?
Thanks!
