Failed job while using mpi parallel #1170

Open
rashi-13 opened this issue Dec 6, 2022 · 6 comments
Comments

rashi-13 commented Dec 6, 2022

After setting "NUMBER_OF_SIMULTANEOUS_RUNS = 3" in the Par_file and running with MPI on 3 processes, the job failed with the following error message:
"must not have IMAIN == ISTANDARD_OUTPUT when NUMBER_OF_SIMULTANEOUS_RUNS > 1 otherwise output to screen is mingled. Change this in specfem/setup/constant.h.in and recompile.
Error detected, aborting MPI... proc 0"

This was run on an HPC cluster using 3 nodes.
I want to generate results for 39 source positions simultaneously. Could you please help me figure this out?

homnath commented Dec 6, 2022 via email

rashi-13 commented Dec 6, 2022

Hello Sir

Thank you very much for the quick response.
I tried running the solver after making the changes you suggested, and I got the following error:
"configure: error: MPI header not found; try setting MPI_INC."

Also, I would like to understand which changes I need to make, and in which file, in order to run the same model for different source locations in parallel using MPI.

It would be great if you could help me out.

Thank you very much.

homnath commented Dec 6, 2022 via email

rashi-13 commented Dec 7, 2022

Hello Sir

I have updated the path now.
I then set "NUMBER_OF_SIMULTANEOUS_RUNS = 2" and "NPROC = 4" in the Par_file, since NPROC is required to be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS.

I am getting the following error:

Error: the number of MPI processes 1 is not a multiple of NUMBER_OF_SIMULTANEOUS_RUNS = 2
the number of MPI processes is not a multiple of NUMBER_OF_SIMULTANEOUS_RUNS. Make sure you call meshfem2D with mpirun.
Error detected, aborting MPI... proc 0

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 30.


I would be really grateful if you could help me figure this out.

Thank you very much.
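
Note for readers hitting the same error: the message above reports 1 MPI process, which suggests the executable was started without mpirun. A later comment in this thread launches 8 processes for NUMBER_OF_SIMULTANEOUS_RUNS = 2 and NPROC = 4, which is consistent with a total of NPROC times NUMBER_OF_SIMULTANEOUS_RUNS processes. The Python sketch below only illustrates that arithmetic under this assumption. The function names are hypothetical and are not part of SPECFEM2D.

```python
def total_mpi_processes(nproc: int, simultaneous_runs: int) -> int:
    """Total process count to launch, ASSUMING each simultaneous run
    uses NPROC processes (so the total is their product)."""
    if nproc < 1 or simultaneous_runs < 1:
        raise ValueError("NPROC and NUMBER_OF_SIMULTANEOUS_RUNS must be >= 1")
    return nproc * simultaneous_runs


def check_launch(world_size: int, nproc: int, simultaneous_runs: int):
    """Return None if the launch looks consistent, else a short explanation.

    world_size is the number of MPI processes actually started
    (the value passed to 'mpirun -n').
    """
    expected = total_mpi_processes(nproc, simultaneous_runs)
    if world_size % simultaneous_runs != 0:
        # Same condition that the error quoted above complains about.
        return (f"{world_size} MPI processes is not a multiple of "
                f"NUMBER_OF_SIMULTANEOUS_RUNS = {simultaneous_runs}")
    if world_size != expected:
        return f"expected {expected} MPI processes, got {world_size}"
    return None


if __name__ == "__main__":
    # Values taken from this thread: NPROC = 4, NUMBER_OF_SIMULTANEOUS_RUNS = 2.
    print(total_mpi_processes(4, 2))
    print(check_launch(1, 4, 2))
    print(check_launch(8, 4, 2))
```

Under the same assumption, 39 simultaneous runs with NPROC = 4 would need 4 x 39 = 156 processes in total.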

homnath commented Dec 8, 2022 via email

AbolfazlKhanMo commented Dec 12, 2023

@homnath,

I have followed your instructions so far. I am running two simultaneous simulations (NUMBER_OF_SIMULTANEOUS_RUNS = 2 and NPROC = 4). I have also created the run0001 and run0002 directories, each containing DATA and OUTPUT_FILES directories.

This is what I got when I ran mpirun -n 8 ./bin/xmeshfem2D:

NUMBER_OF_SIMULTANEOUS_RUNS not compatible yet with SAVE_MODEL. Look for SMNSR in the source code.

I looked at the /specfem2d/src/specfem2D/save_model_files.f90 code (is this the correct file to look at?). On line 125, it says:
! SMNSR For compatibility with NUMBER_OF_SIMULTANEOUS_RUNS we have to change the lines trim(IN_DATA_FILES)//'proc'

There are several lines marked with the comment above. Could you please give me a hint on how to proceed so that I can run my simulations simultaneously?

Thanks very much,
Khan
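
Note for readers: the directory layout described in the comment above (run0001 and run0002, each with DATA and OUTPUT_FILES) can be created with a small script. This is a minimal Python sketch, assuming the four-digit runNNNN naming seen in this thread extends to higher run counts. The helper names are hypothetical, and the script does not address the SAVE_MODEL incompatibility reported above.

```python
from pathlib import Path


def run_dir_name(index: int) -> str:
    """Directory name for one run, e.g. 1 -> 'run0001' (naming as in this thread)."""
    return f"run{index:04d}"


def create_run_dirs(base, n_runs: int):
    """Create run0001 .. runNNNN under base, each with DATA and OUTPUT_FILES.

    Returns the list of run directories. Existing directories are left as-is.
    """
    created = []
    for i in range(1, n_runs + 1):
        run_dir = Path(base) / run_dir_name(i)
        for sub in ("DATA", "OUTPUT_FILES"):
            (run_dir / sub).mkdir(parents=True, exist_ok=True)
        created.append(run_dir)
    return created


if __name__ == "__main__":
    # Two runs, matching NUMBER_OF_SIMULTANEOUS_RUNS = 2 in the comment above.
    for d in create_run_dirs(".", 2):
        print(d)
```

Each run's DATA directory still has to be populated by hand (for example with that run's source description); the sketch only creates empty directories.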
