Recommendations for Faster Simulation #1667
Comments
that's indeed quite slow performance. I have no access to such an AMD chip, but comparing this to a similar setup on my Apple M1 Pro chip, the mean elapsed time per time step should be about 10x-20x faster. something I notice is that your bandwidth test shows a very low bandwidth:
compared to the bandwidth on the M1 chip, which is around
this bandwidth with the force_vectorization turned on (you can compile with the flag
the main simulation features that affect performance are the number of elastic elements per process, i.e., 48884 in your case:
and having attenuation turned on or not, as well as having PML turned on or not. compared with a similar simulation setup on my system, the
The system specifics depend on compiler, chip, node memory and MPI fabrics. In your case, the following could be checked:
Hi @danielpeter, Thank you for your help with this. I have tried a few things (details below), but I have not been able to reach the very high bandwidth numbers you were able to achieve. I am hoping you may have some additional suggestions.
Attempt 1: Reduce cores to 64 and use --enable-vectorization
I recompiled with the
Attempt 2: Using IntelMPI and Intel Xeon Platinum 9242
You recommended using a different MPI library, so rather than using OpenMPI (as above) I switched to IntelMPI. I also chose to switch node type from the AMD EPYC 7702 (above) to the Intel Xeon Platinum 9242:
Thanks again for your help!
well, there is probably not much else to do other than trying to run this simulation on a bigger system and across multiple nodes, where you can parallelize the simulation with a higher number of available processor cores, or get access to a GPU system. just to follow up your outputs with some more suggestions:
and, don't try to match the high bandwidth numbers of the Apple M1 Pro chip with these AMD and Intel chips. the M1 has an exceptionally high bandwidth compared to other CPUs. better to find a larger system to run your simulation - or try out Google Colab to see if it fits onto a free TPU/GPU. this example in the package might help you with setting this up:
Dear SPECFEM3D Team,
I am new to SPECFEM3D but I have used SPECFEM2D quite extensively. I am working on simulating a 30 Hz Ricker point source at the surface of a 60 m by 60 m by 25 m domain. The simulation is taking about 30 hours to complete using 121 cores of an AMD EPYC 7702.
I have two questions.
Thank you in advance for your help and building such an excellent resource.
In case they are helpful, I have included the output_meshfem3D.txt and output_solver.txt from the run below.
output_meshfem3D.txt
output_solver.txt
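As a rough aid for the scaling discussion in the comments, the sketch below estimates how the per-process load and the wall-clock time would change with more cores. It only uses figures quoted in this thread (about 48884 elastic elements per process, 121 cores, about 30 hours). It assumes that the 48884 figure belongs to the 121-core run, that elements are spread evenly, and that the solver scales ideally with core count. Real runs will scale less well because of communication and memory bandwidth limits. The function names are made up for illustration and are not part of SPECFEM3D.

```python
import math


def elements_per_process(total_elements: int, nproc: int) -> int:
    """Average number of elements per MPI process, rounded up."""
    return math.ceil(total_elements / nproc)


def ideal_runtime_hours(runtime_hours: float, nproc_old: int, nproc_new: int) -> float:
    """Wall-clock time under ideal strong scaling (an optimistic assumption)."""
    return runtime_hours * nproc_old / nproc_new


# Figures quoted in the thread; pairing 48884 with 121 cores is an assumption.
NPROC = 121
ELEMENTS_PER_PROC = 48884
RUNTIME_HOURS = 30.0

# Assumes every process holds the same number of elements.
TOTAL_ELEMENTS = ELEMENTS_PER_PROC * NPROC

if __name__ == "__main__":
    for nproc in (121, 242, 484, 968):
        load = elements_per_process(TOTAL_ELEMENTS, nproc)
        hours = ideal_runtime_hours(RUNTIME_HOURS, NPROC, nproc)
        print(f"{nproc:4d} cores: ~{load} elements/process, ~{hours:.1f} h (ideal)")
```

The numbers this prints are upper bounds on the benefit of adding cores, so they are best read as a sanity check before requesting a larger allocation rather than as a prediction.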