P4est MPI simulation gets stuck when not all ranks have BC faces #1878

Open
benegee opened this issue Mar 19, 2024 · 4 comments · May be fixed by #1879
Labels
bug (Something isn't working) · parallelization (Related to MPI, threading, tasks etc.)

Comments

@benegee (Contributor)

benegee commented Mar 19, 2024

I first encountered this issue for the baroclinic instability test case. Then I found a reduced example based on elixir_advection_basic but with nonperiodic boundary conditions:

  • 3 cells
  • 3 MPI ranks
  • periodic boundary conditions in x-direction
  • Dirichlet boundary conditions in y-direction
  • the important point is that the middle rank does not own any faces with a Dirichlet boundary condition

[Screenshot of the 1×3 mesh partitioned across 3 ranks (Screenshot_20240319_120453)]

Running this with system MPI on 3 ranks makes the simulation hang. Running with tmpi and looking at the backtraces when aborting shows that two ranks have called init_boundaries! and eventually p4est_iterate, while the third (the middle one) is already somewhere in rhs!.

It seems to be caused by the check

if n_boundaries > 0
    init_boundaries!(boundaries, mesh)
end

Consequently, only two ranks eventually call p4est_iterate. On the p4est side, p4est_iterate first calls p4est_is_valid, which calls MPI_Allreduce.

This would explain the blocking. What I do not understand is why it works when not using system MPI.
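
This is the classic deadlock pattern of a collective call guarded by a rank-dependent condition. A standalone sketch of the pattern in MPI.jl (illustrative only, not Trixi.jl code; the guard merely mimics a middle rank that owns no boundary faces) hangs in the same way when run on 3 ranks:

# deadlock_sketch.jl -- run e.g. with `mpiexecjl -n 3 julia deadlock_sketch.jl`
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Pretend only the outer ranks own boundary faces, as in the MWE below.
n_boundaries = rank == 1 ? 0 : 2

if n_boundaries > 0
    # Stands in for p4est_iterate -> p4est_is_valid -> MPI_Allreduce:
    # a collective that every rank of the communicator must enter.
    total = MPI.Allreduce(n_boundaries, +, comm)
    println("rank $rank: total number of boundaries = $total")
end
# Ranks 0 and 2 block in the Allreduce forever, because rank 1 never joins it.

MPI.Finalize()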

MWE
using OrdinaryDiffEq
using Trixi

###############################################################################
# semidiscretization of the linear advection equation

advection_velocity = (1.0, 0.0)
equations = LinearScalarAdvectionEquation2D(advection_velocity)

initial_condition = initial_condition_gauss

boundary_conditions = Dict(:y_neg => BoundaryConditionDirichlet(initial_condition),
                           :y_pos => BoundaryConditionDirichlet(initial_condition))

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

coordinates_min = (-5.0, -5.0)
coordinates_max = (5.0, 5.0)

trees_per_dimension = (1, 3)

mesh = P4estMesh(trees_per_dimension, polydeg = 3,
                 coordinates_min = coordinates_min, coordinates_max = coordinates_max,
                 initial_refinement_level = 0,
                 periodicity = (true, false))

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition, solver,
                                    boundary_conditions = boundary_conditions)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span `tspan`
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan);

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

callbacks = CallbackSet(summary_callback)

###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false),
            dt = 0.01,
            save_everystep = false, callback = callbacks);

# Print the timer summary
summary_callback()
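
For reference, one way to launch this MWE on 3 ranks (assuming the script is saved as mwe.jl and the mpiexecjl launcher wrapper from MPI.jl is installed; a plain mpiexec/mpirun with a matching Julia environment should work as well):

mpiexecjl -n 3 julia --project mwe.jl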
benegee added the bug and parallelization labels on Mar 19, 2024
@sloede (Member)

sloede commented Mar 21, 2024

> This would explain the blocking. What I do not understand is why it works when not using system MPI.

I share your conclusion. It seems to be a good explanation for the blocking, but it does not explain why it works with the out-of-the-box MPI. Maybe you can verify (using a "wololo" statement or just plain output to stderr) that the init_boundaries! function is indeed not called on the middle rank with the Julia-provided MPI but does get called with the system MPI?
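
A minimal way to do that check (a sketch only, not the actual Trixi.jl code; it assumes the lines are placed right next to the n_boundaries guard and that MPI.jl is available as MPI):

# Hypothetical debug output next to the guard; variable names as in the snippet above.
rank = MPI.Comm_rank(MPI.COMM_WORLD)
println(stderr, "rank $rank: n_boundaries = $n_boundaries, calling init_boundaries!: $(n_boundaries > 0)")
flush(stderr)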

@benegee (Contributor, Author)

benegee commented Mar 22, 2024

I double-checked this: init_boundaries! is not called on the middle rank, neither with the Julia-provided MPI nor with system MPI. For both variants there are 26 calls to p4est_iterate in total, whereas the number should be a multiple of 3. However, the simulation just continues with the Julia-provided MPI but hangs with system MPI.

@sloede (Member)

sloede commented Mar 23, 2024

This becomes more and more baffling... Alas, the joys of MPI-based parallelism and debugging 😱

What happens if you put an MPI_Barrier after the if n_boundaries > 0 ... end check, then print something from all ranks, then add another MPI_Barrier? Will all ranks reach that statement?

I do not see how a simulation could continue if a collective communication is not issued from all ranks, unless it is mixed with later collective calls that somehow "match" in one MPI implementation but do not match in others.

Incidentally, are the system MPI and the Julia MPI the same MPI implementation? If not, can you check what happens if you use the same one? That is, maybe it's not a Julia thing but really an MPI thing...
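
A sketch of the suggested barrier-and-print instrumentation (the placement inside Trixi.jl and the use of MPI.COMM_WORLD are assumptions; Trixi.jl's own communicator could be used instead):

# Hypothetical instrumentation around the guard.
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if n_boundaries > 0
    init_boundaries!(boundaries, mesh)   # only ranks with boundary faces enter this branch
end

MPI.Barrier(comm)                        # every rank should arrive here
println(stderr, "rank $rank passed the n_boundaries check")
flush(stderr)
MPI.Barrier(comm)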

@benegee (Contributor, Author)

benegee commented Mar 25, 2024

Thanks for the suggestions! Here is a small update:

When I put MPI_Barriers after if n_boundaries > 0 ... end, they are passed by all ranks: I see the corresponding printlns from all ranks and the program does not hang.

When I put the barriers within if n_boundaries > 0 ... end, only two ranks print the corresponding messages and the program does hang, also with the Julia-provided MPI.

I also tried changing the MPI implementation: MPItrampoline_jll showed the same behavior. Unfortunately, I could not get OpenMPI_jll to work so far; it led to errors while precompiling P4est.jll and T8code.jll.
