
Question about MPI_Finalize #535

Open
ollielo opened this issue Feb 13, 2024 · 4 comments

Comments

@ollielo commented Feb 13, 2024

We are using the Annotation interface in our code. We used to be able (pre-2.10) to rely on the destructor or some other Caliper-internal mechanism to call MPI_Finalize before Caliper is shut down. However, in 2.10, we got something like

== CALIPER: default: mpireport: MPI is already finalized. Cannot aggregate output

Do we now have to call MPI_Finalize ourselves?
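
For context, our usage pattern looks roughly like this (a minimal sketch, not our actual code; the "phase" attribute name and program structure are just an example):

#include <caliper/cali.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    cali::Annotation phase("phase");   // example attribute name
    phase.begin("work");
    // ... instrumented work ...
    phase.end();

    // Pre-2.10 expectation: Caliper flushes its report when MPI_Finalize
    // is called, before MPI is actually torn down.
    MPI_Finalize();
    return 0;
}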

@daboehme (Member)

Hi @ollielo, thanks for the report. Caliper should still trigger a flush on MPI_Finalize() itself.

What's your runtime configuration? It looks like you're manually configuring Caliper with CALI_SERVICES_ENABLE etc. In that case make sure you have the mpi service activated, e.g. CALI_SERVICES_ENABLE=aggregate,event,mpi,mpireport,timer. You can also try one of Caliper's built-in configuration recipes, e.g. CALI_CONFIG=runtime-report.
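
If you prefer to set this up in code rather than through environment variables, the built-in recipes are also available through Caliper's ConfigManager API. A minimal sketch (the recipe choice and placement here are just an example):

#include <caliper/cali.h>
#include <caliper/cali-manager.h>
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    cali::ConfigManager mgr;
    mgr.add("runtime-report");   // same recipe as CALI_CONFIG=runtime-report
    if (mgr.error())
        std::cerr << "Caliper config error: " << mgr.error_msg() << std::endl;
    mgr.start();

    // ... instrumented work ...

    mgr.flush();                 // explicit flush point, before MPI is finalized
    MPI_Finalize();
    return 0;
}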

@ollielo (Author) commented Feb 13, 2024

Thanks for your answer. I had it somewhat reversed. Previously, we did not need to explicitly shut down Caliper; it seemed like Caliper would intercept our call to MPI_Finalize and shut itself down properly. Now we need to add an explicit call to

cali::Caliper::instance().finalize();

before calling MPI_Finalize (see the sketch below). What I want to understand is:

  1. Does Caliper intercept the call to MPI_Finalize, and if so, how?
  2. Is calling cali::Caliper::instance().finalize() the proper way to manually shut down Caliper?
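
For reference, the ordering we use now looks like this (a minimal sketch; it assumes the Caliper class is available via caliper/Caliper.h):

#include <caliper/cali.h>
#include <caliper/Caliper.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // ... instrumented work ...

    cali::Caliper::instance().finalize();  // explicit Caliper shutdown (workaround)
    MPI_Finalize();                        // MPI torn down afterwards
    return 0;
}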

@daboehme (Member)

Caliper should still intercept the MPI_Finalize() call. If you're configuring Caliper with CALI_SERVICES_ENABLE=... you'll need to add the mpi service for this to work. If you're using one of the built-in recipes (CALI_CONFIG=...) it should do that automatically.

By default we use the GOTCHA library, which comes with Caliper, to intercept MPI calls. It explicitly intercepts the C API MPI_Finalize() call, so interception can fail if you're using the C++ or Fortran MPI API. However, you mentioned that it used to work in v2.9, which is curious: I don't think there were any changes between v2.9 and v2.10 in the way Caliper intercepts MPI_Finalize(), but we did update the GOTCHA library. You could try running with CALI_LOG_VERBOSITY=1 GOTCHA_DEBUG=2 set as environment variables; that should give us some debug output and tell us whether we at least attempt to intercept MPI_Finalize.
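
For illustration, a wrapper installed through GOTCHA's public API looks roughly like this. This is a generic sketch, not our actual mpi service code; the tool name and the flush placeholder are made up:

#include <gotcha/gotcha.h>

static gotcha_wrappee_handle_t orig_mpi_finalize_handle;

// Runs in place of the application's MPI_Finalize() call.
static int wrapped_mpi_finalize(void)
{
    // ... flush/aggregate output here, while MPI is still usable ...

    typedef int (*mpi_finalize_fn)(void);
    mpi_finalize_fn orig =
        (mpi_finalize_fn) gotcha_get_wrappee(orig_mpi_finalize_handle);
    return orig();   // then call the real MPI_Finalize
}

static struct gotcha_binding_t bindings[] = {
    { "MPI_Finalize", (void*) wrapped_mpi_finalize, &orig_mpi_finalize_handle }
};

void install_mpi_wrappers()
{
    // "example/mpi" is a placeholder; the log above shows Caliper's wrappers
    // registering under the tool name "caliper/mpi".
    gotcha_wrap(bindings, 1, "example/mpi");
}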

Finally, calling cali::Caliper::instance().finalize() explicitly is a bit of a hacky workaround, but it should do the trick for now. That said, it should not be necessary, and I'm curious what's going wrong.

@ollielo (Author) commented Mar 4, 2024

I tried what you suggested. Here is the relevant part of the log:

[1721697/1721697][gotcha.c:150] - gotcha_rewrite_wrapper_orders for binding MPI_Init in tool caliper/mpi of priority -1
[1721697/1721697][gotcha.c:156] - Adding new entry for MPI_Init to hash table
[1721697/1721697][gotcha.c:324] - Symbol MPI_Init needs lookup operation
[1721697/1721697][gotcha.c:98] - Looking up exported symbols for MPI_Init
[1721697/1721697][gotcha.c:87] - Symbol MPI_Init found in /home/ollie/opt/spack/opt/spack/linux-fedora39-skylake_avx512/gcc-12.3.0/openmpi-4.1.6-fitrx5rjvdbidrx6xdkt5hu3s2dv7cij/lib/libmpi.so.40 at 0x7f02e8eb1b70
[1721697/1721697][gotcha.c:334] - Symbol MPI_Init needs binding from application
[1721697/1721697][gotcha.c:150] - gotcha_rewrite_wrapper_orders for binding MPI_Init_thread in tool caliper/mpi of priority -1
[1721697/1721697][gotcha.c:156] - Adding new entry for MPI_Init_thread to hash table
[1721697/1721697][gotcha.c:324] - Symbol MPI_Init_thread needs lookup operation
[1721697/1721697][gotcha.c:98] - Looking up exported symbols for MPI_Init_thread
[1721697/1721697][gotcha.c:87] - Symbol MPI_Init_thread found in /home/ollie/opt/spack/opt/spack/linux-fedora39-skylake_avx512/gcc-12.3.0/openmpi-4.1.6-fitrx5rjvdbidrx6xdkt5hu3s2dv7cij/lib/libmpi.so.40 at 0x7f02e8eb1cc0
[1721697/1721697][gotcha.c:334] - Symbol MPI_Init_thread needs binding from application
[1721697/1721697][gotcha.c:150] - gotcha_rewrite_wrapper_orders for binding MPI_Finalize in tool caliper/mpi of priority -1
[1721697/1721697][gotcha.c:156] - Adding new entry for MPI_Finalize to hash table
[1721697/1721697][gotcha.c:324] - Symbol MPI_Finalize needs lookup operation
[1721697/1721697][gotcha.c:98] - Looking up exported symbols for MPI_Finalize
[1721697/1721697][gotcha.c:87] - Symbol MPI_Finalize found in /home/ollie/opt/spack/opt/spack/linux-fedora39-skylake_avx512/gcc-12.3.0/openmpi-4.1.6-fitrx5rjvdbidrx6xdkt5hu3s2dv7cij/lib/libmpi.so.40 at 0x7f02e8eaba90
[1721697/1721697][gotcha.c:334] - Symbol MPI_Finalize needs binding from application
== CALIPER: default: Registered MPI service
== CALIPER: default: mpireport: MPI is already finalized. Cannot aggregate output.

It looks to me like both MPI_Init and MPI_Finalize are being intercepted by GOTCHA. Any other suggestions?
