Restart with free surface and latest deal.II is failing #5607

Open
gassmoeller opened this issue Mar 18, 2024 · 7 comments

Comments

@gassmoeller
Member

It looks like a change in deal.II master broke our restart feature when the free surface is active. See e.g. #5605, which fails the test checkpoint_07_enable_free_surface_resume with deal.II master even though it only changed documentation; deal.II 9.5 works fine. Other affected PRs are #5606, #5604, and #5603. The test fails in the Stokes solver after the restart with:

-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.6.0-pre&sha=&src=code
-----------------------------------------------------------------------------
*** Resuming from snapshot!

Number of active cells: 768 (on 4 levels)
Number of degrees of freedom: 10,656 (6,528+864+3,264)

Number of mesh deformation degrees of freedom: 1,728
   Solving mesh displacement system... 0 iterations.
*** Timestep 8:  t=3e+07 years, dt=1.73037e+06 years
   Solving mesh displacement system... 1 iterations.
   Solving temperature system... 6 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 200+[1ee0935b8cb1:29370] *** Process received signal ***
[1ee0935b8cb1:29370] Signal: Floating point exception (8)
[1ee0935b8cb1:29370] Signal code: Invalid floating point operation (7)
[1ee0935b8cb1:29370] Failing at address: 0x714b58596074
[1ee0935b8cb1:29370] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x714b58740420]
[1ee0935b8cb1:29370] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x5c074)[0x714b58596074]
[1ee0935b8cb1:29370] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x75548)[0x714b585af548]
[1ee0935b8cb1:29370] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x76dcd)[0x714b585b0dcd]
[1ee0935b8cb1:29370] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x8bf9a)[0x714b585c5f9a]
[1ee0935b8cb1:29370] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xf4c70)[0x714b589adc70]
[1ee0935b8cb1:29370] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0x106)[0x714b589df276]
[1ee0935b8cb1:29370] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x93)[0x714b589ed9a3]
[1ee0935b8cb1:29370] [ 8] /__w/aspect/aspect/build/aspect(_ZN6aspect9Utilities37throw_linear_solver_failure_exceptionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_RKSt6vectorIN6dealii13SolverControlESaISB_EERKSt9exceptionP19ompi_communicator_tS8_+0x189)[0x582e1ae6fea9]
[1ee0935b8cb1:29370] [ 9] /__w/aspect/aspect/build/aspect(_ZNK6aspect8internal24BlockSchurPreconditionerIN6dealii16TrilinosWrappers15PreconditionAMGENS3_16PreconditionBaseEE5vmultERNS3_3MPI11BlockVectorERKS8_+0x785)[0x582e1b36c855]
[1ee0935b8cb1:29370] [10] /__w/aspect/aspect/build/aspect(_ZN6dealii12SolverFGMRESINS_16TrilinosWrappers3MPI11BlockVectorEE5solveIN6aspect8internal11StokesBlockENS7_24BlockSchurPreconditionerINS1_15PreconditionAMGENS1_16PreconditionBaseEEEEEvRKT_RS3_RKS3_RKT0_+0x537)[0x582e1b36d077]
[1ee0935b8cb1:29370] [11] /__w/aspect/aspect/build/aspect(_ZN6aspect9SimulatorILi2EE12solve_stokesEv+0x2272)[0x582e1b371e22]
[1ee0935b8cb1:29370] [12] /__w/aspect/aspect/build/aspect(_ZN6aspect9SimulatorILi2EE25assemble_and_solve_stokesERKdPd+0xa4)[0x582e1af9c3d4]
[1ee0935b8cb1:29370] [13] /__w/aspect/aspect/build/aspect(_ZN6aspect9SimulatorILi2EE36solve_single_advection_single_stokesEv+0x91)[0x582e1b186221]
[1ee0935b8cb1:29370] [14] /__w/aspect/aspect/build/aspect(_ZN6aspect9SimulatorILi2EE14solve_timestepEv+0x125)[0x582e1ae1ae65]
[1ee0935b8cb1:29370] [15] /__w/aspect/aspect/build/aspect(_ZN6aspect9SimulatorILi2EE3runEv+0x34f)[0x582e1ae2ed6f]
[1ee0935b8cb1:29370] [16] /__w/aspect/aspect/build/aspect(_Z13run_simulatorILi2EEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_bbb+0x22b)[0x582e1bc5b09b]
[1ee0935b8cb1:29370] [17] /__w/aspect/aspect/build/aspect(main+0x5aa)[0x582e1ad79d5a]
[1ee0935b8cb1:29370] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x714b5855e083]
[1ee0935b8cb1:29370] [19] /__w/aspect/aspect/build/aspect(_start+0x2e)[0x582e1ad833ce]
[1ee0935b8cb1:29370] *** End of error message ***
ninja: build stopped: subcommand failed.

CMake Error at /__w/aspect/aspect/tests/run_test.cmake:14 (MESSAGE):
  *** Test aborted.

I haven't been able to pin down the problematic change in deal.II yet. I just wanted to let everyone know that this is likely not the fault of any ASPECT PR (since deal.II 9.5 works fine). There have been a few changes in the FGMRES solver in deal.II lately, and I suspect it could be something in there.

@gassmoeller
Member Author

Just as a note to myself, this looks a bit like the problem we are observing for the GMG preconditioner, where the solver crashes upon switching from cheap to expensive iterations. This test uses the AMG preconditioner, however. GMG uses the GMRES solver, while AMG uses FGMRES, so maybe one of the recent changes in deal.II has carried the problem that was so far only in GMRES over into the FGMRES code?

For reference, the error message on my system looks different. Instead of throwing an FPE immediately, I get the following:

157: *** Resuming from snapshot!
157: 
157: Number of active cells: 768 (on 4 levels)
157: Number of degrees of freedom: 10,656 (6,528+864+3,264)
157: 
157: Number of mesh deformation degrees of freedom: 1,728
157:    Solving mesh displacement system... 0 iterations.
157: *** Timestep 8:  t=3e+07 years, dt=1.73037e+06 years
157:    Solving mesh displacement system... 1 iterations.
157:    Solving temperature system... 6 iterations.
157:    Rebuilding Stokes preconditioner...
157:    Solving Stokes system... 200+---------------------------------------------------------
157: TimerOutput objects finalize timed values printed to the
157: screen by communicating over MPI in their destructors.
157: Since an exception is currently uncaught, this
157: synchronization (and subsequent output) will be skipped
157: to avoid a possible deadlock.
157: ---------------------------------------------------------
157: 
157: 
157: ----------------------------------------------------
157: Exception 'ExcMessage (exception_message.str())' on rank 0 on processing: 
157: 
157: --------------------------------------------------------
157: An error occurred in line <2978> of file </home/rene/software/aspect/source/utilities.cc> in function
157:     void aspect::Utilities::throw_linear_solver_failure_exception(const string&, const string&, const std::vector<dealii::SolverControl>&, const std::exception&, MPI_Comm, const string&)
157: Additional information: 
157:     The iterative Stokes solver in Simulator::solve_stokes did not
157:     converge.
157:     
157:     The initial residual was: 3.067309e+12
157:     The final residual is: 5.853176e+10
157:     The required residual for convergence is: 2.021590e+06
157:     See output-checkpoint_07_enable_free_surface_resume/solver_history.txt
157:     for the full convergence history.
157:     
157:     The solver reported the following error:
157:     
157:     --------------------------------------------------------
157:     An error occurred in line <2978> of file
157:     </home/rene/software/aspect/source/utilities.cc> in function
157:     void aspect::Utilities::throw_linear_solver_failure_exception(const
157:     string&, const string&, const std::vector<dealii::SolverControl>&,
157:     const std::exception&, MPI_Comm, const string&)
157:     Additional information:
157:     The iterative (top left) solver in BlockSchurPreconditioner::vmult did
157:     not converge.
157:     
157:     The initial residual was: nan
157:     The final residual is: 1.323246e-02
157:     The required residual for convergence is: 2.189823e-04
157:     
157:     The solver reported the following error:
157:     
157:     --------------------------------------------------------
157:     An error occurred in line <557> of file
157:     </home/rene/software/dealii/source/lac/trilinos_solver.cc> in function
157:     void dealii::TrilinosWrappers::SolverBase::do_solve(const
157:     Preconditioner&) [with Preconditioner =
157:     dealii::TrilinosWrappers::PreconditionBase]
157:     Additional information:
157:     Iterative method reported convergence failure in step 10000. The
157:     residual in the last step was 0.0132325.
157:     
157:     This error message can indicate that you have simply not allowed a
157:     sufficiently large number of iterations for your iterative solver to
157:     converge. This often happens when you increase the size of your
157:     problem. In such cases, the last residual will likely still be very
157:     small, and you can make the error go away by increasing the allowed
157:     number of iterations when setting up the SolverControl object that
157:     determines the maximal number of iterations you allow.
157:     
157:     The other situation where this error may occur is when your matrix is
157:     not invertible (e.g., your matrix has a null-space), or if you try to
157:     apply the wrong solver to a matrix (e.g., using CG for a matrix that
157:     is not symmetric or not positive definite). In these cases, the
157:     residual in the last iteration is likely going to be large.
157:     
157:     Stacktrace:
157:     -----------
157:     #0  /home/rene/software/dealii/build/lib/libdeal_II.so.9.6.0-pre: void
157:     dealii::TrilinosWrappers::SolverBase::do_solve<dealii::TrilinosWrappers::PreconditionBase>(dealii::TrilinosWrappers::PreconditionBase
157: 
157:     const&)
157:     #1  /home/rene/software/aspect/build/aspect:
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157: 
157:     dealii::TrilinosWrappers::PreconditionBase>::vmult(dealii::TrilinosWrappers::MPI::BlockVector&,
157: 
157:     dealii::TrilinosWrappers::MPI::BlockVector const&) const
157:     #2  /home/rene/software/aspect/build/aspect: void
157:     dealii::SolverFGMRES<dealii::TrilinosWrappers::MPI::BlockVector>::solve<aspect::internal::StokesBlock,
157: 
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157: 
157:     dealii::TrilinosWrappers::PreconditionBase>
157:     >(aspect::internal::StokesBlock const&,
157:     dealii::TrilinosWrappers::MPI::BlockVector&,
157:     dealii::TrilinosWrappers::MPI::BlockVector const&,
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157: 
157:     dealii::TrilinosWrappers::PreconditionBase> const&)
157:     #3  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_stokes()
157:     #4  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::assemble_and_solve_stokes(double const&,
157:     double*)
157:     #5  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_single_advection_single_stokes()
157:     #6  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_timestep()
157:     #7  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::run()
157:     #8  /home/rene/software/aspect/build/aspect: void
157:     run_simulator<2>(std::__cxx11::basic_string<char,
157:     std::char_traits<char>, std::allocator<char> > const&,
157:     std::__cxx11::basic_string<char, std::char_traits<char>,
157:     std::allocator<char> > const&, bool, bool, bool)
157:     #9  /home/rene/software/aspect/build/aspect: main
157:     --------------------------------------------------------
157:     
157:     
157:     Stacktrace:
157:     -----------
157:     #0  /home/rene/software/aspect/build/aspect:
157:     aspect::Utilities::throw_linear_solver_failure_exception(std::__cxx11::basic_string<char,
157:     std::char_traits<char>, std::allocator<char> > const&,
157:     std::__cxx11::basic_string<char, std::char_traits<char>,
157:     std::allocator<char> > const&, std::vector<dealii::SolverControl,
157:     std::allocator<dealii::SolverControl> > const&, std::exception const&,
157:     ompi_communicator_t*, std::__cxx11::basic_string<char,
157:     std::char_traits<char>, std::allocator<char> > const&)
157:     #1  /home/rene/software/aspect/build/aspect:
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157:     dealii::TrilinosWrappers::PreconditionBase>::vmult(dealii::TrilinosWrappers::MPI::BlockVector&,
157:     dealii::TrilinosWrappers::MPI::BlockVector const&) const
157:     #2  /home/rene/software/aspect/build/aspect: void
157:     dealii::SolverFGMRES<dealii::TrilinosWrappers::MPI::BlockVector>::solve<aspect::internal::StokesBlock,
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157:     dealii::TrilinosWrappers::PreconditionBase>
157:     >(aspect::internal::StokesBlock const&,
157:     dealii::TrilinosWrappers::MPI::BlockVector&,
157:     dealii::TrilinosWrappers::MPI::BlockVector const&,
157:     aspect::internal::BlockSchurPreconditioner<dealii::TrilinosWrappers::PreconditionAMG,
157:     dealii::TrilinosWrappers::PreconditionBase> const&)
157:     #3  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_stokes()
157:     #4  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::assemble_and_solve_stokes(double const&,
157:     double*)
157:     #5  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_single_advection_single_stokes()
157:     #6  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::solve_timestep()
157:     #7  /home/rene/software/aspect/build/aspect:
157:     aspect::Simulator<2>::run()
157:     #8  /home/rene/software/aspect/build/aspect: void
157:     run_simulator<2>(std::__cxx11::basic_string<char,
157:     std::char_traits<char>, std::allocator<char> > const&,
157:     std::__cxx11::basic_string<char, std::char_traits<char>,
157:     std::allocator<char> > const&, bool, bool, bool)
157:     #9  /home/rene/software/aspect/build/aspect: main
157:     --------------------------------------------------------
157:     
157: 
157: Stacktrace:
157: -----------
157: #0  /home/rene/software/aspect/build/aspect: aspect::Utilities::throw_linear_solver_failure_exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<dealii::SolverControl, std::allocator<dealii::SolverControl> > const&, std::exception const&, ompi_communicator_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
157: #1  /home/rene/software/aspect/build/aspect: aspect::Simulator<2>::solve_stokes()
157: #2  /home/rene/software/aspect/build/aspect: aspect::Simulator<2>::assemble_and_solve_stokes(double const&, double*)
157: #3  /home/rene/software/aspect/build/aspect: aspect::Simulator<2>::solve_single_advection_single_stokes()
157: #4  /home/rene/software/aspect/build/aspect: aspect::Simulator<2>::solve_timestep()
157: #5  /home/rene/software/aspect/build/aspect: aspect::Simulator<2>::run()
157: #6  /home/rene/software/aspect/build/aspect: void run_simulator<2>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
157: #7  /home/rene/software/aspect/build/aspect: main
157: --------------------------------------------------------
157: 
157: Aborting!
157: ----------------------------------------------------
157: --------------------------------------------------------------------------
157: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
157: with errorcode 1.
157: 
157: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
157: You may or may not see output from other processes, depending on
157: exactly when Open MPI kills them.
157: --------------------------------------------------------------------------
157: 
157: CMake Error at /home/rene/software/aspect/tests/run_test.cmake:14 (MESSAGE):
157:   *** Test aborted.
157: 
157: 
1/1 Test #157: checkpoint_07_enable_free_surface_resume ...***Failed   33.94 sec

@bangerth
Contributor

There has been substantial churn in the GMRES implementation in deal.II. Are you able to narrow down which patch to deal.II might have caused this?

@kronbichler FYI -- perhaps related to your patches, perhaps not.

@kronbichler
Contributor

kronbichler commented Mar 26, 2024

It is quite likely related to #16760; that is at least what I would check first. Let me try to have a look at the failing test here.

@kronbichler
Contributor

kronbichler commented Mar 26, 2024

I cannot reproduce this on any of my systems. I tried both with gdb and valgrind to see if any floating point exception or similar error would be triggered, but could not see any. The test still fails for me, here is the output I have:

157: Built target checkpoint_07_enable_free_surface_resume
157: Generating output-checkpoint_07_enable_free_surface_resume/screen-output
157: Generating output-checkpoint_07_enable_free_surface_resume/statistics.notime
157: Generating output-checkpoint_07_enable_free_surface_resume/statistics.cmp.notime
157: Generating output-checkpoint_07_enable_free_surface_resume/statistics.diff
157: ******* Error during diffing output results for checkpoint_07_enable_free_surface_resume/statistics
157: ******* Results are stored in /home/martin/Work/ParallelFlowSoftware/aspect/aspect/build/tests/output-checkpoint_07_enable_free_surface_resume/statistics.diff.failed
157: ******* Check /home/martin/Work/ParallelFlowSoftware/aspect/aspect/build/tests/output-checkpoint_07_enable_free_surface_resume/statistics /home/martin/Work/ParallelFlowSoftware/aspect/aspect/tests/checkpoint_07_enable_free_surface_resume/statistics
157: ******* First 50 of 191 lines of diffs are:
157: ----------------
157: ##17      <== 0 0.000000000000e+00 0.000000000000e+00 768 7392 3264 0  56  59 174 output-checkpoint_07_enable_free_surface_create/solution/solution-00000 1.59900000e+03 1.60013520e+03 1.60100000e+03 1.56138836e-02 2.93831873e-02 
157: ##17      ==> 0 0.000000000000e+00 0.000000000000e+00 768 7392 3264 0  55  56 165 output-checkpoint_07_enable_free_surface_create/solution/solution-00000 1.59900000e+03 1.60013520e+03 1.60100000e+03 1.56139064e-02 2.93850329e-02 
157: 
157: ##17      #:8   <== 56
157: ##17      #:8   ==> 55
157: @ Absolute error = 1.0000000000e+0, Relative error = 1.8181818182e-2
157: ##17      #:9   <== 59
157: ##17      #:9   ==> 56
157: @ Absolute error = 3.0000000000e+0, Relative error = 5.3571428571e-2
157: ##17      #:10  <== 174
157: ##17      #:10  ==> 165
157: @ Absolute error = 9.0000000000e+0, Relative error = 5.4545454545e-2
157: ##17      #:15  <== 1.56138836e-02
157: ##17      #:15  ==> 1.56139064e-02
157: @ Absolute error = 2.2800000000e-8, Relative error = 1.4602388864e-6
157: ##17      #:16  <== 2.93831873e-02
157: ##17      #:16  ==> 2.93850329e-02
157: @ Absolute error = 1.8456000000e-6, Relative error = 6.2811429582e-5
157: ----------------
157: ##18      <== 1 4.007228995962e+06 4.007228995962e+06 768 7392 3264 7  31  33  99 output-checkpoint_07_enable_free_surface_create/solution/solution-00001 1.59900419e+03 1.60013507e+03 1.60099843e+03 1.55910246e-02 2.92839004e-02 
157: ##18      ==> 1 4.007251868092e+06 4.007251868092e+06 768 7392 3264 7  31  32  96 output-checkpoint_07_enable_free_surface_create/solution/solution-00001 1.59900419e+03 1.60013507e+03 1.60099843e+03 1.55910240e-02 2.92839208e-02 
157: 
157: ##18      #:2   <== 4.007228995962e+06
157: ##18      #:2   ==> 4.007251868092e+06
157: @ Absolute error = 2.2872130000e+1, Relative error = 5.7077172338e-6
157: ##18      #:3   <== 4.007228995962e+06
157: ##18      #:3   ==> 4.007251868092e+06
157: @ Absolute error = 2.2872130000e+1, Relative error = 5.7077172338e-6
157: ##18      #:9   <== 33
157: ##18      #:9   ==> 32
157: @ Absolute error = 1.0000000000e+0, Relative error = 3.1250000000e-2
157: ##18      #:10  <== 99
157: ##18      #:10  ==> 96
157: @ Absolute error = 3.0000000000e+0, Relative error = 3.1250000000e-2
157: ##18      #:15  <== 1.55910246e-02
157: ##18      #:15  ==> 1.55910240e-02
157: @ Absolute error = 6.0000000000e-10, Relative error = 3.8483681380e-8
157: ##18      #:16  <== 2.92839004e-02
157: ##18      #:16  ==> 2.92839208e-02
157: @ Absolute error = 2.0400000000e-8, Relative error = 6.9662851332e-7
157: ----------------
157: ##19      <== 2 8.026868927495e+06 4.019639931534e+06 768 7392 3264 8  24  26  78 output-checkpoint_07_enable_free_surface_create/solution/solution-00002 1.59899925e+03 1.60013483e+03 1.60100022e+03 1.55663999e-02 2.91912298e-02 
157: ##19      ==> 2 8.026892072502e+06 4.019640204410e+06 768 7392 3264 8  25  26  78 output-checkpoint_07_enable_free_surface_create/solution/solution-00002 1.59899925e+03 1.60013483e+03 1.60100022e+03 1.55663980e-02 2.91912163e-02 
157: 
157: ##19      #:2   <== 8.026868927495e+06
157: ##19      #:2   ==> 8.026892072502e+06
157: @ Absolute error = 2.3145007000e+1, Relative error = 2.8834415024e-6
157: ##19      #:3   <== 4.019639931534e+06
157: ##19      #:3   ==> 4.019640204410e+06
157: 

and the runtime output is

$ ./aspect /home/martin/Work/ParallelFlowSoftware/aspect/aspect/tests/checkpoint_07_enable_free_surface_resume.prm 
-----------------------------------------------------------------------------
--                             This is ASPECT                              --
-- The Advanced Solver for Planetary Evolution, Convection, and Tectonics. --
-----------------------------------------------------------------------------
--     . version 2.6.0-pre (main, 939cf6e6a)
--     . using deal.II 9.6.0-pre (simplify_fe_evaluation, 8718988632)
--     .       with 32 bit indices and vectorization level 2 (256 bits)
--     . using Trilinos 14.4.0
--     . using p4est 2.8.0
--     . using Geodynamic World Builder 0.5.0
--     . running in DEBUG mode
--     . running with 1 MPI process
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.6.0-pre&sha=939cf6e6a&src=code
-----------------------------------------------------------------------------
*** Resuming from snapshot!

Number of active cells: 768 (on 4 levels)
Number of degrees of freedom: 10,656 (6,528+864+3,264)

Number of mesh deformation degrees of freedom: 1,728
   Solving mesh displacement system... 0 iterations.
*** Timestep 8:  t=3e+07 years, dt=1.73037e+06 years
   Solving mesh displacement system... 1 iterations.
   Solving temperature system... 6 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 200+19 iterations.

   Postprocessing:
     Writing graphical output: output/solution/solution-00008
     Temperature min/avg/max:  1599 K, 1600 K, 1601 K
     RMS, max velocity:        0.0153 m/year, 0.0281 m/year

Termination requested by criterion: end time


+----------------------------------------------+------------+------------+
| Total wallclock time elapsed since start     |     0.598s |            |
|                                              |            |            |
| Section                          | no. calls |  wall time | % of total |
+----------------------------------+-----------+------------+------------+
| Assemble Stokes system           |         1 |    0.0384s |       6.4% |
| Assemble temperature system      |         1 |    0.0473s |       7.9% |
| Build Stokes preconditioner      |         1 |    0.0211s |       3.5% |
| Build temperature preconditioner |         1 |   0.00202s |      0.34% |
| Initialization                   |         1 |    0.0598s |        10% |
| Mesh deformation                 |         1 |    0.0176s |       2.9% |
| Mesh deformation initialize      |         1 |    0.0153s |       2.6% |
| Postprocessing                   |         1 |    0.0188s |       3.1% |
| Setup dof systems                |         1 |    0.0222s |       3.7% |
| Setup matrices                   |         1 |    0.0124s |       2.1% |
| Solve Stokes system              |         1 |     0.341s |        57% |
| Solve temperature system         |         1 |   0.00097s |      0.16% |
+----------------------------------+-----------+------------+------------+

-- Total wallclock time elapsed including restarts: 2s
-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.6.0-pre&sha=939cf6e6a&src=code
-----------------------------------------------------------------------------

@kronbichler
Contributor

As an additional comment, since #5607 (comment) mentions GMG, I just wanted to note that AMG gets used on my system. It looks like I am running the test in a way that does not trigger the problem?

@gassmoeller
Member Author

Thanks for looking into this @kronbichler. I fixed the test in #5608 by adjusting the solver tolerances so that it doesn't fail anymore. If you want to see the failing test, you need to check out a version before that PR (e.g. 1671a65 or earlier). As an aside, we are using an FGMRES solver for the part that fails in this test, so any change in deal.II that only affects GMRES is unlikely to be the reason (it looks like #16760 may only affect GMRES?).

As I discussed with Wolfgang yesterday (but after he wrote the comment), the fact that the test is failing now may be caused by a slightly different "path" the solver is taking in terms of residual reduction. The test was always hard to solve, because we add a non-symmetric stabilization term to our matrix for this test, but inside the preconditioner we use a CG solver to solve for an approximate inverse of one block of the matrix. Usually this non-symmetric term is small, but I would guess it can be large for this test, because of the nature of the test. It is inside this CG solver that the test was failing. We determined that this likely requires us to change that solver, and does not necessarily mean that anything is wrong with the outer FGMRES solver. So maybe this whole issue just serves as a reminder that "something" changed inside the FGMRES solver, not necessarily that it is any worse (or better) than before.
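
To illustrate the point about applying CG to a block that carries a non-symmetric term, here is a minimal, self-contained Python sketch. It is not ASPECT or deal.II code. The model matrix (a 1D Laplacian plus `eps` times a skew-symmetric term), the helper names `asymmetry_ratio`, `cg`, and `model_matrix`, and all tolerances are assumptions chosen purely for illustration. The sketch measures how large the non-symmetric part is relative to the whole matrix and runs textbook CG, which is only guaranteed to converge when the matrix is symmetric positive definite.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def asymmetry_ratio(A):
    """Frobenius norm of the skew part (A - A^T)/2 relative to that of A."""
    n = len(A)
    skew = sum((0.5 * (A[i][j] - A[j][i])) ** 2 for i in range(n) for j in range(n))
    full = sum(A[i][j] ** 2 for i in range(n) for j in range(n))
    return math.sqrt(skew / full)


def cg(A, b, rel_tol=1e-8, max_iter=500):
    """Textbook CG. Returns (x, iterations, converged).

    Convergence is only guaranteed for symmetric positive definite A;
    for other matrices the returned flag has to be checked.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual for the zero initial guess
    p = list(r)
    rr = dot(r, r)
    target = rel_tol * math.sqrt(rr)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        pAp = dot(p, Ap)
        if pAp == 0.0:  # breakdown, cannot take a step
            return x, k, False
        alpha = rr / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= target:
            return x, k, True
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter, False


def model_matrix(n, eps):
    """1D Laplacian (SPD) plus eps times a skew-symmetric off-diagonal term.

    Hypothetical stand-in for 'symmetric block + non-symmetric stabilization'.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i + 1 < n:
            A[i][i + 1] = -1.0 + eps
            A[i + 1][i] = -1.0 - eps
    return A


if __name__ == "__main__":
    b = [1.0] * 20
    for eps in (0.0, 0.05, 0.5):
        A = model_matrix(20, eps)
        _, iterations, converged = cg(A, b)
        print(f"eps={eps}: asymmetry={asymmetry_ratio(A):.3e}, "
              f"iterations={iterations}, converged={converged}")
```

For `eps = 0` the matrix is symmetric positive definite and CG converges. For `eps > 0` the outcome depends on the matrix and has to be read from the returned flag. In that situation a solver that does not assume symmetry (for example GMRES or BiCGStab) is the usual alternative.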

@kronbichler
Contributor

You are right, it seems that I forgot FGMRES (that I also wanted to convert). Anyway, thank you for the explanation, I will look a bit more to see if there are problems, I absolutely do not want to cause too much churn in these regards (but I want algorithms to be better).
