Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve issues related to variable point-block Jacobi for fault preconditioner #677

Open
3 of 6 tasks
baagaard-usgs opened this issue Jan 24, 2024 · 20 comments
Open
3 of 6 tasks
Assignees
Labels
bug General bug

Comments

@baagaard-usgs
Copy link
Contributor

baagaard-usgs commented Jan 24, 2024

  • GAMG setting (--petsc.mg_levels_3_pc_type=vpbjacobi) is basis function dependent; need to be able to set this automatically
  • Turning on refinement results in SNES iterations for a linear problem; error in Jacobian?
  • Need to unpermute values for output
  • Update parallel fault PC settings to use parmetis with edge weighting [needed until we have parallel mesh loading]
  • DOCS: Update example output
  • DOCS: Update list of fault PC settings [parmetis with edge weighting; temporary]
@baagaard-usgs baagaard-usgs added the bug General bug label Jan 24, 2024
@baagaard-usgs baagaard-usgs added this to the Release v4.1.0 milestone Jan 24, 2024
@baagaard-usgs
Copy link
Contributor Author

Description

I pulled PETSc and now SNES fails to converge.

How to reproduce

PETSc branch: knepley/fix-plex-io-section-perm
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc

cd $BUILD/tests/fullscale/linearelasticity/faults-2d
make export-data
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
    0 SNES Function norm 4.265560370677e-03
      0 KSP preconditioned resid norm 1.908036347910e-02 true resid norm 4.265560370677e-03 ||r(i)||/||b|| 1.000000000000e+00
      1 KSP preconditioned resid norm 4.967562491477e-03 true resid norm 3.077158244315e-03 ||r(i)||/||b|| 7.213960129290e-01
      2 KSP preconditioned resid norm 3.598311696003e-03 true resid norm 2.619025748517e-03 ||r(i)||/||b|| 6.139933609945e-01
      3 KSP preconditioned resid norm 1.430059417360e-03 true resid norm 1.704501179051e-03 ||r(i)||/||b|| 3.995960743560e-01
      4 KSP preconditioned resid norm 7.351269360768e-04 true resid norm 1.225965715858e-03 ||r(i)||/||b|| 2.874102367150e-01
      5 KSP preconditioned resid norm 7.230122730906e-04 true resid norm 1.290372237623e-03 ||r(i)||/||b|| 3.025094302951e-01
      6 KSP preconditioned resid norm 4.159075140772e-04 true resid norm 6.686181343840e-04 ||r(i)||/||b|| 1.567480181456e-01
      7 KSP preconditioned resid norm 1.367757561316e-04 true resid norm 4.039493407656e-04 ||r(i)||/||b|| 9.470018137417e-02
      8 KSP preconditioned resid norm 7.158193483169e-05 true resid norm 1.827279480449e-04 ||r(i)||/||b|| 4.283797019989e-02
      9 KSP preconditioned resid norm 3.905357446809e-05 true resid norm 7.282293839440e-05 ||r(i)||/||b|| 1.707230283154e-02
     10 KSP preconditioned resid norm 9.668164625721e-06 true resid norm 1.060892638182e-05 ||r(i)||/||b|| 2.487111999340e-03
     11 KSP preconditioned resid norm 1.785936355636e-06 true resid norm 1.877012745564e-06 ||r(i)||/||b|| 4.400389591171e-04
     12 KSP preconditioned resid norm 3.965116756948e-07 true resid norm 5.489373224082e-07 ||r(i)||/||b|| 1.286905528713e-04
     13 KSP preconditioned resid norm 9.375386755773e-08 true resid norm 1.242938071586e-07 ||r(i)||/||b|| 2.913891642773e-05
     14 KSP preconditioned resid norm 1.901533854471e-08 true resid norm 1.973504391365e-08 ||r(i)||/||b|| 4.626600539829e-06
     15 KSP preconditioned resid norm 2.383816936460e-09 true resid norm 3.629644757516e-09 ||r(i)||/||b|| 8.509186231351e-07
     16 KSP preconditioned resid norm 4.598716317560e-10 true resid norm 5.357437037358e-10 ||r(i)||/||b|| 1.255974965021e-07
     17 KSP preconditioned resid norm 6.822558674688e-11 true resid norm 1.035253125880e-10 ||r(i)||/||b|| 2.427003807043e-08
     18 KSP preconditioned resid norm 1.714498392166e-11 true resid norm 2.063303492863e-11 ||r(i)||/||b|| 4.837121769619e-09
     19 KSP preconditioned resid norm 2.318581686522e-12 true resid norm 2.023190303223e-12 ||r(i)||/||b|| 4.743082098032e-10
     20 KSP preconditioned resid norm 2.531897993993e-13 true resid norm 3.444310532404e-13 ||r(i)||/||b|| 8.074696483214e-11
     21 KSP preconditioned resid norm 4.083948365503e-14 true resid norm 4.900041069684e-14 ||r(i)||/||b|| 1.148744981637e-11
     22 KSP preconditioned resid norm 5.832103009695e-15 true resid norm 7.129245206412e-15 ||r(i)||/||b|| 1.671350206510e-12
     23 KSP preconditioned resid norm 7.275292514621e-16 true resid norm 1.083637092917e-15 ||r(i)||/||b|| 2.540433140664e-13
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: SNESSolve has not converged due to DIVERGED_LINE_SEARCH. Suggest running with -snes_linesearch_monitor

@knepley
Copy link
Contributor

knepley commented Feb 11, 2024

@baagaard-usgs I think this is fixed. I have force pushed the PETSc branch. The problem was that the residual is cloned from the solution, and clone did not copy the rordering flags. Ugh. Putting in a new flag is hard. Let me know if this works for you.

@baagaard-usgs
Copy link
Contributor Author

The residual and Jacobian are now consistent. The output is still scrambled.

@knepley
Copy link
Contributor

knepley commented Feb 11, 2024

How do I see the output from the minmesh example I am running?

pylith pylithapp_m
inmesh.cfg minmesh_noslip.cfg minmesh_noslip_quad.cfg ../../../../../pylith-git/share/settings/solver_faultpc.cfg --pets
c.snes_monitor --petsc.ksp_monitor_true_residual --petsc.snes_view --petsc.dm_ignore_perm_output

@baagaard-usgs
Copy link
Contributor Author

The Lagrange multipliers should be nearly 0 and the displacements should +1 and -1 on each side of the fault.

h5dump output/minmesh_noslip_quad-domain.h5

@knepley
Copy link
Contributor

knepley commented Feb 11, 2024

Only the displacement outputs for me. These look like they are in the original order

   GROUP "vertex_fields" {
      DATASET "displacement" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 1, 8, 2 ) / ( H5S_UNLIMITED, 8, 2 ) }
         DATA {
         (0,0,0): 0, 1,
         (0,1,0): 8.87995e-17, -1.52413e-16,
         (0,2,0): 7.86351e-17, 1.72848e-16,
         (0,3,0): -8.41938e-17, -1.52837e-16,
         (0,4,0): -9.77463e-17, 1.47278e-16,
         (0,5,0): 0, 1,

         (0,6,0): 0, -1,
         (0,7,0): 0, -1
         }
         ATTRIBUTE "timestepping" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
         ATTRIBUTE "vector_field_type" {
            DATATYPE  H5T_STRING {
               STRSIZE 7;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "vector"
            }
         }
      }
   }

@baagaard-usgs
Copy link
Contributor Author

The displacement solution field should be

u(x=-2000) = (0, 1)
u(x=0) = (0, 0)
u(x=+2000) = (0, -1)

Coordinates of vertices               Displacement
(0,0): -2000, 0,                             (0,0,0): 0, 1,                                                  Correct
(1,0): 0, 0,                                      (0,1,0): -1.3022e-17, 2.31458e-16,            Correct
(2,0): 0, 2000,                               (0,2,0): -1.75285e-17, -5.84085e-17,       Correct
(3,0): -2000, 2000,                      (0,3,0): 1.83402e-17, 1.51906e-16,           Should be (0, 1)
(4,0): 2000, 0,                               (0,4,0): 3.69749e-17, -1.26146e-16,        Should be (0, -1)
(5,0): 2000, 2000,                        (0,5,0): 0, 1,                                                Should be (0, -1)
(6,0): 0, 0,                                      (0,6,0): 0, -1,                                              Should be (0, 0)
(7,0): 0, 2000                                 (0,7,0): 0, -1                                               Should be (0, 0)
         
         
         
         
         
         
         
         

@knepley
Copy link
Contributor

knepley commented Feb 12, 2024

I have a fix. However, when I rebuilt PyLith everything started to fail, and I cannot understand why. Here is the beginning of the error messages

 >> ./pylithapp.cfg:62:
 -- pyre.inventory(error)
 -- timedependent.solution_observers.points <- 'pylith.meshio.OutputSolnPoints'
 -- unrecognized property 'timedependent.solution_observers.points'
 >> ./pylithapp.cfg:95:
 -- pyre.inventory(error)
 -- pylithapp.timedependent.materials.mat_xmid.description <- 'Elastic material between xneg and xmid faults'
 -- unknown component 'pylithapp.timedependent.materials.mat_xmid'

@knepley
Copy link
Contributor

knepley commented Feb 12, 2024

Okay, I resolved that. My fix appears to work, but I cannot push to your fork. Here is my diff:

diff --git a/libsrc/pylith/meshio/OutputSubfield.cc b/libsrc/pylith/meshio/OutputSubfield.cc
index acacf4ebb..574a12a18 100644
--- a/libsrc/pylith/meshio/OutputSubfield.cc
+++ b/libsrc/pylith/meshio/OutputSubfield.cc

@@ -22,6 +22,8 @@

 #include <typeinfo> // USES typeid()

+#include "petscdm.h" // USES PetscDM
+
 // ------------------------------------------------------------------------------------------------
 // Constructor
 pylith::meshio::OutputSubfield::OutputSubfield(void) :
@@ -72,6 +74,8 @@ pylith::meshio::OutputSubfield::create(const pylith::topology::Field& field,

     PetscErrorCode err;
     err = DMClone(mesh.getDM(), &subfield->_dm);PYLITH_CHECK_ERROR(err);
+    err = DMReorderSectionSetDefault(subfield->_dm, DM_REORDER_DEFAULT_FALSE);PYLITH_CHECK_ERROR(err);
+    err = DMReorderSectionSetType(subfield->_dm, NULL);PYLITH_CHECK_ERROR(err);
     err = PetscObjectSetName((PetscObject)subfield->_dm, name);PYLITH_CHECK_ERROR(err);

     PetscFE fe = pylith::topology::FieldOps::createFE(subfield->_discretization, subfield->_dm,

Do you have to give me permission on your fork?

@baagaard-usgs
Copy link
Contributor Author

Yes, I believe this is the fix. The fault full-scale tests pass with this fix. I will do more testing to verify completeness.

What is the prospect of getting the mg levels set according to the problem? If we are able to do that, we might have this issue completely wrapped up.

mg_levels_1_pc_type = vpbjacobi

@baagaard-usgs
Copy link
Contributor Author

baagaard-usgs commented Feb 14, 2024

I get a zero pivot in row 0 when I use a basis order of 2. I thought you had fixed the reordering for basis order 2.

PETSc branch: knepley/fix-plex-io-section-perm
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc

How to reproduce

cd examples/strikeslip-2d
pylith step04_varslip.cfg ../../share/settings/solver_faultpc.cfg --petsc.mg_levels_2_pc_type=vpbjacobi

Error message

[0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot
[0]PETSC ERROR: Zero pivot, row 0
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-fieldsplit_displacement_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_displacement_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_lagrange_multiplier_fault_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_lagrange_multiplier_fault_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-ksp_converged_reason (no value) source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_factorization_type value: lower source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_precondition value: selfp source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_scale value: 1.0 source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_type value: schur source: code
[0]PETSC ERROR:   Option left: name:-snes_converged_reason (no value) source: code
-- snip --
[0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_2() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/baij/seq/dgefa2.c:52
[0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal_SeqAIJ() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/aij/seq/aij.c:1840
[0]PETSC ERROR: #3 MatInvertVariableBlockDiagonal() at /Users/baagaard/software/unix/petsc-dev/src/mat/interface/matrix.c:10657
[0]PETSC ERROR: #4 PCSetUp_VPBJacobi_Host() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:232
[0]PETSC ERROR: #5 PCSetUp_VPBJacobi() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:263
[0]PETSC ERROR: #6 PCSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/interface/precon.c:1079
[0]PETSC ERROR: #7 KSPSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #8 KSPSolve_Private() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:832

@baagaard-usgs
Copy link
Contributor Author

@knepley mg_fine* seems to work. I am still getting this error above with basis order 2.

@knepley
Copy link
Contributor

knepley commented Feb 16, 2024

The basis order 2 fix is in knepley/fix-plex-reorder-hybrid
Can you approve https://gitlab.com/petsc/petsc/-/merge_requests/7258 ? Then I can merge that, merge the above fix, and recreate knepley/pylith

@baagaard-usgs
Copy link
Contributor Author

baagaard-usgs commented Feb 16, 2024

I confimed that combining PETSc branches knepley/fix-plex-reorder-hybrid and knepley/fix-plex-io-section-perm resolves the issue with basis order 2.

Will close this issue once PETSc branchknepley/pylith is updated.

@baagaard-usgs
Copy link
Contributor Author

@knepley It looks like we have an issue with using the new fault preconditioner settings in parallel in 3D.

Error message

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
[0]PETSC ERROR: #1 PetscTrFreeDefault() at /Users/baagaard/software/unix/petsc-dev/src/sys/memory/mtr.c:262
[0]PETSC ERROR: #2 DMCreateMatrix_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plex.c:2869
[0]PETSC ERROR: #3 DMCreateMatrix() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:1478
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Nonconforming object sizes
[1]PETSC ERROR: Sum of local block sizes 405 does not equal local size of matrix 450
[1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[1]PETSC ERROR:   Option left: name:-malloc_dump (no value) source: code
[1]PETSC ERROR:   Option left: name:-mg_fine_pc_type value: vpbjacobi source: code
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6  GIT Date: 2023-12-05 20:46:15 +0000
[1]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Tue Feb 20 13:59:48 2024
[1]PETSC ERROR: Configure options --PETSC_ARCH=arch-clang-15.0_debug --with-debugging=1 --with-clanguage=c --with-mpi-compilers=1 --with-shared-[0]PETSC ERROR: #4 SNESSetUpMatrices() at /Users/baagaard/soft
ware/unix/petsc-dev/src/snes/interface/snes.c:768
[0]PETSC ERROR: #5 SNESSetUp_NEWTONLS() at /Users/baagaard/software/unix/petsc-dev/src/snes/impls/ls/ls.c:289
[0]PETSC ERROR: #6 SNESSetUp() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:3297
[0]PETSC ERROR: #7 SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:4718
[0]PETSC ERROR: #8 TSTheta_SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:174
[0]PETSC ERROR: #9 TSStep_Theta() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:224
[0]PETSC ERROR: #10 TSStep() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:3382
[0]PETSC ERROR: #11 TSSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:4016

Steps to reproduce

PETSc branch: main
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc

cd $BUILD/tests/fullscale/linearelasticityfaults-3d
make export-data
pylith twoblocks.cfg twoblocks_hex.cfg --nodes=2

@baagaard-usgs
Copy link
Contributor Author

This is not limited to 3D. It also shows up in 2D, but depends on the number of processes.

@knepley
Copy link
Contributor

knepley commented Mar 14, 2024

The new knepley/pylith should fix this if you run using --petsc.petscpartitioner_type=parmetis --petsc.petscpartitioner_use_edge_weights. This causes partitioning to avoid breaking the domain along a cohesive cell. When this happens, we cannot solve the two sides as a block, and it really hurts the solver.

@baagaard-usgs
Copy link
Contributor Author

@knepley Two faults that intersection (one through-going and one that ends at the intersection) seems to cause an error when permuting the section.

[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Number of points in permutation 9622 does not match chart size 9623
--- snip --
[0]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6  GIT Date: 2023-12-05 20:46:15 +0000
[0]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Thu Mar 21 13:58:21 2024
-- snip --
[0]PETSC ERROR: #1 DMCreateSectionPermutation_Plex_Cohesive() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:504
[0]PETSC ERROR: #2 DMCreateSectionPermutation_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:526
[0]PETSC ERROR: #3 DMCreateSectionPermutation() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4392
[0]PETSC ERROR: #4 DMCreateLocalSection_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexsection.c:611
[0]PETSC ERROR: #5 DMGetLocalSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4292
[0]PETSC ERROR: #6 DMGetSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4243
[0]PETSC ERROR: #7 void pylith::topology::Field::allocate()() at /Users/baagaard/src/cig/pylith/libsrc/pylith/topology/Field.cc:303

How to reproduce

PETSc branch: knepley/feature-partitioner-edge-weights
PyLIth fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc

cd examples/reverse-2d
pylith step06_twofaults_elastic.cfg

@knepley
Copy link
Contributor

knepley commented Mar 27, 2024

@baagaard-usgs I believe this is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/7417

@baagaard-usgs
Copy link
Contributor Author

Fix verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug General bug
Projects
None yet
Development

No branches or pull requests

2 participants