
BoomerAMG performance if some basis functions have large support #968

Open
craffael opened this issue Sep 22, 2023 · 12 comments

@craffael

Dear all

I'm trying to use BoomerAMG as a preconditioner to solve a 3D Laplace problem assembled by a finite element code.
My particular difficulty is that a small number of the basis functions have very large support (the support is essentially a 2D surface). The number of these "large-support" basis functions is independent of the overall mesh size and depends only on the topology.

For small problems BoomerAMG performs quite reasonably, but as soon as I run it with a few million dofs, the number of CG iterations increases drastically, so BoomerAMG no longer scales well.

Do you have any idea how I could tune BoomerAMG for such a problem?

At the moment I use mostly the default options, but with Chebyshev relaxation and one level of aggressive coarsening.

I would be very grateful for some input.

Thanks
Raffael

@victorapm
Contributor

Hi @craffael, this sounds like an interesting application. Is your code open-source or would you be able to share some matrices? I could take a closer look.

@craffael
Author

craffael commented Sep 29, 2023

@victorapm I've set up a small test program which shows the issue. Along with the test program I provide matrices for four different mesh sizes (the mesh size is halved from one mesh to the next). Each mesh has 5 "special basis functions" with large support. If we leave out the special basis functions by removing the last 5 rows and columns (files ending in _without), the number of CG iterations decreases drastically.

Here's the number of CG iterations that I measured:

| Mesh | #unknowns (incl. special basis fcn) | #PCG iterations with | #PCG iterations without |
| ---- | ----------------------------------- | -------------------- | ----------------------- |
| lhs3 | 10'743    | 40  | 12 |
| lhs4 | 69'970    | 41  | 13 |
| lhs5 | 497'920   | 76  | 14 |
| lhs6 | 3'700'686 | 109 | 16 |

We can see that there are always more PCG iterations with the special basis functions than without, and that the iteration count increases drastically once the number of unknowns exceeds a threshold.

If we increase the number of special basis functions to 81 (another case), it's much worse:

| Mesh    | #unknowns | #PCG iterations with | #PCG iterations without |
| ------- | --------- | -------------------- | ----------------------- |
| case2_3 | 13'581    | 757   | 12 |
| case2_4 | 80'139    | 769   | 13 |
| case2_5 | 562'570   | 1'277 | 14 |

Here's the code for the program which showcases this:

#include <mpi.h>
#include <assert.h>
#include <stdio.h>

#include <HYPRE.h>
#include <HYPRE_IJ_mv.h>
#include <HYPRE_parcsr_mv.h>
#include <HYPRE_krylov.h>
#include <HYPRE_parcsr_ls.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  HYPRE_Init();

  char buffer[256];

  // Read matrix:
  HYPRE_IJMatrix matrixij = NULL;
  assert(argc == 2);
  HYPRE_IJMatrixRead(argv[1], MPI_COMM_WORLD, HYPRE_PARCSR, &matrixij);
  HYPRE_DescribeError(HYPRE_GetError(), buffer);
  printf("Error: %s\n", buffer);
  assert(HYPRE_GetError() == 0);

  // Get the underlying ParCSR matrix:
  HYPRE_ParCSRMatrix parcsr = NULL;
  HYPRE_IJMatrixGetObject(matrixij, (void **)&parcsr);
  assert(HYPRE_GetError() == 0);

  // Global problem size via the public API (single-rank run assumed,
  // so the local row range spans the whole matrix):
  HYPRE_BigInt ilower, iupper, jlower, jupper;
  HYPRE_IJMatrixGetLocalRange(matrixij, &ilower, &iupper, &jlower, &jupper);
  HYPRE_BigInt n = iupper + 1;
  printf("N = %lld\n", (long long)n);

  // Set up rhs (vector of ones):
  HYPRE_ParVector rhs = NULL;
  HYPRE_ParVectorCreate(MPI_COMM_WORLD, n, NULL, &rhs);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParVectorInitialize(rhs);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParVectorSetConstantValues(rhs, 1.0);
  assert(HYPRE_GetError() == 0);

  // Set up solution vector (zero initial guess):
  HYPRE_ParVector x = NULL;
  HYPRE_ParVectorCreate(MPI_COMM_WORLD, n, NULL, &x);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParVectorInitialize(x);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParVectorSetConstantValues(x, 0.0);
  assert(HYPRE_GetError() == 0);

  // Create BoomerAMG:
  HYPRE_Solver boomeramg = NULL;
  HYPRE_BoomerAMGCreate(&boomeramg);
  assert(HYPRE_GetError() == 0);

  // Set relaxation type to Chebyshev:
  HYPRE_BoomerAMGSetRelaxType(boomeramg, 16);
  assert(HYPRE_GetError() == 0);

  // One level of aggressive coarsening:
  HYPRE_BoomerAMGSetAggNumLevels(boomeramg, 1);
  assert(HYPRE_GetError() == 0);

  //HYPRE_BoomerAMGSetPrintLevel(boomeramg, 3);

  HYPRE_BoomerAMGSetStrongThreshold(boomeramg, 0.5);
  assert(HYPRE_GetError() == 0);

  // One V-cycle per preconditioner application:
  HYPRE_BoomerAMGSetTol(boomeramg, 0.0);
  assert(HYPRE_GetError() == 0);
  HYPRE_BoomerAMGSetMaxIter(boomeramg, 1);
  assert(HYPRE_GetError() == 0);

  // Setup BoomerAMG:
  HYPRE_BoomerAMGSetup(boomeramg, parcsr, NULL, NULL);
  assert(HYPRE_GetError() == 0);

  // Setup PCG:
  HYPRE_Solver pcg;
  HYPRE_ParCSRPCGCreate(MPI_COMM_WORLD, &pcg);
  assert(HYPRE_GetError() == 0);

  HYPRE_ParCSRPCGSetTol(pcg, 1e-6);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParCSRPCGSetMaxIter(pcg, 1000);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParCSRPCGSetLogging(pcg, 2);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParCSRPCGSetPrintLevel(pcg, 2);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParCSRPCGSetPrecond(pcg, HYPRE_BoomerAMGSolve, HYPRE_BoomerAMGSetup, boomeramg);
  assert(HYPRE_GetError() == 0);

  HYPRE_ParCSRPCGSetup(pcg, parcsr, rhs, x);
  assert(HYPRE_GetError() == 0);
  HYPRE_ParCSRPCGSolve(pcg, parcsr, rhs, x);
  assert(HYPRE_GetError() == 0);

  HYPRE_DescribeError(HYPRE_GetError(), buffer);
  printf("Error: %s\n", buffer);

  // Clean up:
  HYPRE_ParCSRPCGDestroy(pcg);
  HYPRE_BoomerAMGDestroy(boomeramg);
  HYPRE_ParVectorDestroy(rhs);
  HYPRE_ParVectorDestroy(x);
  HYPRE_IJMatrixDestroy(matrixij);

  HYPRE_Finalize();
  MPI_Finalize();
  return 0;
}

And here are the mesh files (I've omitted lhs6 because it's very large)

@victorapm
Contributor

@craffael Thanks for the quick response and for providing the data! Please allow me a couple of weeks to take a look.

Lastly, are there any publications detailing the model that you are solving, or is this part of ongoing work?

Thanks!

@craffael
Author

@victorapm It would be great if you could have a look. I'm still a bit puzzled by the fact that a few additional basis functions can trouble AMG that much.

I've not published anything about this problem, but it is essentially a scalar-valued magnetostatic problem in a topologically non-trivial domain without any source current. Our equations are as follows:

curl H = 0,  div(\mu H) = 0

Because of the first equation, we know that H is a gradient field + some "cohomology basis functions". For example, if our domain was a donut, we would introduce a cut of the donut. The corresponding cohomology basis function would just be a scalar field but with a uniform jump of 1 across the cut surface:
[figure: the cohomology basis function, a scalar field with a uniform jump of 1 across the cut surface]

Therefore we can write

H = \tilde{\nabla} ( \sum_i \alpha_i \phi_i + \sum_j \beta_j T_j )

with T_j being the cohomology basis functions, \phi_i being the usual Lagrangian hat functions, and \alpha_i, \beta_j being the coefficients. (The gradient with the tilde is only well defined inside the elements because of the jump; strictly speaking it acts on functions from the H1 space broken along the cut surface.) We then use this representation of H in a standard finite element Galerkin formulation to solve the second equation, div(\mu H) = 0.
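For concreteness, the Galerkin system this produces has a natural 2x2 block structure. The following is a sketch based on my reading of the description above; the bilinear form and symbol names are inferred, not quoted from the code:

```latex
% Bilinear form induced by div(\mu H) = 0 with the broken-gradient ansatz:
a(u, v) = \int_{\Omega} \mu \, \tilde{\nabla} u \cdot \tilde{\nabla} v \,\mathrm{d}x
% Ordering the hat functions \phi_i before the cohomology functions T_j
% gives the symmetric block system
\begin{pmatrix} A & B \\ B^{T} & C \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = b,
\qquad A_{ik} = a(\phi_i, \phi_k), \quad
       B_{ij} = a(\phi_i, T_j), \quad
       C_{jl} = a(T_j, T_l)
```

The small block C couples the large-support functions; this is the same block structure used later in the thread when discussing Schur complement approaches.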

@ulrikeyang
Contributor

ulrikeyang commented Sep 30, 2023 via email

@victorapm
Contributor

victorapm commented Sep 30, 2023

Hi @craffael,

Thanks for providing more details! This is helpful! We might have to develop or update a preconditioner in hypre for solving a problem like yours, so I see a nice research opportunity here. We could have a meeting after I take an initial look at your problem.

@ulrikeyang, BoomerAMG-PCG was being used here and led to poor convergence. I might be wrong, but it seems AMS is not suitable here, since the discretization strategy is continuous Galerkin (CG) rather than the Nédélec FEM expected by AMS.

@liruipeng
Contributor

I was wondering if we should keep those 5 or 81 rows/columns of the matrix from being coarsened in AMG. Or we could do a Schur complement approach with respect to them. I believe in hypre we can mark them so that they are not coarsened.
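For reference, hypre exposes `HYPRE_BoomerAMGSetCPoints` to force selected rows to remain C-points down to a given level. A minimal, untested sketch of what marking the special rows could look like in the reproducer above (`num_special`, `keep_levels`, `n`, and `boomeramg` are placeholders/assumed from that code; check the signature against your hypre version):

```c
/* Sketch: keep the last num_special rows (the large-support basis
 * functions) as C-points on the first keep_levels AMG levels.
 * Must be called before HYPRE_BoomerAMGSetup. */
HYPRE_Int     num_special = 5;   /* placeholder: 5 or 81 in the test cases */
HYPRE_Int     keep_levels = 25;  /* placeholder: effectively all levels */
HYPRE_BigInt *special = malloc(num_special * sizeof(HYPRE_BigInt));
for (HYPRE_Int i = 0; i < num_special; i++) {
   special[i] = n - num_special + i;  /* global indices of the special rows */
}
HYPRE_BoomerAMGSetCPoints(boomeramg, keep_levels, num_special, special);
```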

@victorapm
Contributor

That's one of the things I'm testing

@victorapm victorapm self-assigned this Sep 30, 2023
@craffael
Author

craffael commented Oct 2, 2023

Hi all, a few comments:

  • AMS/ADS are not applicable; it is a scalar-valued Laplace problem.
  • I've already tried various (in)complete Schur complement approaches, but they are quite expensive and I only managed to get small improvements.

@liruipeng
Contributor

> Hi all, a few comments:
>
>   • AMS/ADS are not applicable; it is a scalar-valued Laplace problem.
>   • I've already tried various (in)complete Schur complement approaches, but they are quite expensive and I only managed to get small improvements.

Thanks for the updates. Can you provide the details of the Schur complement approach that you tried? Thanks!

@craffael
Author

craffael commented Oct 3, 2023

@liruipeng Let me summarize what I've tried. For the discussion, let's assume the system matrix has the following block structure:

M = [ A    B
      B^T  C ]

where C is a small n × n matrix built from the n "special basis functions". The standard Schur complement is then S = C - B^T A^{-1} B, and S can be inverted with a direct solver. Of course you can form this Schur complement explicitly, but then you have to solve n systems with A, which is not very efficient.

So the next thing I tried was to form an approximate Schur complement

\tilde{S} = C - B^T \tilde{A}^{-1} B

where \tilde{A}^{-1} is an approximate inverse of A, e.g. one or more cycles of BoomerAMG, or CG iterations with BoomerAMG as a preconditioner. The approximate Schur complement \tilde{S} can then be inverted with a direct solver and used as a preconditioner for the overall system. This was not very successful (I don't recall the exact numbers).

Alternatively, we can also swap the roles of A and C and consider the Schur complement S2 = A - B C^{-1} B^T. The idea is not to form S2 explicitly, but to use it in a PCG iteration with some preconditioner. To precondition S2, I just used BoomerAMG applied to A. The number of PCG iterations still increases, similarly to when BoomerAMG is applied to the whole system. I think the problem here is that BoomerAMG of A is not a good preconditioner for S2.

Finally, I've also tried block Jacobi approaches, i.e. a preconditioner built from the diagonal blocks:

P = [ A    0
      0    C ]

@victorapm
Contributor

@craffael Thanks for the additional info! I'm planning to try other alternatives as well and will let you know once I do (in my free time).

@liruipeng Thanks for the ideas! Let me know if you want to look at this as well so we don't duplicate efforts.
