I do not know how to tune the amgcl to make solver converged #272

Open
haoxiangNtu opened this issue Jan 17, 2024 · 3 comments
@haoxiangNtu
Hello Sir, I want to express my heartfelt gratitude for your outstanding work in creating AMGCL!

I have run into slow convergence with the linear system matrix A_hand. A_hand, along with the vector f_hand, is assembled from a 3D tetrahedral mesh used for Finite Element Method (FEM) object simulation; A_hand has approximately 176328 x 176328 dimensions. I have followed the recommendations in issue #175: specifically, I used the provided main_2.cpp, which employs a block structure with 3x3 blocks and sets precond.coarsening.aggr.block_size=3. I have also experimented with other meshes, including A_shoe, which has only 48033 unknowns, but it still requires an excessive number of iterations to reach a reasonably good result, as shown below.
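For reference, the block-valued setup described above can be sketched as follows (a sketch only, assuming the builtin backend; type and header names follow the amgcl documentation, and the surrounding boilerplate of main_2.cpp is omitted). Note that using 3x3 block value types like this is an alternative to the scalar setup with precond.coarsening.aggr.block_size=3:

```cpp
#include <amgcl/backend/builtin.hpp>
#include <amgcl/value_type/static_matrix.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/bicgstab.hpp>

// 3x3 block values: one block per mesh node (x, y, z displacements).
typedef amgcl::static_matrix<double, 3, 3> dmat_type;
typedef amgcl::backend::builtin<dmat_type> SBackend;

typedef amgcl::make_solver<
    amgcl::amg<
        SBackend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::spai0
        >,
    amgcl::solver::bicgstab<SBackend>
    > Solver;

// The scalar CRS matrix is reinterpreted as block-valued on the fly:
// Solver solve(amgcl::adapter::block_matrix<dmat_type>(
//                  std::tie(n, ptr, col, val)), prm);
```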

I have three questions here:

  1. Is there a more effective approach to speed up the convergence of the linear system? I am uncertain why convergence is slow for such a matrix, and how to fine-tune the AMG solver. To provide more insight, I have included matrix portraits below.

  2. How can I use the block structure with the CUDA-compatible AMG solver? The block-structured file I used previously was written for the Eigen backend, and I am unsure how to adapt it to the CUDA version.

  3. If I can obtain a good initial guess for the solution vector x, how do I specify it in the code? (I am working on a project that designs the material distribution in an object and updates the material in each iteration, so I expect the solution from the previous iteration to help convergence in the next one.)

Attached, you will find the main.cpp and the results. The matrix can be downloaded from the provided link.

I would be immensely grateful for any suggestions you may offer regarding optimal parameters and the combination of options to expedite the solution.

Thank you very much for your time, Sir.

result of A_hand
Matrix A has 176328 rows and 176328 columns.
Matrix A is rowmajor
Solver

Type: BiCGStab
Unknowns: 58776
Memory footprint: 9.42 M

Preconditioner

Number of levels: 3
Operator complexity: 1.35
Grid complexity: 1.08
Memory footprint: 113.82 M

level unknowns nonzeros memory

0        58776         676276     91.06 M (74.17%)
1         4537         219611     19.71 M (24.08%)
2          299          15945      3.05 M ( 1.75%)

500 4.39234e-06

[Profile: 20.294 s] (100.00%)
[ self: 7.117 s] ( 35.07%)
[ setup: 0.149 s] ( 0.73%)
[ solve: 13.028 s] ( 64.20%)

result of A_Shoe
Matrix A has 48033 rows and 48033 columns.
Matrix A is rowmajor
Solver

Type: BiCGStab
Unknowns: 16011
Memory footprint: 2.57 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.40 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1282    348.96 K ( 0.64%)

399 7.81433e-09

[Profile: 3.833 s] (100.00%)
[ self: 1.882 s] ( 49.10%)
[ setup: 0.022 s] ( 0.58%)
[ solve: 1.929 s] ( 50.32%)

Matrix portrait of A_hand
HandSparse
HandSparseValue

Matrix portrait of A_shoe
ShoeSparse
ShoeSparseValue

@ddemidov
Owner

Hello,

Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver.

I could not improve on the convergence results you show. You could try to use grid coordinates to generate near-nullspace vectors (see https://amgcl.readthedocs.io/en/latest/tutorial/Nullspace.html).
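For 3D elasticity, the near-nullspace vectors are the six rigid body modes (three translations plus three linearized rotations) built from the node coordinates. The following is an independent, self-contained sketch of what the amgcl::coarsening::rigid_body_modes() helper from that tutorial computes (the library helper, I believe, also orthonormalizes the columns, which this sketch skips):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// coo holds (x, y, z) per mesh node; the result B is row-major with one
// row per scalar dof (3 per node) and 6 columns (one per rigid body mode).
std::vector<double> rigid_body_modes(const std::vector<double>& coo) {
    const std::size_t n = coo.size() / 3; // number of mesh nodes
    const int m = 6;                      // number of modes
    std::vector<double> B(3 * n * m, 0.0);

    for (std::size_t i = 0; i < n; ++i) {
        const double x = coo[3 * i], y = coo[3 * i + 1], z = coo[3 * i + 2];
        double* b = &B[3 * i * m];        // 3 consecutive rows for this node

        b[0 * m + 0] = 1;                 // x-dof, translation along x
        b[1 * m + 1] = 1;                 // y-dof, translation along y
        b[2 * m + 2] = 1;                 // z-dof, translation along z

        b[0 * m + 3] = -y; b[1 * m + 3] =  x; // rotation about z
        b[0 * m + 4] =  z; b[2 * m + 4] = -x; // rotation about y
        b[1 * m + 5] = -z; b[2 * m + 5] =  y; // rotation about x
    }
    return B;
}
```

The flattened B is what the nullspace parameters expect (prm.precond.coarsening.nullspace.cols = 6 and nullspace.B, in the tutorial's notation). The coordinates must correspond exactly to the dofs that remain in the matrix, so nodes eliminated by Dirichlet conditions have to be dropped from the coordinate list as well.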

How can I use the block structure with the CUDA-compatible AMG solver? The block-structured file I used previously was written for the Eigen backend, and I am unsure how to adapt it to the CUDA version.

The CUDA backend does not support block-valued solutions, but you can use the VexCL backend with CUDA as the compute device. There is an example of using the VexCL backend with block-valued systems here: https://amgcl.readthedocs.io/en/latest/tutorial/Serena.html
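The typedefs from that tutorial look roughly like this (a sketch only, not compiled here; the header names and the vexcl_static_matrix_declaration() call are taken from the Serena tutorial):

```cpp
#include <amgcl/backend/vexcl.hpp>
#include <amgcl/backend/vexcl_static_matrix.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/cg.hpp>

typedef amgcl::static_matrix<double, 3, 3> dmat_type; // 3x3 block value
typedef amgcl::backend::vexcl<dmat_type>   SBackend;  // GPU backend

typedef amgcl::make_solver<
    amgcl::amg<
        SBackend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::spai0
        >,
    amgcl::solver::cg<SBackend>
    > Solver;

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Teach the VexCL kernels about the 3x3 block value type:
    vex::scoped_program_header header(ctx,
        amgcl::backend::vexcl_static_matrix_declaration<double, 3>());

    SBackend::params bprm;
    bprm.q = ctx;

    // Solver solve(amgcl::adapter::block_matrix<dmat_type>(A), prm, bprm);
}
```

Choosing CUDA rather than OpenCL under VexCL is a build-time choice (e.g. linking against the VexCL::CUDA CMake target), not a change in this code.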

If I can obtain a good initial guess for the solution vector x, how do I specify it in the code?

You should put it into the solution vector: x is treated both as input (the initial approximation) and as output (the result).
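In code that means copying the previous iteration's solution into x before the call, e.g. (a fragment; `solve`, `f`, and `x_prev` are assumed from the surrounding program):

```cpp
// Warm start: amgcl uses x both as the initial guess (on input) and as
// the result (on output), so seed it with the last design iteration's
// solution instead of zeros.
std::vector<double> x = x_prev;        // instead of std::vector<double> x(n, 0.0)

int    iters;
double error;
std::tie(iters, error) = solve(f, x);  // starts from x_prev, overwrites x
```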

@haoxiangNtu
Author

Hello Sir,

I have spent the past few days studying and trying many options in AMGCL, but there has been no real progress.

After using the nullspace, I found that the improvement in convergence is not significant. I am not sure whether this is because the C matrix I constructed is incorrect: my stiffness matrix is a general tetrahedral stiffness matrix, and I removed the rows and columns corresponding to the fixed points, which means I also removed the coordinates of the fixed points from the C matrix (this matrix is also uploaded in the previous link). I am not sure whether this approach is correct. I also tried ns_search to find an approximate nullspace, but the results are still not satisfactory.

Here are some simple results from my tests. Currently the CG solver with the block option gives the best result, but it takes about 1.6 seconds. I have also implemented the VexCL backend with block-valued systems and the CUDA option, but it still takes about 0.62 seconds to compute. When I tested with Intel MKL's Pardiso library, the solve took only about 0.608 seconds. Do you have any further suggestions?

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.maxiter=1000 -b3

Solver

Type: BiCGStab
Unknowns: 16011
Memory footprint: 2.57 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.39 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 290
Error: 8.76707e-09

[Profile: 3.483 s] (100.00%)
[ reading: 1.579 s] ( 45.33%)
[ setup: 0.019 s] ( 0.54%)
[ solve: 1.885 s] ( 54.11%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3

Solver

Type: CG
Unknowns: 16011
Memory footprint: 1.47 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.39 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 513
Error: 9.83891e-09

[Profile: 3.301 s] (100.00%)
[ reading: 1.581 s] ( 47.89%)
[ setup: 0.025 s] ( 0.75%)
[ solve: 1.695 s] ( 51.34%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 890
Error: 9.74222e-09

[Profile: 5.619 s] (100.00%)
[ reading: 1.612 s] ( 28.69%)
[ setup: 0.080 s] ( 1.43%)
[ solve: 3.926 s] ( 69.87%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 471
Error: 9.75176e-09

[Profile: 5.453 s] (100.00%)
[ reading: 1.634 s] ( 29.96%)
[ setup: 0.097 s] ( 1.77%)
[ solve: 3.722 s] ( 68.26%)

./ns_search -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -n6 -o N6.mtx

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -N N6.mtx

Iterations: 235
Error: 9.75834e-09

[Profile: 3.683 s] (100.00%)
[ reading: 1.780 s] ( 48.32%)
[ setup: 0.101 s] ( 2.74%)
[ solve: 1.802 s] ( 48.93%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -N N6.mtx
Iterations: 459
Error: 9.82758e-09

[Profile: 3.762 s] (100.00%)
[ reading: 1.720 s] ( 45.73%)
[ setup: 0.076 s] ( 2.02%)
[ solve: 1.964 s] ( 52.21%)

VexCL backend with block-valued systems with the CUDA option:

  1. NVIDIA GeForce RTX 2080 Ti

Matrix A_0 has 48033 rows and 48033 columns.
Matrix A_0 is rowmajor
Matrix A_0 has 48033
Matrix Serena.mtx: 48033x48033
Solver

Type: CG
Unknowns: 16011
Memory footprint: 1.47 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 13.96 M

level unknowns nonzeros memory

0        16011         177785     12.51 M (88.25%)
1         1206          22398      1.27 M (11.12%)
2           86           1272    175.07 K ( 0.63%)

Iters: 461
Error: 9.98164e-08

[Serena (VexCL): 2.598 s] (100.00%)
[ GPU matrix: 0.013 s] ( 0.48%)
[ read: 1.916 s] ( 73.75%)
[ setup: 0.047 s] ( 1.82%)
[ solve: 0.622 s] ( 23.93%)

@ddemidov
Copy link
Owner

It looks like the problem is just too difficult to solve with a simple iterative approach. Intel Pardiso is a direct solver, so it does not have this problem. Direct solvers usually do not scale as well as iterative ones, but if you do not need to solve larger systems and are satisfied with the Pardiso performance, then the direct solver may be your best approach here.

Otherwise, you probably need to research methods suited to your particular problem. I am sorry I could not be of more help.
