I do not know how to tune the amgcl to make solver converged #272

Open
haoxiangNtu opened this issue Jan 17, 2024 · 3 comments
@haoxiangNtu
Hello Sir, I want to express my heartfelt gratitude for your outstanding work in creating AMGCL!

I have run into slow convergence with the linear system matrix A_hand. A_hand, along with the vector f_hand, is assembled from a 3D tetrahedral mesh used for Finite Element Method (FEM) object simulation; A_hand has approximately 176328 x 176328 dimensions. I have followed the recommendations in issue #175: specifically, I used the provided main_2.cpp, which employs a block structure with 3x3 blocks and sets precond.coarsening.aggr.block_size=3. I have also experimented with other meshes, including A_shoe, which has only 48033 unknowns, but it still requires an excessive number of iterations to reach a reasonably good result, as shown below.
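For reference, the block-valued setup described above can be sketched as follows (a sketch only, assuming the builtin backend; type and header names follow the amgcl documentation, and the surrounding boilerplate of main_2.cpp is omitted). Note that using 3x3 block value types like this is an alternative to the scalar setup with precond.coarsening.aggr.block_size=3:

```cpp
#include <amgcl/backend/builtin.hpp>
#include <amgcl/value_type/static_matrix.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/bicgstab.hpp>

// 3x3 block values: one block per mesh node (x, y, z displacements).
typedef amgcl::static_matrix<double, 3, 3> dmat_type;
typedef amgcl::backend::builtin<dmat_type> SBackend;

typedef amgcl::make_solver<
    amgcl::amg<
        SBackend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::spai0
        >,
    amgcl::solver::bicgstab<SBackend>
    > Solver;

// The scalar CRS matrix is reinterpreted as block-valued on the fly:
// Solver solve(amgcl::adapter::block_matrix<dmat_type>(
//                  std::tie(n, ptr, col, val)), prm);
```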

I have three questions here:

  1. Is there a more effective approach to speed up the convergence of the linear system? I am uncertain why convergence is slow for such a matrix, and how to fine-tune the AMG solver. To provide more insight, I have included matrix portraits below.

  2. How can I use the block structure with the CUDA-compatible AMG solver? The block-structured file I used previously was written for the Eigen backend, and I am unsure how to adapt it to the CUDA version.

  3. If I can obtain a good initial guess for the solution vector x, how do I specify it in the code? (I am working on a project that designs the material distribution in an object and updates the material in each iteration, so I expect the solution from the previous iteration to help convergence in the next one.)

Attached, you will find the main.cpp and the results. The matrix can be downloaded from the provided link.

I would be immensely grateful for any suggestions you may offer regarding optimal parameters and the combination of options to expedite the solution.

Thank you very much for your time, Sir.

result of A_hand
Matrix A has 176328 rows and 176328 columns.
Matrix A is rowmajor
Solver

Type: BiCGStab
Unknowns: 58776
Memory footprint: 9.42 M

Preconditioner

Number of levels: 3
Operator complexity: 1.35
Grid complexity: 1.08
Memory footprint: 113.82 M

level unknowns nonzeros memory

0        58776         676276     91.06 M (74.17%)
1         4537         219611     19.71 M (24.08%)
2          299          15945      3.05 M ( 1.75%)

500 4.39234e-06

[Profile: 20.294 s] (100.00%)
[ self: 7.117 s] ( 35.07%)
[ setup: 0.149 s] ( 0.73%)
[ solve: 13.028 s] ( 64.20%)

result of A_Shoe
Matrix A has 48033 rows and 48033 columns.
Matrix A is rowmajor
Solver

Type: BiCGStab
Unknowns: 16011
Memory footprint: 2.57 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.40 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1282    348.96 K ( 0.64%)

399 7.81433e-09

[Profile: 3.833 s] (100.00%)
[ self: 1.882 s] ( 49.10%)
[ setup: 0.022 s] ( 0.58%)
[ solve: 1.929 s] ( 50.32%)

Matrix portrait of A_hand
HandSparse
HandSparseValue

Matrix portrait of A_shoe
ShoeSparse
ShoeSparseValue

@ddemidov
Owner

Hello,

Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver.

I could not improve on the convergence results you show. You could try to use grid coordinates to generate near-nullspace vectors (see https://amgcl.readthedocs.io/en/latest/tutorial/Nullspace.html).
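For 3D elasticity, the near-nullspace vectors are the six rigid body modes (three translations plus three linearized rotations) built from the node coordinates. The following is an independent, self-contained sketch of what the amgcl::coarsening::rigid_body_modes() helper from that tutorial computes (the library helper, I believe, also orthonormalizes the columns, which this sketch skips):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// coo holds (x, y, z) per mesh node; the result B is row-major with one
// row per scalar dof (3 per node) and 6 columns (one per rigid body mode).
std::vector<double> rigid_body_modes(const std::vector<double>& coo) {
    const std::size_t n = coo.size() / 3; // number of mesh nodes
    const int m = 6;                      // number of modes
    std::vector<double> B(3 * n * m, 0.0);

    for (std::size_t i = 0; i < n; ++i) {
        const double x = coo[3 * i], y = coo[3 * i + 1], z = coo[3 * i + 2];
        double* b = &B[3 * i * m];        // 3 consecutive rows for this node

        b[0 * m + 0] = 1;                 // x-dof, translation along x
        b[1 * m + 1] = 1;                 // y-dof, translation along y
        b[2 * m + 2] = 1;                 // z-dof, translation along z

        b[0 * m + 3] = -y; b[1 * m + 3] =  x; // rotation about z
        b[0 * m + 4] =  z; b[2 * m + 4] = -x; // rotation about y
        b[1 * m + 5] = -z; b[2 * m + 5] =  y; // rotation about x
    }
    return B;
}
```

The flattened B is what the nullspace parameters expect (prm.precond.coarsening.nullspace.cols = 6 and nullspace.B, in the tutorial's notation). The coordinates must correspond exactly to the dofs that remain in the matrix, so nodes eliminated by Dirichlet conditions have to be dropped from the coordinate list as well.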

How can I use the block structure with the CUDA-compatible AMG solver? The block-structured file I used previously was written for the Eigen backend, and I am unsure how to adapt it to the CUDA version.

The CUDA backend does not support block-valued solutions, but you can use the VexCL backend with CUDA as the compute device. There is an example of using the VexCL backend with block-valued systems here: https://amgcl.readthedocs.io/en/latest/tutorial/Serena.html
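The typedefs from that tutorial look roughly like this (a sketch only, not compiled here; the header names and the vexcl_static_matrix_declaration() call are taken from the Serena tutorial):

```cpp
#include <amgcl/backend/vexcl.hpp>
#include <amgcl/backend/vexcl_static_matrix.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/cg.hpp>

typedef amgcl::static_matrix<double, 3, 3> dmat_type; // 3x3 block value
typedef amgcl::backend::vexcl<dmat_type>   SBackend;  // GPU backend

typedef amgcl::make_solver<
    amgcl::amg<
        SBackend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::spai0
        >,
    amgcl::solver::cg<SBackend>
    > Solver;

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Teach the VexCL kernels about the 3x3 block value type:
    vex::scoped_program_header header(ctx,
        amgcl::backend::vexcl_static_matrix_declaration<double, 3>());

    SBackend::params bprm;
    bprm.q = ctx;

    // Solver solve(amgcl::adapter::block_matrix<dmat_type>(A), prm, bprm);
}
```

Choosing CUDA rather than OpenCL under VexCL is a build-time choice (e.g. linking against the VexCL::CUDA CMake target), not a change in this code.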

If I can obtain a good initial guess for the solution vector x, how do I specify it in the code?

You should put it into the solution vector: x is treated both as input (the initial approximation) and as output (the result).
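In code that means copying the previous iteration's solution into x before the call, e.g. (a fragment; `solve`, `f`, and `x_prev` are assumed from the surrounding program):

```cpp
// Warm start: amgcl uses x both as the initial guess (on input) and as
// the result (on output), so seed it with the last design iteration's
// solution instead of zeros.
std::vector<double> x = x_prev;        // instead of std::vector<double> x(n, 0.0)

int    iters;
double error;
std::tie(iters, error) = solve(f, x);  // starts from x_prev, overwrites x
```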

@haoxiangNtu
Author

Hello Sir,

I have spent the past few days studying and trying many options in AMGCL, but there has been no real progress.

After using the nullspace, I found that the improvement in convergence is not significant. I am not sure whether this is because the C matrix I constructed is incorrect: my stiffness matrix is a general tetrahedral stiffness matrix, and I removed the rows and columns corresponding to the fixed points, which means I also removed the coordinates of the fixed points from the C matrix (this matrix is also uploaded in the previous link). I am not sure whether this approach is correct. I also tried ns_search to find an approximate nullspace, but the results are still not satisfactory.

Here are some simple results from my tests. Currently the CG solver with the block option gives the best result, but it takes about 1.6 seconds. I have also implemented the VexCL backend with block-valued systems and the CUDA option, but it still takes about 0.62 seconds to compute. When I tested with Intel MKL's Pardiso library, the solve took only about 0.608 seconds. Do you have any further suggestions?

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.maxiter=1000 -b3

Solver

Type: BiCGStab
Unknowns: 16011
Memory footprint: 2.57 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.39 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 290
Error: 8.76707e-09

[Profile: 3.483 s] (100.00%)
[ reading: 1.579 s] ( 45.33%)
[ setup: 0.019 s] ( 0.54%)
[ solve: 1.885 s] ( 54.11%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3

Solver

Type: CG
Unknowns: 16011
Memory footprint: 1.47 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 25.39 M

level unknowns nonzeros memory

0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 513
Error: 9.83891e-09

[Profile: 3.301 s] (100.00%)
[ reading: 1.581 s] ( 47.89%)
[ setup: 0.025 s] ( 0.75%)
[ solve: 1.695 s] ( 51.34%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 890
Error: 9.74222e-09

[Profile: 5.619 s] (100.00%)
[ reading: 1.612 s] ( 28.69%)
[ setup: 0.080 s] ( 1.43%)
[ solve: 3.926 s] ( 69.87%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 471
Error: 9.75176e-09

[Profile: 5.453 s] (100.00%)
[ reading: 1.634 s] ( 29.96%)
[ setup: 0.097 s] ( 1.77%)
[ solve: 3.722 s] ( 68.26%)

./ns_search -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -n6 -o N6.mtx

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -N N6.mtx

Iterations: 235
Error: 9.75834e-09

[Profile: 3.683 s] (100.00%)
[ reading: 1.780 s] ( 48.32%)
[ setup: 0.101 s] ( 2.74%)
[ solve: 1.802 s] ( 48.93%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -N N6.mtx
Iterations: 459
Error: 9.82758e-09

[Profile: 3.762 s] (100.00%)
[ reading: 1.720 s] ( 45.73%)
[ setup: 0.076 s] ( 2.02%)
[ solve: 1.964 s] ( 52.21%)

VexCL backend with block-valued systems with the CUDA option:

  1. NVIDIA GeForce RTX 2080 Ti

Matrix A_0 has 48033 rows and 48033 columns.
Matrix A_0 is rowmajor
Matrix A_0 has 48033
Matrix Serena.mtx: 48033x48033
Solver

Type: CG
Unknowns: 16011
Memory footprint: 1.47 M

Preconditioner

Number of levels: 3
Operator complexity: 1.13
Grid complexity: 1.08
Memory footprint: 13.96 M

level unknowns nonzeros memory

0        16011         177785     12.51 M (88.25%)
1         1206          22398      1.27 M (11.12%)
2           86           1272    175.07 K ( 0.63%)

Iters: 461
Error: 9.98164e-08

[Serena (VexCL): 2.598 s] (100.00%)
[ GPU matrix: 0.013 s] ( 0.48%)
[ read: 1.916 s] ( 73.75%)
[ setup: 0.047 s] ( 1.82%)
[ solve: 0.622 s] ( 23.93%)

@ddemidov
Copy link
Owner

It looks like the problem is just too difficult to solve with a simple iterative approach. Intel Pardiso is a direct solver, so it does not have this problem. Direct solvers usually do not scale as well as iterative ones, but if you do not need to solve larger systems and are satisfied with the Pardiso performance, then the direct solver may be your best approach here.

Otherwise, you probably need to research methods suited to your particular problem. I am sorry I could not be of more help.
