Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve matrix-free performance marginally. #5570

Open
bangerth opened this issue Feb 7, 2024 · 1 comment
Open

Improve matrix-free performance marginally. #5570

bangerth opened this issue Feb 7, 2024 · 1 comment

Comments

@bangerth
Copy link
Contributor

bangerth commented Feb 7, 2024

@tjhei This is one for you, pointed out to me by @marcfehling .

In stokes_matrix_free.cc, we apply the block preconditioner in the following way (abridged):

    void
    BlockSchurGMGPreconditioner<StokesMatrixType, ABlockMatrixType, SchurComplementMatrixType,
                                ABlockPreconditionerType, SchurComplementPreconditionerType>::
                                vmult (dealii::LinearAlgebra::distributed::BlockVector<double>       &dst,
                                       const dealii::LinearAlgebra::distributed::BlockVector<double>  &src) const
    {
      dst = 0.0;    // set all components of dst to zero

      // either solve with the Schur complement matrix (if do_solve_Schur_complement==true)
      // or just apply one preconditioner sweep (for the first few
      // iterations of our two-stage outer GMRES iteration)
      if (do_solve_Schur_complement)
        {
             ...
             dst.block(1) = S^{-1} src.block(1);
             ...
        }
      else
        {
          ...same as above, but only one preconditioner application instead of S^{-1}...
        }

      dst.block(1) *= -1.0;

      {
        // the u-block of dst only contains zeros
        stokes_matrix.vmult(utmp, dst); // B^T
        utmp.block(0) *= -1.0;
        utmp.block(0) += src.block(0);
      }

The key piece is that at the beginning of this last block, dst.block(0) is zero and so what ends up in utmp is really just the product of the top right and bottom right block of the Stokes matrix with dst.block(1). The top left and bottom left blocks of the matrix are multiplied by zeros.

The inefficiency now is that the vast majority of the work done in the stokes matrix (or stokes operator) is associated with the top left block, which is unnecessary here. It would be nice to either restrict operation to individual blocks, or to create a separate matrix-free object that only represents the top right and bottom right blocks.

@tjhei
Copy link
Member

tjhei commented Feb 8, 2024

Yes, that is correct. I was aware of it when Conrad implemented it, but forgot afterwards. This might make a noticeable improvement, who knows.

I'll let one of my students try it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants