
Poisson solver #32

Merged · 26 commits · Mar 7, 2024

Conversation

@semi-h (Member) commented on Feb 9, 2024

The Poisson solver is the final piece we need to run simulations with x3d2. The plan is to have an FFT-based direct solver and use it whenever we can, and also a CG-based iterative solver. So I came up with this structure to support both methods and decide which one to run at run time (a rough sketch follows the list below).

A few important considerations so that the structure makes more sense:

  • The FFT-based solver is implemented in a dedicated class/module.
    • This will make it easy to switch to 2DECOMP&FFT later on.
    • The current pencil arrangement is already compatible with 2DECOMP&FFT.
  • The FFT-based Poisson solver class is instantiated by the backend.
    • Different backends will use different libraries for running the FFT; I think this makes everything easier.
    • The alternative would be instantiating the poisson_fft class in the solver, but I think that would make things uglier.
  • A matrix-free CG-based solver requires evaluating a Laplacian operator.
    • I think it is a good idea to implement a laplacian subroutine in the solver class. We might need this in the solver later on, just like the gradient and divergence operations.
    • So a CG solver will basically call a laplacian subroutine in the solver class plus a few other things provided by the backends. This is why it doesn't require its own class like the FFT-based solver does.
    • @pbartholomew08, I actually focused on a standard CG method; can you comment on a preconditioned CG method and how similar or different it is compared to a standard CG implementation?
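
A rough sketch of the idea, with invented names (poisson_fft_t, solve are placeholders, not the actual x3d2 interfaces); the point is only that the FFT solver lives in its own class and each backend supplies a concrete extension built on its own FFT library:

```fortran
!! Hypothetical sketch; poisson_fft_t and its bindings are placeholder names,
!! not the actual x3d2 interfaces.
module poisson_fft_sketch
   implicit none

   ! Abstract FFT-based Poisson solver. The CUDA backend would extend this
   ! with cuFFT-style calls, the OpenMP backend with an FFTW-style library.
   type, abstract :: poisson_fft_t
   contains
      procedure(poisson_solve_ifc), deferred :: solve
   end type poisson_fft_t

   abstract interface
      subroutine poisson_solve_ifc(self, p, rhs)
         import :: poisson_fft_t
         class(poisson_fft_t) :: self
         real, intent(inout) :: p(:, :, :)  ! pressure, overwritten by the solution
         real, intent(in) :: rhs(:, :, :)   ! right-hand side
      end subroutine poisson_solve_ifc
   end interface

end module poisson_fft_sketch
```

The backend would allocate the concrete type and hand it to the solver, so the solver only ever sees the abstract interface.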

Closes #55

@pbartholomew08 (Member) commented on Feb 9, 2024

To get good performance (at scale) you need a preconditioner. The preconditioner is typically based on an actual matrix, e.g. a low-order discretisation. I'm working on an implementation of this in 3decomp/poissbox#20; once I've got the preconditioner sorted I'll port it over to x3d2. The high-order Laplacian is working and a star-based preconditioner is implemented, but it's not currently effective.

I don't think we want to implement CG ourselves though!

Note that the "serial" aspect of the linked PR is only due to the TDMA implementation I'm using to keep things simple; the CG and the AMG preconditioner are parallel, and the solver can be run with the low-order discretisation in parallel.
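
For reference (and not as a suggestion that x3d2 should hand-roll this), a textbook preconditioned CG loop to answer the question above; apply_laplacian and apply_precon are hypothetical callbacks, and the only difference from standard CG is the preconditioner application, with z = r recovering plain CG:

```fortran
!! Textbook PCG sketch; apply_laplacian and apply_precon are invented names
!! for matrix-free callbacks, not existing x3d2 routines.
subroutine pcg_sketch(x, b, apply_laplacian, apply_precon, tol, maxiter)
   implicit none
   real, intent(inout) :: x(:)
   real, intent(in) :: b(:)
   interface
      subroutine apply_laplacian(y, v)   ! y = A v, high-order Laplacian
         real, intent(out) :: y(:)
         real, intent(in) :: v(:)
      end subroutine apply_laplacian
      subroutine apply_precon(z, r)      ! z = M^{-1} r, e.g. a low-order/AMG solve
         real, intent(out) :: z(:)
         real, intent(in) :: r(:)
      end subroutine apply_precon
   end interface
   real, intent(in) :: tol
   integer, intent(in) :: maxiter

   real, dimension(size(b)) :: r, z, d, Ad
   real :: alpha, beta, rz, rz_new
   integer :: it

   call apply_laplacian(Ad, x)
   r = b - Ad
   call apply_precon(z, r)              ! standard CG: simply z = r
   d = z
   rz = dot_product(r, z)
   do it = 1, maxiter
      call apply_laplacian(Ad, d)
      alpha = rz/dot_product(d, Ad)
      x = x + alpha*d
      r = r - alpha*Ad
      if (norm2(r) < tol) exit
      call apply_precon(z, r)
      rz_new = dot_product(r, z)
      beta = rz_new/rz
      d = z + beta*d
      rz = rz_new
   end do
end subroutine pcg_sketch
```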

@pbartholomew08 (Member):

* A matrix-free CG-based solver requires evaluating a Laplacian operator.

  * I think it is a good idea to implement a laplacian subroutine in the solver class. We might need this in the solver later on, just like the gradient and divergence operations.

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

@JamieJQuinn (Collaborator):

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

@semi-h (Member, Author) commented on Feb 9, 2024

I don't think we want to implement CG ourselves though!

At the current stage the codebase is only able to pick between two dummy functions, poisson_fft or poisson_cg, at runtime via function pointers. poisson_cg can then issue more sophisticated calls, to PETSc for example. If you think this looks good I'm happy to leave poisson_cg as is so that you can port your stuff when ready. My plan was to complete only the FFT solver in this PR so that we can test the codebase in full, but I wanted to implement a way to pick between the two methods we'll definitely have in the codebase soon.
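
A minimal sketch of that run-time switch, with placeholder names and signatures (the real dummy routines in the PR differ):

```fortran
!! Hypothetical sketch of picking the Poisson method at run time via a
!! procedure pointer; names and argument lists are placeholders.
module poisson_select_sketch
   implicit none

   abstract interface
      subroutine poisson_solver_ifc(p, rhs)
         real, intent(inout) :: p(:, :, :)
         real, intent(in) :: rhs(:, :, :)
      end subroutine poisson_solver_ifc
   end interface

   procedure(poisson_solver_ifc), pointer :: poisson => null()

contains

   subroutine select_poisson(method)
      character(len=*), intent(in) :: method
      select case (method)
      case ('FFT')
         poisson => poisson_fft   ! direct FFT-based solve
      case ('CG')
         poisson => poisson_cg    ! iterative solve, later e.g. via PETSc
      end select
   end subroutine select_poisson

   subroutine poisson_fft(p, rhs)
      real, intent(inout) :: p(:, :, :)
      real, intent(in) :: rhs(:, :, :)
      ! dummy; would delegate to the backend-instantiated FFT solver class
   end subroutine poisson_fft

   subroutine poisson_cg(p, rhs)
      real, intent(inout) :: p(:, :, :)
      real, intent(in) :: rhs(:, :, :)
      ! dummy; would delegate to the CG solver once it exists
   end subroutine poisson_cg

end module poisson_select_sketch
```

The solver would then just call `poisson(p, rhs)` wherever a pressure solve is needed, without caring which method is behind the pointer.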

@semi-h (Member, Author) commented on Feb 9, 2024

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But the gradient and divergence subroutines in the solver are not exactly the ones you need, right? They switch between the velocity grid and the pressure grid. I think the iterative solver will require both the gradient and the divergence to start from the pressure grid and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

@semi-h (Member, Author) commented on Feb 9, 2024

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

Actually, we can access the backend stuff from the solver; all the methods in the solver do it already. For example, a naive CG implementation would only require a Laplacian and vector inner products, I guess.
Technically we could also add the FFT-related stuff into the backends and not require a separate class, but I didn't do that for two reasons:

  • The FFT-based solver requires lots of parameters, settings, complex arrays, and so on, which would make the backends complicated. Because it comes with a bunch of machinery that only relates to an FFT-based approach, it made sense to me to separate it into its own class and keep the complexity away from the backend. (All of this is not in the PR yet.)
  • If we switch to a third-party library (2DECOMP&FFT), then we obviously can't call things from our backend, and all that stuff would have to be removed in case of a switch; having a separate class now should help with that later on, I think.

But if we're doing the CG stuff via PETSc, then it's very likely that we'll need a class to keep the PETSc-related complexity away from the backends too.

@pbartholomew08 (Member):

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But the gradient and divergence subroutines in the solver are not exactly the ones you need, right? They switch between the velocity grid and the pressure grid. I think the iterative solver will require both the gradient and the divergence to start from the pressure grid and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.
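
In operator form (notation mine, not from the code), that pattern is

$$\nabla_h^2\, p \;=\; D_{v \to p}\big( G_{p \to v}\, p \big),$$

where $G_{p \to v}$ is the gradient taking pressure-grid data to the velocity grid (the same operator used for $\nabla p$ in the momentum equations) and $D_{v \to p}$ is the divergence taking velocity-grid data back to the pressure grid (the same operator used for $\nabla \cdot u$).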

@semi-h force-pushed the poisson branch 2 times, most recently from ecdb773 to bcbaf18, on February 19, 2024
@semi-h (Member, Author) commented on Feb 21, 2024

The only remaining part is the reorder functions that operate on complex arrays. I think I can get them working tomorrow.

@semi-h force-pushed the poisson branch 2 times, most recently from adbc5ab to a251fa9, on February 22, 2024
@JamieJQuinn (Collaborator):

This fixes #55

@semi-h (Member, Author) commented on Mar 1, 2024

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.
Also, the plans in the init function will be adapted to cuFFTMp, and the fft_forward and fft_backward functions will be simplified down to single-line calls to cuFFTMp.

The only thing I want to add before we merge is a periodic call to the curl and divergence functions to report the enstrophy and the divergence of the velocity field as the simulation goes on. We need a simple inner product function in the backends to finalise that.
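
A sketch of the kind of diagnostic meant here, with invented names (curl and scalar_product stand in for the existing curl routine and the missing backend inner product):

```fortran
!! Hypothetical sketch: enstrophy from the curl of the velocity plus a
!! backend-provided inner product; curl and scalar_product are placeholders.
subroutine report_enstrophy(u, v, w)
   implicit none
   real, intent(in), dimension(:, :, :) :: u, v, w
   real, dimension(:, :, :), allocatable :: ox, oy, oz
   real, external :: scalar_product
   real :: enstrophy

   allocate (ox, oy, oz, mold=u)

   call curl(ox, oy, oz, u, v, w)   ! vorticity components
   ! enstrophy = 0.5 * sum over the domain of |omega|^2; scalar_product
   ! would do the local sums plus the MPI reduction inside the backend
   enstrophy = 0.5*(scalar_product(ox, ox) + scalar_product(oy, oy) &
                    + scalar_product(oz, oz))

   print *, 'enstrophy =', enstrophy
end subroutine report_enstrophy
```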

@JamieJQuinn (Collaborator):

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

@semi-h (Member, Author) commented on Mar 5, 2024

A call for a last round of review, folks!

The input settings in the main file are arranged so that it produces the result I shared in the Slack channel: basically a TGV from 0 to 20 on a 256^3 grid.

The output subroutine is not very good, but for the time being it is useful to check whether anything is going wrong with the TGV, because it prints the enstrophy. I suppose this will evolve into more useful output functionality later.

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

They won't be called once the cuFFTMp stuff is in, and then, like any redundant function, they'll need to be removed. However, we might want to compare the performance of the two approaches, so for a short while the current FFT solver and cuFFTMp may coexist. I think it'll be clearer with the cuFFTMp PR, so I'm not going ahead with TODO comments for now.

@JamieJQuinn self-requested a review on March 5, 2024
@JamieJQuinn (Collaborator) left a review:

Overall excellent. Needs comments plus possible name changes for clarity and consistency. Might want tests, but this PR is already large; we should open an issue if we decide not to put tests in here.


end function init

subroutine fft_forward_cuda(self, f)
Collaborator:

It would be good to have working tests for these before trying to implement cuFFTMp.

Collaborator:

@semi-h implement in this PR or create an issue?

@semi-h (Member, Author):

Actually, cuFFTMp provides 3D forward and inverse FFT transforms, and our functions here will only carry out the calls to the cuFFTMp library, so there is probably no value in testing them.

Collaborator:

For the forward/backward FFTs sure, no need to test individually. But we're currently not unit testing the Poisson solver at all. Since we have a working implementation, I think it would be easiest to implement a correct test of this working solver, then apply the same test to cuFFTMP.

If you don't already have one kicking around, the analytical test case I usually use for Poisson solve is solving $\nabla^2 p = -\sin(\pi x) \sin(\pi y)$ and comparing to analytical solution $p_{analytical}(x,y) = \sin (\pi x) \sin(\pi y)/(2\pi^2)$.
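
A sketch of what such a test could look like, assuming a periodic solver and using solve_poisson as a placeholder for whatever interface ends up under test (domain $[0, 2)^2$ so that $\sin(\pi x)$ is periodic):

```fortran
!! Hypothetical sketch of the suggested analytical test; solve_poisson is a
!! placeholder for the solver under test.
program test_poisson_sketch
   implicit none
   integer, parameter :: n = 64
   real, parameter :: pi = acos(-1.0), tol = 1.0e-3
   real, dimension(n, n) :: rhs, p, p_exact
   real :: x, y, dx
   integer :: i, j

   dx = 2.0/n   ! domain [0, 2)^2, periodic in both directions
   do j = 1, n
      do i = 1, n
         x = (i - 1)*dx
         y = (j - 1)*dx
         rhs(i, j) = -sin(pi*x)*sin(pi*y)
         p_exact(i, j) = sin(pi*x)*sin(pi*y)/(2.0*pi**2)
      end do
   end do

   call solve_poisson(p, rhs)   ! placeholder for the actual solver call

   ! with periodic BCs the solution is defined up to a constant; the exact
   ! solution here has zero mean, so the solver is assumed to fix the mean to zero
   if (maxval(abs(p - p_exact)) > tol) then
      print *, 'FAIL: max error = ', maxval(abs(p - p_exact))
   else
      print *, 'PASS'
   end if
end program test_poisson_sketch
```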

@pbartholomew08 (Member) left a review:

A few minor comments for consideration. Otherwise I agree with Jamie's points.

@pbartholomew08 (Member):

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But the gradient and divergence subroutines in the solver are not exactly the ones you need, right? They switch between the velocity grid and the pressure grid. I think the iterative solver will require both the gradient and the divergence to start from the pressure grid and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.

Adding for reference (I think we've discussed this in meetings): direct evaluation of the Laplacian using high-order methods for the 2nd derivatives on the pressure grid seems to work well (as a standalone solver).

@Nanoseb (Collaborator) left a review:

I agree with Paul's and Jamie's comments; I've just added a couple of minor ones.

@semi-h (Member, Author) commented on Mar 6, 2024

The last commit fixed most of the issues raised.

There are a few comments about the reshape and complex_reorder functions. I admit that their naming scheme is terrible and there is a lot to do to make them better, but there is a very good chance that cuFFTMp will work fine and they'll all be removed from the codebase entirely very soon. Because we expect their lifespan to be very short, I don't want to invest time in a proper naming scheme, adding comments, changing the way we set threads/blocks for calling them, or testing them, and I'm willing to merge the PR as is. If we decide to keep them longer for any reason, I'll be happy to implement all the suggestions.

@JamieJQuinn (Collaborator) left a review:

This is nearly done! It looks like there are a few minor things that still need to be addressed, and a proper test of the Poisson solver to be implemented.


@semi-h (Member, Author) commented on Mar 7, 2024

It's a good idea to have a unit test for the Poisson solver, but I don't want to implement it here as the PR is already large. I've opened an issue (#74) to deal with it later.

@JamieJQuinn (Collaborator) left a review:

OK, I'm happy with the TGV showing that the Poisson solver works at this stage.

@semi-h merged commit e916fe8 into xcompact3d:main on Mar 7, 2024 (2 checks passed)
Merging this pull request may close the issue: Implement FFT Poisson solver (CUDA)