
convert to pure SoA particle containers #515

Open
BenWibking opened this issue Jan 31, 2024 · 3 comments
Labels
enhancement, particles

Comments

@BenWibking
Collaborator

BenWibking commented Jan 31, 2024

Describe the proposal
For performance reasons, we should convert the CICParticles to "pure" SoA particles, where all of the particle data is stored in memory in "structure of arrays" layout. This improves performance on all platforms, because this data layout allows for vectorization (on CPU) and for memory coalescing (on GPU).
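To make the layout difference concrete, here is a minimal sketch in plain C++ (no AMReX). The types `ParticleAoS` and `ParticlesSoA`, their fields, and the helper `to_soa` are hypothetical names chosen for illustration; they are not the actual CICParticles or AMReX types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical AoS particle: all fields of one particle sit next to each other.
struct ParticleAoS {
    double x, y, z;
    double mass;
    std::uint64_t idcpu;  // packed id + cpu, rarely needed in compute kernels
};

// Hypothetical pure SoA container: one contiguous array per field.
struct ParticlesSoA {
    std::vector<double> x, y, z, mass;
    std::vector<std::uint64_t> idcpu;

    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n), mass(n), idcpu(n) {}
    std::size_t size() const { return x.size(); }
};

// Distance in bytes between the x values of two consecutive particles.
constexpr std::size_t aos_stride = sizeof(ParticleAoS);
constexpr std::size_t soa_stride = sizeof(double);

// Copy an AoS particle list into the SoA container, field by field.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa(aos.size());
    assert(soa.size() == aos.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
        soa.mass[i] = aos[i].mass;
        soa.idcpu[i] = aos[i].idcpu;
    }
    return soa;
}
```

In the AoS case, neighboring threads or SIMD lanes that read `x` touch addresses `aos_stride` bytes apart; in the SoA case they are `soa_stride` bytes apart, which is the unit-stride pattern that coalescing and vectorization favor.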

Describe alternatives you've considered
We could keep the current layout, which does not achieve the performance of pure SoA. This might be okay, since particle operations probably do not dominate our runtime.

Additional context
WarpX has done so here: ECP-WarpX/WarpX#4653

BenWibking added the enhancement and particles labels on Jan 31, 2024
@BenWibking
Collaborator Author

On GPUs, the pure SoA layout is 1.73x to 2.25x faster than the default (AoS) particle layout for PIC codes.

See performance benchmarks: ECP-WarpX/impactx#348.

@ax3l

ax3l commented Feb 10, 2024

To give more details:

> This improves performance on GPUs, with no measurable effect on CPUs

Generally, this can improve performance on both CPUs and GPUs (see: ImpactX link), because memory accesses for positions and IDs are better aligned and because memory bandwidth is saved when the id+cpu fields are not accessed.

I write "can" because some kernels (as seen in the WarpX PR) are very register-heavy (have low occupancy) or are bottlenecked by other parts, e.g., atomics, and so do not show an immediate improvement from this change alone.

We also see performance improvements on CPU (see: ImpactX Drift vs. Quad), and there is one further point to be aware of: CPU performance these days depends mostly on vectorization. An SoA layout is a prerequisite for easier autovectorization and/or manual vectorization (with the old AoS layout, the first step was packing the data into SIMD vectors).
Simple functions now auto-vectorize with your compiler; more complex ones will be easier to vectorize manually or semi-manually.
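As a rough illustration of the vectorization point, the sketch below contrasts a simple drift update (`x += vx * dt`) on the two layouts. It is plain C++ with hypothetical names (`ParticleAoS`, `drift_aos`, `drift_soa`); it is not the ImpactX Drift element, and whether a given compiler actually vectorizes either loop depends on flags and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical AoS particle with one position, one velocity, and an id field.
struct ParticleAoS {
    double x;
    double vx;
    std::uint64_t idcpu;
};

// AoS drift: x and vx are interleaved with idcpu, so a SIMD version would
// first need strided loads (or a packing step) to fill vector registers.
void drift_aos(std::vector<ParticleAoS>& p, double dt) {
    for (auto& q : p) {
        q.x += q.vx * dt;
    }
}

// SoA drift: unit-stride loads and stores with no loop-carried dependence,
// the pattern compilers typically auto-vectorize.
void drift_soa(std::vector<double>& x, const std::vector<double>& vx, double dt) {
    assert(x.size() == vx.size());
    const std::size_t n = x.size();
    double* xp = x.data();
    const double* vp = vx.data();
    for (std::size_t i = 0; i < n; ++i) {
        xp[i] += vp[i] * dt;
    }
}
```

Both functions compute the same result; only the memory access pattern differs. The SoA version also never reads `idcpu`, so that field costs no memory bandwidth in this kernel.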

So all in all: there is no downside to transitioning to a pure SoA layout.

Other things to consider

@BenWibking
Collaborator Author

@ax3l Is there an SoA equivalent of amrex::ParticleInterpolator?

We use it here:

amrex::ParallelFor(np, [=] AMREX_GPU_DEVICE(int64_t idx) {

We could copy and paste the implementation we are using and rewrite for SoA particle tiles, but ideally that would be avoided.
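For reference, the kind of rewrite meant here could look roughly like the following. This is a 1D sketch in plain C++ of linear (cloud-in-cell) interpolation that reads positions from a flat SoA array. It does not use or reproduce the `amrex::ParticleInterpolator` API; `LinearWeights1D`, `linear_weights`, and `gather_soa` are hypothetical names, and a cell-centered field with domain lower edge `plo` and cell size `dx` is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear (cloud-in-cell) weights for one particle in 1D.
struct LinearWeights1D {
    int lo;       // index of the left neighboring cell
    double w_lo;  // weight of cell lo
    double w_hi;  // weight of cell lo + 1
};

// Assumes cell i has its center at plo + (i + 0.5) * dx.
LinearWeights1D linear_weights(double x, double plo, double dx) {
    const double s = (x - plo) / dx - 0.5;  // position in cells from center of cell 0
    const double fl = std::floor(s);
    const double frac = s - fl;
    return {static_cast<int>(fl), 1.0 - frac, frac};
}

// Gather a cell-centered field onto particles whose positions are stored
// as one contiguous array (SoA). No boundary handling: particles must lie
// between the first and last cell centers.
void gather_soa(const std::vector<double>& xs, const std::vector<double>& field,
                double plo, double dx, std::vector<double>& out) {
    assert(out.size() == xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const LinearWeights1D w = linear_weights(xs[i], plo, dx);
        assert(w.lo >= 0 && static_cast<std::size_t>(w.lo) + 1 < field.size());
        out[i] = w.w_lo * field[w.lo] + w.w_hi * field[w.lo + 1];
    }
}
```

The weight computation only needs a position value, so it does not depend on how particles are stored; the layout-specific part is just where `xs[i]` comes from.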
