
Plans on releasing builds? #1

Open
32bitx64bit opened this issue Mar 11, 2024 · 1 comment

Comments

@32bitx64bit

This looks cool. A GPU-based Powder Toy would run so much faster.

@tugrul512bit (Owner)

It will take a lot of time to get the sand behavior right in parallel, because in the original Powder Toy sand particles depend directly on each other, which makes the conversion hard. Perhaps an approximating equation could be found and then tuned by artificial intelligence, maybe.

A GPU has so much more performance that the simulation could even become 3D, and since screens are 2D not everything is visible at once, so a lot of compute budget is available per frame. Communication bandwidth, however, is very limited (PCIe, 16-32 GB/s). Kernels should move a minimal amount of data per frame and re-compute things instead where necessary.
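
For a sense of scale, here is a rough sketch of that per-frame transfer ceiling, assuming ~16 GB/s effective PCIe bandwidth and a 60 FPS target (both numbers are illustrative):

```cpp
// Per-frame PCIe transfer budget under the assumed rates above.
#include <cstdio>

int main() {
    const double pcieBandwidth = 16e9; // bytes per second (assumed)
    const double targetFps     = 60.0;
    const double bytesPerFrame = pcieBandwidth / targetFps;
    // ~267 MB per frame is the absolute ceiling; real kernels should
    // stay far below it so compute, not transfer, dominates the frame.
    std::printf("transfer budget: %.0f MB/frame\n", bytesPerFrame / 1e6);
    return 0;
}
```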

We need some kind of interaction kernel that is fully parallel, can host any particle logic without depending on serial operations, and remains upgradable with new particle types and effects.
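
One common shape for such a kernel is a gather-style, double-buffered update: every cell reads only the previous frame and writes only its own slot in the next frame, so all work items are independent and new particle types only add dispatch cases. A minimal sketch, with hypothetical types and fields:

```cpp
#include <cstdint>
#include <vector>

enum class Type : std::uint8_t { Empty, Sand, Water };

struct Cell {
    Type  type;
    float heat;
};

// Decide the next state of one cell by reading the previous frame only.
Cell updateCell(const std::vector<Cell>& prev, int x, int y, int w) {
    Cell next = prev[y * w + x];
    switch (next.type) {               // per-type rules plug in here
        case Type::Sand:  /* inspect neighbors in prev, fall if possible */ break;
        case Type::Water: /* flow rules go here */                          break;
        default:                                                            break;
    }
    return next; // placeholder: state unchanged
}

// Every (x, y) is independent, so this loop maps directly onto GPU work-items.
void step(const std::vector<Cell>& prev, std::vector<Cell>& next, int w, int h) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            next[y * w + x] = updateCell(prev, x, y, w);
}
```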

Possibly, instead of using the cells of a matrix to represent the universe, a direct particle-based approach (similar to physics simulations) would give a big speedup, but it could diverge from the original behavior of sand. The simplest approach would be an exclusion principle: two particles can't occupy the same volume and push each other outwards. So instead of pushing themselves into other cells, particles would simply move each other freely in space with non-integer coordinates.
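
A minimal sketch of that exclusion principle, assuming circular particles of equal radius (all names here are illustrative):

```cpp
// If two particles overlap, push each half of the overlap apart along
// their separation vector. Coordinates are floats, not grid cells.
#include <cmath>

struct Particle { float x, y; };

void separate(Particle& a, Particle& b, float radius) {
    float dx = b.x - a.x, dy = b.y - a.y;
    float dist = std::sqrt(dx * dx + dy * dy);
    float minDist = 2.0f * radius;
    if (dist < minDist && dist > 1e-6f) {
        float push = 0.5f * (minDist - dist) / dist; // half the overlap each
        a.x -= dx * push; a.y -= dy * push;
        b.x += dx * push; b.y += dy * push;
    }
}
```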

On the other hand, the cell-based integer-coordinate approach requires a lot of data copying, because many patterns of sand (etc.) must be checked to implement all the rules of the automaton. Cellular automata are also easier to use for chemical reactions, while the particle version would require explicit atomic interactions such as bonding between atoms, which need extra care to keep the simulation from exploding (it is easy for a simulation to gain energy exponentially with a non-optimal time step). The cell-based (integer-coordinate) approach does not have this explosion problem (unless it is explicitly added).
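
That explosion failure mode is easy to demonstrate: explicit Euler integration of a single bond (modeled here as a unit harmonic oscillator) gains energy by a factor of (1 + dt²) every step, so a naive particle integrator blows up regardless of the time step. Symplectic integrators such as velocity Verlet avoid this. A sketch:

```cpp
// Explicit Euler on a unit harmonic oscillator, a stand-in for a
// bonded pair of atoms. Energy grows without bound: the "exploding
// simulation" described above.
#include <cstdio>

int main() {
    double x = 1.0, v = 0.0, dt = 0.1;
    for (int i = 0; i < 1000; ++i) {
        double a = -x;   // spring force, k = m = 1
        x += dt * v;     // explicit Euler: uses old v
        v += dt * a;     //                 and old x
    }
    double energy = 0.5 * v * v + 0.5 * x * x; // started at 0.5
    std::printf("energy after 1000 steps: %g\n", energy);
    return 0;
}
```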

When using a single GPU, there is abundant bandwidth: around 300-500 GB/s on mainstream cards and 1-2 TB/s on high-end cards. With multiple GPUs, communication goes through PCIe (OpenCL is used to avoid a proprietary dependency) and is slow, while compute power is doubled or tripled compared to a single card. Not every algorithm can be offloaded efficiently onto two cards. For example, atomic functions work fast within the same GPU, but atomically accessing memory on another GPU is orders of magnitude slower (if possible at all) and requires extra driver/hardware support. So algorithms should be minimally bandwidth-dependent while carrying maximum computational load; sometimes re-computing the same thing on both GPUs is faster than copying between them. This will need a lot of benchmarking for each kernel developed.
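
A back-of-the-envelope helper for that transfer-versus-recompute decision (both rates below are assumed, illustrative values, not measurements):

```cpp
// Copying a buffer over PCIe competes with simply recomputing it on
// the other GPU; pick whichever is cheaper for the kernel at hand.
bool preferRecompute(double bytes, double flops,
                     double pcieBytesPerSec = 16e9,   // assumed PCIe rate
                     double gpuFlopsPerSec  = 10e12)  // assumed GPU rate
{
    double copyTime      = bytes / pcieBytesPerSec;
    double recomputeTime = flops / gpuFlopsPerSec;
    return recomputeTime < copyTime;
}
```

With these numbers, copying a 100 MB buffer costs roughly 6 ms while 10 GFLOP of recomputation costs roughly 1 ms, which is why recomputing on both GPUs can beat copying. Real decisions would use benchmarked rates per kernel.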
