FastCollisionDetectionLib

C++ fast collision detection for uniform(and non-uniform)-distributed AABB particles using adaptive grid with implicit vectorization.

Non-Sparse Adaptive Grid

10 million dynamic particles AABB collision check per second against static grid of 8000 particles
1000x speedup against naive brute-force algorithm for 40k particles (static vs static), with uniform-distribution in range [0 - 1]).
- teapot-in-stadium problem is partially solved by "adaptive" grid:
- 290x speedup when half of AABBs are 10x further than each other [0-1] and [10-11]
- 230x speedup when half of AABBs are 10x far and a single AABB 100x far: [0-1] x N/2, [10-11] x N/2, [100-101] x1
Produced collision list does not contain duplicate pairs of collisions
Particle data is not touched, work done only on pointers internally
Currently it is adaptive, but needs optimizations on memory handling.
- On every cell-overflow, it stretches the cell to AABB of all particles and converts to a grid of 4x4x4 cells each with 4 capacity
Implementation of IParticle is an AABB (axis-aligned bounding box) model
- In user defined particle (box as example here), methods (getMinX/Y/Z and getMaxX/Y/Z) must return AABB corners of the underlying user-particle

Sparse - Linear - Adaptive Grid

More than 2000x speedup against naive brute-force, in single-thread for 10000 particles
60 FPS for 20000 particles and ~25000 collision pairs on 2.1GHz FX8150 single-thread
- 100+ FPS for new CPUs
Better performance stability compared to non-sparse version
Better SIMD support on all-pairs computation method using tiled-computing
Non-zero based object-id values supported (getId() method in IParticle interface)

// prepare memory pool
FastColDetLib::MemoryPool memPool;

// map grid to a volume of cube between corners of (0,0,0) and (10005,10005,10005)
FastColDetLib::AdaptiveGridV2 grid(memPool,0,0,0,10005,10005,10005);

// implement IParticle<float>
struct AABBofPointCloud: public FastColDetLib::IParticle<float>
{
   ...
   	const CoordType getMaxX()const {return xmax;}
	const CoordType getMaxY()const {return ymax;}
	const CoordType getMaxZ()const {return zmax;}
	const CoordType getMinX()const {return xmin;}
	const CoordType getMinY()const {return ymin;}
	const CoordType getMinZ()const {return zmin;}
	const int getId()const {return id;}
    ...
};

// initialize AABB vector
std::vector<AABBofPointCloud> AABBs;

while(simulation)
{
    // clear tree data
    grid.clear();
    
    // add particles that implement IParticle<float> into grid
    grid.addParticles(N,AABBs.data());
    
    // build tree
    grid.buildTree();
    
    // compute all-pairs collision array 
    // 60FPS on FX8150 2.1GHz single-thread for 20000 particles with less than 29000 collisions
    // 100FPS on Xeon Gold 5215 2.9GHz single-thread for 20000 particles with less than 29000 collisions
    std::vector<std::pair<int,int>> vec = grid2_0.findCollisionsAll();
    
    // the vec contains id-values of particles that have their AABBs collide so that you can do further fine-grained collision checks between them
}

Multithreaded Tree of Sparse - Linear - Adaptive Grid

60 FPS for 40000 particles' AABB all-pair computations
Bottlenecked by RAM bandwidth and mutex-array locking throughput
Only zero-based object-id values supported
Work load is balanced on particles, not volumes, this makes better distribution of particles on threads but causes duplicated work due to merging of results from all leaf nodes

    // 7 threads with load-balancing by a tree, mapped to (0,0,0)-(10005,10005,10005) region
    AdaptiveGridTree<7> test(0,0,0,10005,10005,10005);

    for(int i=0;i<100;i++)
    {
		size_t nano;
		{
			FastColDetLib::Bench bench(&nano);
			
			// clear contents of memory pool of adaptive grid tree
			test.clear();
			
			// AABBs is a vector of objects that implements IParticle<float> interface 
	                // (only zero-based object-id values supported from getId() method of IParticle<float>)
			test.addParticles(N,AABBs.data()); 
			
			// non-duplicate pairs of collisions 
			const auto coll = test.computeAllPairs();
			std::cout<<"c="<<coll.size()<<std::endl;
		}
		std::cout<<"t="<<nano<<std::endl;
    }

For details, please visit wiki page.

Working demo for non-sparse (old version) adaptive grid (requires linking pthread for header and gomp/fopenmp for this demo):

#include"FastCollisionDetectionLib.h"

// user-object, with IParticle<coordinate_type> interface for querying AABB within the API
class Box: public FastColDetLib::IParticle<float>
{
public:
	Box(float xPrm=0, float yPrm=0, float zPrm=0, float x2Prm=0, float y2Prm=0, float z2Prm=0, size_t idPrm=0)
	{
		x=xPrm;
		y=yPrm;
		z=zPrm;

		x2=x2Prm;
		y2=y2Prm;
		z2=z2Prm;
		id=idPrm;
	}

	const float getMaxX() const { return x>=x2?x:x2;}
	const float getMaxY() const { return y>=y2?y:y2;}
	const float getMaxZ() const { return z>=z2?z:z2;}

	const float getMinX() const { return x>=x2?x2:x;}
	const float getMinY() const { return y>=y2?y2:y;}
	const float getMinZ() const { return z>=z2?z2:z;}
	const int getId() const {return id;}

	~Box(){}

private:
	float x,y,z,x2,y2,z2;
	size_t id;
};

#include<iostream>
#include<omp.h>
#include"Generator.h"
int main()
{

	oofrng::Generator<64> gen;

	{
		// d^3 number of particles
		const int d = 20;
		const int n=d*d*d;
		std::cout<<"n="<<n<<std::endl;
		std::vector<Box> box(n);
		for(int i=0;i<n;i++)
		{
			auto x1 = gen.generate1Float()*d;
			auto y1 = gen.generate1Float()*d;
			auto z1 = gen.generate1Float()*d;

			float dx = 0.05f;
			float dx2 = 0.05f;
			box[i]=Box(x1,y1,z1,x1+dx2+dx*gen.generate1Float(),y1+dx2+dx*gen.generate1Float(),z1+dx2+dx*gen.generate1Float(),i);
		}

		// thread-pool releases itself once it is out of scope
		// single instance of thread pool can be used for multiple adaptive grids
		FastColDetLib::ThreadPool<float> thr;
		FastColDetLib::AdaptiveGrid<float> grid(thr,-1,-1,-1,d+1,d+1,d+1);
		FastColDetLib::BruteForce<float> bruteForce;
		std::vector<FastColDetLib::CollisionPair<float>> coll3D,coll3Dbrute;


		std::cout<<"add"<<std::endl;
		bruteForce.add(&box[0],n);
		std::cout<<"compute"<<std::endl;
		bool bfor = true;


		for(int k=0;k<40;k++)
		{
			size_t nano1,nano2;

			{
				{
					FastColDetLib::Bench bench(&nano1);
					{
						FastColDetLib::Bench bench(&nano1);
						grid.clear();
					}
					std::cout<<"grid-static-object-clear: "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						grid.add(box.data(),n);
					}
					std::cout<<"grid-static-object-add ("<<n<<" particles AABB): "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						coll3D = grid.getCollisions();
					}
					std::cout<<"grid-static-object-compute: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				}
				std::cout<<"grid-static-total: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				std::vector<float> rand(1000000*3);
				gen.generate(rand.data(),1000000*3);
				{
					std::mutex mut;
					std::vector<FastColDetLib::IParticle<float>*> res;
					{
						FastColDetLib::Bench bench(&nano1);
						#pragma omp parallel for
						for(int j=0;j<100;j++)
						{
							std::vector<FastColDetLib::IParticle<float>*> resTmp;
							for(int i=0;i<1000;i++)
							{
								auto x = rand[i*3]*d;
								auto y = rand[i*3+1]*d;
								auto z = rand[i*3+2]*d;
								auto item = Box(x,y,z,x+0.25,y+0.25,z+0.25);
								auto collisions = grid.getDynamicCollisionListFor(&item);
								std::copy(collisions.begin(),collisions.end(),std::back_inserter(resTmp));
							}
							std::lock_guard<std::mutex> lg(mut);
							std::copy(resTmp.begin(),resTmp.end(),std::back_inserter(res));
						}

					}
					std::cout<<"grid-compute-dynamic (100k particles AABB): "<<nano1/1000000000.0<<" s "<<res.size()<<std::endl;
				}

			}


			if(bfor)
			{
				FastColDetLib::Bench bench(&nano2);
				coll3Dbrute = bruteForce.getCollisions();
			}
			if(bfor)
				std::cout<<"Brute-force ("<<n<<" particles AABB): "<<nano2/1000000000.0<<" s "<<coll3Dbrute.size()<<std::endl;
			std::cout<<"------------------------------------------------------------------------------------"<<std::endl;
		}


		if(bfor)
		{
			if(coll3D.size() != coll3Dbrute.size())
			{
				std::cout<<"ERROR: not equal sizes of collision lists"<<std::endl;
				std::cout<<coll3D.size()<<" vs "<< coll3Dbrute.size()<<std::endl;
				const int sz = std::min(coll3D.size(),coll3Dbrute.size());
				const int sz2 = sz<10?sz:10;
				for(int i=0;i<sz2;i++)
				{
					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
			else
			{
				const size_t sz = coll3D.size();
				for(size_t i=0;i<sz;i++)
				{

					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
		}
		std::cout<<"--"<<std::endl;


	}
	return 0;
}

output for FX8150 at 2.1GHz:

n=8000
add
compute
grid-static-object-clear: 6.4614e-05 s 
grid-static-object-add (8000 particles AABB): 0.00656692 s 
grid-static-object-compute: 0.00195991 s 16
grid-static-total: 0.00866529 s 16
grid-compute-dynamic (100k particles AABB): 0.0137853 s 3600
Brute-force (8000 particles AABB): 0.541069 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00143487 s 
grid-static-object-add (8000 particles AABB): 0.00523593 s 
grid-static-object-compute: 0.00182263 s 16
grid-static-total: 0.00853913 s 16
grid-compute-dynamic (100k particles AABB): 0.00835323 s 4300
Brute-force (8000 particles AABB): 0.541662 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00161331 s 
grid-static-object-add (8000 particles AABB): 0.00521717 s 
grid-static-object-compute: 0.00180544 s 16
grid-static-total: 0.0086723 s 16
grid-compute-dynamic (100k particles AABB): 0.00864333 s 3200
Brute-force (8000 particles AABB): 0.54128 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00151348 s 
grid-static-object-add (8000 particles AABB): 0.00524639 s 
grid-static-object-compute: 0.00180144 s 16
grid-static-total: 0.00860804 s 16
grid-compute-dynamic (100k particles AABB): 0.010513 s 3200
Brute-force (8000 particles AABB): 0.541246 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00160501 s 
grid-static-object-add (8000 particles AABB): 0.00521408 s 
grid-static-object-compute: 0.00176137 s 16
grid-static-total: 0.00862669 s 16
grid-compute-dynamic (100k particles AABB): 0.0120599 s 3000
Brute-force (8000 particles AABB): 0.54391 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00174257 s 
grid-static-object-add (8000 particles AABB): 0.00548767 s 
grid-static-object-compute: 0.00182023 s 16
grid-static-total: 0.00909577 s 16
grid-compute-dynamic (100k particles AABB): 0.0083235 s 3500
Brute-force (8000 particles AABB): 0.54521 s 16
------------------------------------------------------------------------------------

This is ~8500x performance for checking a particle collision against a grid of static objects and ~50x performance for static-static collision check (or dynamic-dynamic).

More particles:

n=64000
add
compute
grid-static-object-clear: 1.3058e-05 s 
grid-static-object-add (64000 particles AABB): 0.0347275 s 
grid-static-object-compute: 0.0106954 s 114
grid-static-total: 0.0455527 s 114
grid-compute-dynamic (100k particles AABB): 0.0154652 s 3700
Brute-force (64000 particles AABB): 35.473 s 114

777x performance for static vs static collision checking, more than 2000x performance for static vs dynamic.

Name		Name	Last commit message	Last commit date
Latest commit History 85 Commits
AdaptiveGridTree.h		AdaptiveGridTree.h
FastCollisionDetectionLib.h		FastCollisionDetectionLib.h
Generator.h		Generator.h
LICENSE		LICENSE
README.md		README.md
test1-no-grid-random-distribution.txt		test1-no-grid-random-distribution.txt
test2-unoptimized-grid.txt		test2-unoptimized-grid.txt
test3-optimized-grid-1.txt		test3-optimized-grid-1.txt
test4-optimized-grid-multithreaded-collision-test.txt		test4-optimized-grid-multithreaded-collision-test.txt
test5-teapot-in-stadium-problem-unoptimized.txt		test5-teapot-in-stadium-problem-unoptimized.txt
test6-teapot-in-stadium-problem-library-solution.txt		test6-teapot-in-stadium-problem-library-solution.txt
test7-teapot-in-stadium-library-allpairs.txt		test7-teapot-in-stadium-library-allpairs.txt
test8-simd-brute-force-20k-particles.txt		test8-simd-brute-force-20k-particles.txt
test9-sparse-linear-adaptive-grid-20k-particles.txt		test9-sparse-linear-adaptive-grid-20k-particles.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AdaptiveGridTree.h

AdaptiveGridTree.h