CUDA-PT


Unidirectional path tracing implemented in CUDA, making use of C++17 traits and templated wherever possible.

This will be benchmarked against AdaPT, as well as CPU-based renderers such as pbrt-v3 (generic accelerators) and Tungsten (Intel Embree).

Since I have no intention of making this an extensive project (like AdaPT, which takes care of all the user-facing aspects) and I am doing this mainly to challenge myself with more difficult parallel program design, this repo will not be as user friendly and its scalability will be far worse than AdaPT's. I will try to keep the chores minimal and focus on heterogeneous program design.

  • Toy CUDA depth renderer with profiling:

  • Unidirectional path tracing with AABB culling: full traversal without spatial partitioning. At this stage, shared memory and constant memory are made use of, and a special kind of variant is employed (std::variant is not supported by CUDA; std::visit either crashes or is rejected by the compiler). This version of UDPT can be 3-8x faster than my AdaPT renderer (Taichi lang, JIT CUDA backend). A minimal AABB slab-test sketch is given after the images below.

(Images: Depth Renderer | Unidirectional PT)
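
A minimal sketch of the slab test used for AABB culling before the brute-force primitive traversal; the `Ray` and `AABB` layouts here are illustrative assumptions, not the repo's actual structures.

```cuda
#include <cuda_runtime.h>

struct Ray  { float3 o, d; };
struct AABB { float3 mn, mx; };

// Slab test: returns true if the ray hits the box within [0, t_max).
__device__ __forceinline__ bool hit_aabb(const Ray& r, const AABB& box, float t_max) {
    float3 inv_d = make_float3(1.f / r.d.x, 1.f / r.d.y, 1.f / r.d.z);
    float tmin = 0.f, tmax = t_max, lo, hi;
    lo = (box.mn.x - r.o.x) * inv_d.x; hi = (box.mx.x - r.o.x) * inv_d.x;
    tmin = fmaxf(tmin, fminf(lo, hi)); tmax = fminf(tmax, fmaxf(lo, hi));
    lo = (box.mn.y - r.o.y) * inv_d.y; hi = (box.mx.y - r.o.y) * inv_d.y;
    tmin = fmaxf(tmin, fminf(lo, hi)); tmax = fminf(tmax, fmaxf(lo, hi));
    lo = (box.mn.z - r.o.z) * inv_d.z; hi = (box.mx.z - r.o.z) * inv_d.z;
    tmin = fmaxf(tmin, fminf(lo, hi)); tmax = fminf(tmax, fmaxf(lo, hi));
    // Only run the full per-primitive test when the box is actually hit.
    return tmin <= tmax;
}
```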
  • CUDA texture bindings (with normal or UV maps)
  • GPU-side BVH implementation. This will be the most difficult part, since "it is always easy to write your program with parallelism, but difficult to make it fast".
shared / constant / texture memory acceleration
  • For the naive full-traversal implementation, threads in a block can batch geometries (together with UV coordinates, normals, etc.) and copy them to shared memory. Since shared memory is limited (49152 bytes on my device) and we need enough blocks to keep the streaming multiprocessors occupied, the batch size should be tuned experimentally (see the sketch after this list).
  • Constant memory can be used to store object information (color, emission), since it does not take much space (the 65536 bytes of constant memory on my device are clearly sufficient).
  • Texture memory: I have never tried this before. Excited! CUDA texture bindings offer hardware BILERP, which is amazing. A host-side texture-binding sketch follows the batching sketch below.
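
As a rough sketch of the batching idea above, the threads of a block cooperatively stage a batch of triangles in shared memory while per-object attributes live in constant memory. `BATCH_SIZE`, `Triangle` and `ObjectInfo` are placeholders to be tuned or replaced by the real layouts.

```cuda
#include <cuda_runtime.h>

constexpr int BATCH_SIZE = 256;   // must fit the 48 KB shared-memory budget

struct Triangle   { float3 v0, v1, v2; };
struct ObjectInfo { float3 color, emission; };

// Per-object attributes are tiny, so they fit comfortably in the 64 KB of
// constant memory and are broadcast cheaply to all threads.
__constant__ ObjectInfo c_objects[256];

__global__ void full_traversal(const Triangle* __restrict__ tris, int num_tris,
                               float* __restrict__ t_out) {
    __shared__ Triangle s_tris[BATCH_SIZE];
    int tid = threadIdx.x;
    // derive this thread's ray from its pixel index here (omitted)

    for (int base = 0; base < num_tris; base += BATCH_SIZE) {
        // Cooperative, coalesced copy of one batch into shared memory.
        for (int i = tid; i < BATCH_SIZE && base + i < num_tris; i += blockDim.x)
            s_tris[i] = tris[base + i];
        __syncthreads();

        int batch = min(BATCH_SIZE, num_tris - base);
        for (int i = 0; i < batch; ++i) {
            // ray-triangle test against s_tris[i] goes here,
            // keeping track of the closest hit per thread
        }
        __syncthreads();   // don't overwrite the batch while others still read it
    }
    // write this thread's closest hit distance into t_out
}
```

The trade-off mentioned above shows up directly in `BATCH_SIZE`: a larger batch amortizes more global loads, but eats into the shared memory that limits how many blocks can reside on each SM.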
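
And a minimal host-side sketch of binding an image (e.g. a UV or normal map) through a CUDA texture object so that sampling gets hardware bilinear filtering; the `float4` pixel layout and the helper name are assumptions.

```cuda
#include <cuda_runtime.h>

cudaTextureObject_t create_texture(const float4* host_pixels, int width, int height) {
    // Copy the image into a CUDA array with a float4 channel layout.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaArray_t array;
    cudaMallocArray(&array, &desc, width, height);
    cudaMemcpy2DToArray(array, 0, 0, host_pixels, width * sizeof(float4),
                        width * sizeof(float4), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res_desc{};
    res_desc.resType = cudaResourceTypeArray;
    res_desc.res.array.array = array;

    cudaTextureDesc tex_desc{};
    tex_desc.addressMode[0]   = cudaAddressModeWrap;
    tex_desc.addressMode[1]   = cudaAddressModeWrap;
    tex_desc.filterMode       = cudaFilterModeLinear;   // hardware BILERP
    tex_desc.readMode         = cudaReadModeElementType;
    tex_desc.normalizedCoords = 1;                      // sample with UV in [0, 1)

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res_desc, &tex_desc, nullptr);
    return tex;
}

// Device-side sampling: tex2D<float4>(tex, u, v) interpolates in hardware.
__device__ float4 sample_uv(cudaTextureObject_t tex, float u, float v) {
    return tex2D<float4>(tex, u, v);
}
```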
warp-level operations & stream multi-processing

For now, these do not seem applicable.

Variant-based polymorphism

Polymorphism can easily be achieved with virtual functions/classes, yet I don't think this is a good choice for GPU programming: the extra vptr will

  • Add another global memory access, which can be slow (with no chance to coalesce accesses into fewer memory transactions)
  • Prevent the compiler from inlining the function, and the stack setup for calling a non-inlined function introduces overhead.

Polymorphism based on a variant (a tagged-union-like type) avoids the above overhead.
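
A minimal sketch of what such a variant could look like on the device side, assuming two hypothetical BSDF types (the names are illustrative only): a manual tagged union keeps member access in plain loads and lets the dispatch be fully inlined.

```cuda
#include <cuda_runtime.h>

struct Lambertian { float albedo; };
struct Mirror     { float reflectance; };

// Manual tagged union: no vtable, so there is no extra pointer chase and
// the data can be read with regular (coalescable) loads.
struct BSDFVariant {
    enum class Tag : int { LAMBERTIAN, MIRROR } tag;
    union {
        Lambertian lambertian;
        Mirror     mirror;
    };
};

__device__ __forceinline__ float eval_bsdf(const BSDFVariant& b, float cos_theta) {
    switch (b.tag) {
        case BSDFVariant::Tag::LAMBERTIAN: return b.lambertian.albedo * cos_theta;
        case BSDFVariant::Tag::MIRROR:     return b.mirror.reflectance;
    }
    return 0.f;
}

__global__ void shade(const BSDFVariant* bsdfs, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = eval_bsdf(bsdfs[i], 0.5f);
}
```

The switch still costs a branch per tag, but since everything lives in a flat struct, `eval_bsdf` can be inlined into the megakernel without the vptr access discussed above.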

Spatial partition

For a scene with complex geometry, a BVH (or KD-tree) should be implemented to accelerate ray intersection. On CPUs these acceleration structures are easy to implement and naturally fast, while on GPUs, branching efficiency and memory access patterns must be considered carefully to make traversal run as fast as possible.
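
For reference, a sketch of what iterative GPU BVH traversal with an explicit per-thread stack might look like, assuming a flattened node array where the right child is stored directly after the left one; the node and primitive layouts are assumptions, not the repo's actual structures.

```cuda
#include <cuda_runtime.h>

struct BVHNode {
    float3 mn, mx;   // node bounds
    int left;        // internal node: index of left child; leaf: -1
    int prim_begin;  // leaf: first primitive index
    int prim_count;  // leaf: number of primitives
};

__device__ __forceinline__ bool hit_aabb(const float3& o, const float3& inv_d,
                                         const float3& mn, const float3& mx,
                                         float t_max) {
    float3 t0 = make_float3((mn.x - o.x) * inv_d.x, (mn.y - o.y) * inv_d.y, (mn.z - o.z) * inv_d.z);
    float3 t1 = make_float3((mx.x - o.x) * inv_d.x, (mx.y - o.y) * inv_d.y, (mx.z - o.z) * inv_d.z);
    float tmin = fmaxf(fmaxf(fminf(t0.x, t1.x), fminf(t0.y, t1.y)), fminf(t0.z, t1.z));
    float tmax = fminf(fminf(fmaxf(t0.x, t1.x), fmaxf(t0.y, t1.y)), fmaxf(t0.z, t1.z));
    return tmax >= fmaxf(tmin, 0.f) && tmin < t_max;
}

__device__ int traverse(const BVHNode* __restrict__ nodes,
                        float3 o, float3 d, float& t_hit) {
    float3 inv_d = make_float3(1.f / d.x, 1.f / d.y, 1.f / d.z);
    int stack[64], sp = 0;
    stack[sp++] = 0;                       // push root
    int hit_prim = -1;
    t_hit = 1e30f;
    while (sp > 0) {
        const BVHNode node = nodes[stack[--sp]];
        if (!hit_aabb(o, inv_d, node.mn, node.mx, t_hit)) continue;
        if (node.left < 0) {
            // Leaf: test primitives [prim_begin, prim_begin + prim_count)
            // against the ray, updating t_hit and hit_prim on closer hits.
        } else {
            stack[sp++] = node.left;       // children visited depth-first;
            stack[sp++] = node.left + 1;   // a front-to-back order would prune more
        }
    }
    return hit_prim;
}
```

Keeping the node layout compact and visiting the nearer child first are exactly the memory-access and branching considerations mentioned above.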


Current State

This repo originated from w3ntao/smallpt-megakernel. I answered the author's question on the Computer Graphics Stack Exchange and tweaked his code, so I thought to myself... why not base my work on this repo and try to make it better (though I won't call it small-pt, since it definitely won't be small after I heavily optimize the code). After solving the problems in his code, I am able to render around 20x faster than the CPU (I don't remember how many threads I used; the GPU was an RTX TITAN, though):

For a detailed analysis, please refer to my answer post linked above.