Multithreaded rendering / rasterisation #8

Open
tamara-schmitz opened this issue Aug 15, 2017 · 5 comments

@tamara-schmitz
Owner

Essentially, two stages can be multithreaded: vertex shading and rasterisation (including pixel shading).
Listed below are a few multithreading concept proposals:

Vertex shader:

  • Put the matrix manipulations for each triangle into a task queue and let a thread pool process the queue.
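
As a rough illustration of the vertex-stage proposal, the sketch below treats each triangle as one task and lets a small pool of threads work through them. It is only a minimal sketch under assumptions: `Vec4`, `Mat4`, `Triangle`, `transform` and `transform_all` are hypothetical names rather than the project's types, and an atomic counter stands in for a real task queue to keep the example short.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical types; the project's real vertex and matrix types may differ.
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;   // row-major 4x4 matrix
using Triangle = std::array<Vec4, 3>;

// Multiplies a row-major 4x4 matrix with a column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    return {
        m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3] * v.w,
        m[4] * v.x + m[5] * v.y + m[6] * v.z + m[7] * v.w,
        m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] * v.w,
        m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w,
    };
}

// Each triangle is one task. An atomic counter stands in for the task queue:
// a worker claims the next unprocessed triangle index until none are left.
void transform_all(std::vector<Triangle>& tris, const Mat4& mvp, unsigned workers) {
    assert(workers > 0);
    std::atomic<std::size_t> next{0};
    auto work = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= tris.size()) return;
            for (Vec4& v : tris[i]) v = transform(mvp, v);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(work);
    for (std::thread& t : pool) t.join();  // after join, all results are visible to the caller
}
```

Because every index is claimed by exactly one worker, no two threads write the same triangle, so the triangle data itself needs no lock. A long-lived pool would avoid recreating the threads on every call.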

Rasterisation:

  • Split the frame and z buffers into slices along the y axis (ideally two to three times as many slices as CPU cores). Give each slice its own task queue that stores the indices of the triangles received from the vertex stage, with only one worker thread per slice. Use bounds checks to determine which triangles affect which lines (probably useless if there are just a few slices). Copy the vertices from the triangle queue into edges and texcoordforedge, rasterise, then copy each slice into the main framebuffer.
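
The bounds check from the rasterisation proposal could look roughly like the sketch below, which sorts triangle indices into one list per slice. Everything here is an assumption made for illustration: `ScreenTriangle` and `bin_triangles` are hypothetical names, slices are taken to be equal-height horizontal bands, and the check is done per slice rather than per line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical screen-space triangle: only the y coordinates matter for binning.
struct ScreenTriangle { float y0, y1, y2; };

// Returns, for each horizontal slice, the indices of the triangles whose
// vertical bounds overlap it. Slice s covers rows
// [s * rows_per_slice, (s + 1) * rows_per_slice).
std::vector<std::vector<std::size_t>> bin_triangles(
        const std::vector<ScreenTriangle>& tris, int height, int slice_count) {
    assert(height > 0 && slice_count > 0);
    const int rows_per_slice = (height + slice_count - 1) / slice_count;  // ceiling division
    std::vector<std::vector<std::size_t>> bins(slice_count);
    for (std::size_t i = 0; i < tris.size(); ++i) {
        const ScreenTriangle& t = tris[i];
        const float min_y = std::min({t.y0, t.y1, t.y2});
        const float max_y = std::max({t.y0, t.y1, t.y2});
        if (max_y < 0.0f || min_y >= static_cast<float>(height)) continue;  // fully off screen
        // Clamp to the visible rows before converting to integer row numbers.
        const int first_row = static_cast<int>(std::max(min_y, 0.0f));
        const int last_row = static_cast<int>(std::min(max_y, static_cast<float>(height - 1)));
        for (int s = first_row / rows_per_slice; s <= last_row / rows_per_slice; ++s)
            bins[s].push_back(i);
    }
    return bins;
}
```

Each inner vector plays the role of one slice's task queue. A triangle that spans several slices appears in each of them, so every worker only ever writes to its own part of the frame and z buffers.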
@tamara-schmitz
Owner Author

tamara-schmitz commented Sep 26, 2017

The following has been implemented since at least 32085e5:

### Queue fetches
Use SafeQueue to reduce complications. Every frame, the main thread should announce how many triangles have been sent out for rendering. Worker threads can then decide whether to call pop() in blocking mode or to notify the main thread that they have finished their work.
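
One way the announced triangle count could drive that decision is sketched below. This is not the project's SafeQueue: the class here is a hypothetical stand-in with only `push()` and a blocking `pop()`, and `render_frame`, the ticket counter and the checksum are made up for the example. A worker first claims a ticket; a ticket below the announced count means a matching push has happened or will happen, so blocking is safe.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's SafeQueue; the real interface may differ.
template <typename T>
class SafeQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Processes one frame. The triangle count is known up front, so each worker
// can tell whether a blocking pop() will eventually be satisfied.
long render_frame(const std::vector<int>& triangle_ids, unsigned workers) {
    assert(workers > 0);
    SafeQueue<int> queue;
    const std::size_t announced = triangle_ids.size();
    std::atomic<std::size_t> claimed{0};
    std::atomic<long> checksum{0};  // stands in for real rasterisation work

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            // Exactly `announced` tickets pass this test, one per pushed item.
            while (claimed.fetch_add(1) < announced)
                checksum.fetch_add(queue.pop());
        });
    }
    for (int id : triangle_ids) queue.push(id);
    for (std::thread& t : pool) t.join();  // joining is the "finished" notification here
    return checksum.load();
}
```

In this sketch the threads are created per frame and joining them signals completion. With long-lived workers, a counter or condition variable would have to carry the "finished" notification back to the main thread instead.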

tamara-schmitz pushed a commit that referenced this issue Sep 27, 2017
@tamara-schmitz
Owner Author

tamara-schmitz commented Nov 3, 2017

Memory fencing

SafeQueues are in place, but they currently rely on locks to prevent race conditions.
Read up on memory fencing as an alternative: https://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer?page=0,1

Circular buffers

Switching from a queue to a circular buffer seems like a good idea, as it guarantees that push and pop never reallocate, so no memory allocations are needed at runtime. However, the buffer size is fixed, and a push stalls when the write pointer has caught up to the slot just behind the read pointer (=> buffer is full).
Check out Wikipedia for more information: https://en.wikipedia.org/wiki/Circular_buffer
Also this may be useful: https://www.codeproject.com/Articles/153898/Yet-another-implementation-of-a-lock-free-circul
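
To make the circular buffer and memory fencing ideas concrete, here is a minimal sketch of a fixed-size ring buffer that uses acquire/release atomics instead of a lock. It assumes exactly one producer thread and one consumer thread (for example the main thread feeding one slice worker), which is a narrower case than the multi-producer, multi-consumer queue in the linked article. `RingBuffer`, `try_push` and `try_pop` are hypothetical names, and the functions return false rather than stalling.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Single-producer, single-consumer ring buffer (hypothetical, not the project's code).
// One slot is always left unused so that "full" and "empty" can be told apart.
template <typename T, std::size_t Capacity>
class RingBuffer {
    static_assert(Capacity >= 2, "one slot is always left unused");
public:
    // Producer side. Returns false instead of stalling when the buffer is full.
    bool try_push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % Capacity;
        if (next == read_.load(std::memory_order_acquire)) return false;  // full
        slots_[w] = value;
        write_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }
    // Consumer side. Returns false when the buffer is empty.
    bool try_pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[r];
        read_.store((r + 1) % Capacity, std::memory_order_release);  // hand the slot back
        return true;
    }
private:
    std::array<T, Capacity> slots_{};  // storage is allocated once, never resized
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

The release store on `write_` pairs with the acquire load in `try_pop`, so the consumer only reads a slot after the producer has finished writing it. The same pairing on `read_` keeps the producer from overwriting a slot that is still being read. With more than one producer or consumer this sketch is no longer safe.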

Other ideas

Use a stack instead. This also eliminates reallocations, but the constant allocations and deallocations may degrade performance.

@tamara-schmitz
Owner Author

Current status

Threading mostly works, though I suspect a race condition in the VP when the VP count is > 1. See 016ed89

As expected, performance is poor, because every triangle fetch by the rasteriser currently requires a lock.

@tamara-schmitz
Owner Author

Other possible improvements

Profiling is required, but VertexProcessorObjs may slow things down, as they all hold shared containers pointing at one texture. Concurrent reference counting could have a significant influence on performance.

tamara-schmitz pushed a commit that referenced this issue Jan 26, 2022
@tamara-schmitz
Owner Author

SafeQueue was rewritten to be the only queue type required. The only issues left are in copying the rasteriser textures back to the main thread and rendering them.
