Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hal/vk] Rework Submission and Surface Synchronization #5681

Open
wants to merge 14 commits into
base: trunk
Choose a base branch
from

Conversation

cwfitzgerald
Copy link
Member

@cwfitzgerald cwfitzgerald commented May 8, 2024

Connections

Description

As described in #5559, our submission and surface synchronization was totally messed up. I've revamped it, made smarter classes to help keep all the bookkeeping straight, and generally make the code cleaner and easier to understand.

Testing

works on my machine lel

@JMS55
Copy link
Contributor

JMS55 commented May 9, 2024

Works fine for me running the cube example with VK validation layers enabled

[2024-05-09T06:03:33Z INFO  wgpu_core::instance] Adapter Vulkan AdapterInfo { name: "Intel(R) Arc(TM) Graphics", vendor: 32902, device: 32085, device_type: IntegratedGpu, driver: "Intel Corporation", driver_info: "101.5382", backend: Vulkan }

@Vecvec
Copy link
Contributor

Vecvec commented May 9, 2024

This fixes an annoying access violation termination (that I thought was a driver bug) on program exit after a present.

@hecrj
Copy link
Contributor

hecrj commented May 9, 2024

Unfortunately, it does not fix #5644 for me. Same constant validation error on presentation:

[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkAcquireNextImageKHR-semaphore-01779 (0x5717e75b)]
        Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xba7514000000002a, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)
[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance]         objects: (type: SEMAPHORE, hndl: 0xba7514000000002a, name: ?)

The actual validation error only happens with a debug build, although release feels much slower than 0.19 as well. I believe something is very slow and the errors actually only show up in debug mode.

Flamegraph of cube seems to indicate this—with present taking a huge amount of time:

flamegraph

For comparison, same drivers with 0.19:

flamegraph

Even if I enable Immediate present mode, frames seem to take twice as much time to be rendered with 0.20.

@cwfitzgerald
Copy link
Member Author

@hecrj could you try again with these latest commits, someone else who can reproduce the validation error says that this fixed it.

@cwfitzgerald cwfitzgerald force-pushed the vk/fix-presentation-synchronization branch from ea6e6ba to 329513b Compare May 11, 2024 04:25
@hecrj
Copy link
Contributor

hecrj commented May 11, 2024

@cwfitzgerald Yep! The latest changes fix the validation errors. Thanks!

I'm still noticing a bit of worse performance in debug mode, but it doesn't seem to be there in release mode anymore. Flamegraphs look the same as before, but I imagine it's expected since now synchronization works properly and there may be more waiting. No errors in any case!

wgpu-hal/src/lib.rs Outdated Show resolved Hide resolved
@jimblandy
Copy link
Member

I'm actually rewriting these comments as I review so if you're not attached to them then don't worry too much about fixing anything.

@cwfitzgerald
Copy link
Member Author

Yeah that's fine - I was just trying to get something there.

wgpu-hal/src/lib.rs Outdated Show resolved Hide resolved
@jimblandy
Copy link
Member

jimblandy commented May 17, 2024

@cwfitzgerald What do you think of this? da543fb51

My hope was that this would make the states of RelaySemaphores a bit clearer to a new reader. If people should not use wait unless should_wait is true, then let's actually force people to check that condition before they can even get at the value: classic Option. But it does make advance fallible and require us to pass the device.

[edit: made this a little nicer still: RelaySemaphoreState is no longer necessary]

@jimblandy
Copy link
Member

@cwfitzgerald Here's an even more radical suggestion: b5d52fe

Just use a single semaphore for all queue ordering.

@cwfitzgerald
Copy link
Member Author

@jimblandy i think we can go for it, the only question is if we want to still avoid this deadlock in anv mentioned in the old code, it if we consider that old enough to ignore.

/// It would be correct to use a single semaphore there, but 
/// [Intel hangs in `anv_queue_finish`](https://gitlab.freedesktop.org/mesa/mesa/-/issues/5508).

I should have perseved that comment somehow.

@jimblandy
Copy link
Member

@cwfitzgerald:

I should have perseved that comment somehow.

Ahh, yes, I was reviewing by checking out the branch and just wandering around with rust-analyzer, so I didn't notice that the patch removed the comment. If that's the sole rationale for the entire semaphore-swapping doohickey then we had definitely better have the comment.

Kvark filed that bug in Oct 2021. That's not that long ago, so I think we'd better keep it. It sounds like a bear to debug, and I'd rather not have to debug it again.

@jimblandy
Copy link
Member

So I guess the remaining question is what you think of this one.

@jimblandy
Copy link
Member

@Wumpf Wumpf mentioned this pull request May 18, 2024
19 tasks
@cwfitzgerald
Copy link
Member Author

Looks fine to me, could you either PR to my fork or just push to it?

@jimblandy
Copy link
Member

I still want to get a handle on the swapchain semaphores. I've got some work-in-progress docs for those too. I'll take care of this tomorrow morning.

for &surface_texture in surface_textures {
wait_stage_masks.push(vk::PipelineStageFlags::TOP_OF_PIPE);
wait_semaphores.push(surface_texture.wait_semaphore);
// We lock access to all of the semaphores. This may block if two submissions are in flight at the same time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my understanding, but I think this can't actually block. All submissions using any given SurfaceTexture are required to use the same Fence, and as explained in the docs for wgpu_hal::Queue::submit, all submissions using the same Fence must be synchronized to occur in a particular order.

Comment on lines 1025 to 1019
// Wait for the previously acquired image used by the semaphore to be available.
swapchain.device.wait_for_fence(
fence,
locked_swapchain_semaphores.previously_used_submission_index,
timeout_ns,
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this actually accomplish? Aren't we going to wait for the acquire semaphore anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must wait until the cpu timeline is sure that the semaphore has been signaled and reset. Otherwise there will be two acquire operations that could signal the same semaphore.

Copy link
Member

@jimblandy jimblandy May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in other words:

If we were to remove this wait, then acquisition, command submission, and presentation would all set stuff in motion in the presentation engine or on the GPU, but as far as the CPU is concerned, they'd all return without blocking, so the CPU could just run the process full speed ahead, blow through the entire swapchain, come around to the front of the ring buffer again, and make a fresh call to vkAcquireNextImage passing the same semaphore as last time. This wait is the only thing that rate-limits that process to match actual presentation speed.

Is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed another commit documenting this. If my explanation is correct, please mark this as resolved.

@jimblandy jimblandy force-pushed the vk/fix-presentation-synchronization branch from 6d2daa8 to c530f01 Compare May 21, 2024 01:05
@jimblandy
Copy link
Member

@cwfitzgerald I just pushed two doc commits. Could you look them over for accuracy?

Comment on lines +1063 to +1080
// debug_assert_eq!(
// Arc::as_ptr(&texture.surface_semaphores),
// Arc::as_ptr(&ssc.surface_semaphores[ssc.next_semaphore_index]),
// "Trying to use a surface texture that does not belong to the current swapchain."
// );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should either be uncommented or removed, but we shouldn't leave it in limbo like this.

@jimblandy jimblandy force-pushed the vk/fix-presentation-synchronization branch from a26aba1 to a5a000c Compare May 23, 2024 22:10
@jimblandy
Copy link
Member

rebased on current trunk

@jimblandy jimblandy force-pushed the vk/fix-presentation-synchronization branch from a5a000c to 4cf108c Compare May 23, 2024 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: needs back-porting PR with a fix that needs to land on crates
Projects
None yet
6 participants