Start Compute debugger after picking a UAV RenderTarget pixel #2065

Open
sammyfreg opened this issue Oct 1, 2020 · 13 comments
Labels
Feature An improvement or feature Unresolved Waiting for a fix or implementation

Comments

@sammyfreg

Description

RenderDoc has the essential feature of picking a pixel from a RenderTarget and either displaying its history or debugging the pixel shader that wrote the value.

Nowadays we often use compute shaders that output values to a RenderTarget bound as a UAV (Deferred Tiled Lighting, for example). Unlike values written by a pixel shader, we cannot find the shader thread that wrote the value. In the simple case where SV_DispatchThreadID.xy == Pixel.XY, we can easily calculate the proper Group/Thread ID and launch the debugger with those values; it is less convenient, but manageable. However, add Dispatch Indirect to the mix (in our TileLighting example, because we grouped the tiles per lighting type) and it becomes very tedious to figure out which thread wrote to a pixel, making it rarely worth our time.
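
For the simple SV_DispatchThreadID.xy == Pixel.XY case, the manual calculation can be sketched as below. This is only an illustration: the function name `pixel_to_thread_ids` and the 8x8 group size are assumptions, and the group size has to match whatever the shader actually declares in `numthreads`.

```python
def pixel_to_thread_ids(pixel_xy, group_size_xy):
    """Map a pixel to (group id, thread id within the group).

    Assumes SV_DispatchThreadID.xy == pixel.xy, i.e. one thread per pixel
    with no offset. The function name is hypothetical.
    """
    px, py = pixel_xy
    gx, gy = group_size_xy  # numthreads X and Y declared by the shader
    group_id = (px // gx, py // gy)        # corresponds to SV_GroupID.xy
    group_thread_id = (px % gx, py % gy)   # corresponds to SV_GroupThreadID.xy
    return group_id, group_thread_id


# Pixel (100, 37) with an assumed 8x8 thread group.
group_id, group_thread_id = pixel_to_thread_ids((100, 37), (8, 8))
print(group_id, group_thread_id)
```

With Dispatch Indirect or any remapping of tiles, this direct division no longer holds, which is the tedious case described above.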

It would be nice if RenderDoc could find the SV_DispatchThreadID that wrote to a particular picked pixel and start the debugger with it. There would be some things to figure out, like how to handle a pixel written to by many threads (maybe offer a list?).

Bonus Points
Offer a similar capability for a regular Buffer bound as a UAV, so we can pick an 'entry index' or 'memory location' and start debugging the compute thread that wrote to it.

Environment

  • RenderDoc version: 1.9
  • Operating System: Windows
  • Graphics API: DX11, DX12...
@baldurk
Owner

baldurk commented Oct 1, 2020

This can probably be implemented on Vulkan, since it would be a modification/reversal of the existing feature which gathers the resource access within array descriptors to limit the list of displayed resources to only those actually dynamically used by the shader.

I don't think it will be feasible on DX11 or DX12 though. Given the "..." in the graphics API list, I'm not clear whether this would still be useful to you if it were Vulkan only?

@baldurk baldurk added Feature An improvement or feature Unresolved Waiting for a fix or implementation labels Oct 1, 2020
@sammyfreg
Author

sammyfreg commented Oct 1, 2020

I had the '...' because I figured it would be a useful feature on all platforms, but personally, it would be more of a DX11/DX12 request. I believe that most console game developers rely on DirectX for PC development.

I thought it might not be possible (or would be very difficult); I wasn't sure what you have access to in your captures.

Here's an idea off the top of my head, not sure how feasible it would be: patch the compute shader to early out if threadid != XY, and test each thread index one by one (with a new dispatch each time) until we see that the output value was modified. This would definitely be slow, but it would only be done on demand by the user.

@baldurk
Owner

baldurk commented Oct 1, 2020

Yes, unfortunately I don't have any special access or backdoors. This is maybe feasible for IHVs or Microsoft, who can do extra things behind the scenes, but in RenderDoc I don't have any more access than you have in the application. That also means that on D3D shader patching is not feasible.

Patching shaders is what I was thinking of when I said I could implement this on Vulkan, since there it's very easy by comparison. I already do shader patching in Vulkan to annotate shaders and determine which resources are used. This is very useful when programs bind very large arrays for "bindless" type access, where there are 10000 resources bound but only a handful are actually used. On Vulkan I can determine which resources are accessed, which makes the UI much clearer by only showing those resources, but unfortunately this isn't possible on DX.

It works by writing to a buffer on every resource access to record which resources are accessed. This could be modified relatively simply to instead check the access co-ordinate for a specific target resource and record when the selected pixel/buffer offset is modified. That's better than an early out, which might cause problems - e.g. some shaders do special work in threadid == 0,0 with a group barrier to set up groupshared before going wide on all threads. The early-out approach also wouldn't catch no-op writes (e.g. writing 0 to a black texture).
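
As a rough CPU-side model of that instrumentation idea (not RenderDoc's actual implementation), the sketch below wraps every write, compares the written co-ordinate against the picked one, and logs the thread id on a match. `run_dispatch`, `kernel`, and the dictionary standing in for the texture are all hypothetical names made up for illustration.

```python
def run_dispatch(kernel, num_threads, texture, target):
    """Run `kernel` once per thread id and log which threads write `target`.

    The log is keyed on the write co-ordinate, not on the value, so a write
    that leaves the texel unchanged is still recorded.
    """
    writers = []
    for tid in range(num_threads):
        def write(coord, value, tid=tid):
            if coord == target:
                writers.append(tid)  # recorded even if the value is unchanged
            texture[coord] = value
        kernel(tid, write)
    return writers


def kernel(tid, write):
    # Hypothetical shader body: thread i writes 0 to texel (i % 4, 0).
    write((tid % 4, 0), 0.0)


texture = {(x, 0): 0.0 for x in range(4)}  # an all-black 4x1 texture
writers = run_dispatch(kernel, 8, texture, target=(2, 0))
print(writers)  # thread ids that touched texel (2, 0)
```

All threads run unmodified here, so group-level setup work is preserved, and the result is naturally a list when several threads write the same pixel.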

@sammyfreg
Author

sammyfreg commented Oct 1, 2020

Yeah, what you are describing is something that Microsoft eventually added to PIX; I remember it was really helpful with bindless resources. Their improved access to the GPU definitely makes it easier for them to achieve these kinds of things.

Maybe 'patching' wasn't the right term; I meant more like adding a few D3D bytecode instructions at the start of the shader binary before binding it and before the driver does the final compilation, if such a thing is possible. If not, maybe it could be enabled only when source files are available, recompiling the whole thing with the added early out, just as is currently done when the user debugs an edited shader.

As for your good point about not detecting a write of the same value as the current output, we could initialize the buffer to 1-FinalValue. It won't detect a thread actually writing '1-FinalValue', but at the very least it would find the one that contributed the final value.
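
A small CPU-side sketch of the 1-FinalValue idea, under the assumption that threads are independent and can be run one at a time: the buffer is reset before each single-thread run, and a thread counts as a writer if the picked slot differs from the initial value afterwards. The names `find_writers_by_value` and `kernel` are hypothetical.

```python
def find_writers_by_value(kernel, num_threads, size, target, initial):
    """Run one active thread at a time and report those that change `target`.

    Detection is purely by value comparison against `initial`.
    """
    found = []
    for active in range(num_threads):
        buf = [initial] * size  # fresh buffer for every single-thread run
        kernel(active, buf)
        if buf[target] != initial:
            found.append(active)
    return found


def kernel(tid, buf):
    # Hypothetical shader body: thread i writes 0.0 to slot i % 4.
    buf[tid % 4] = 0.0


final_value = 0.0
# Initializing to the final value hides writes that store that same value.
naive = find_writers_by_value(kernel, 8, 4, 2, initial=final_value)
# Initializing to 1 - FinalValue makes those writes visible.
sentinel = find_writers_by_value(kernel, 8, 4, 2, initial=1.0 - final_value)
print(naive, sentinel)
```

As noted above, a thread that happens to write exactly 1-FinalValue would still be missed.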

@baldurk
Owner

baldurk commented Oct 1, 2020

I meant patching as an overall label for any modification of the D3D bytecode, which is not possible for regular programs. Only fxc is allowed to produce DXBC, so there's no way of knowing if any modifications we make would still be legal. DXIL is in theory modifiable by other programs, but in practice dxc/DXIL is such a complete mess that there's no way anyone outside of Microsoft can produce DXIL and be confident that they get something valid. That's ignoring how huge a task it is to emit DXIL without linking all of dxc.

I think you would have to go through a long route of converting the D3D bytecode to some other format like SPIR-V or maybe directly to a high level language, patching it somewhere there and then converting into HLSL to be compiled again. However I don't think that would be a reliable route and could easily introduce false positives or errors.

Patching if shader source is available is in theory possible, but trying to patch any high level language that isn't in a known form is extremely difficult. Consider that many shaders make heavy use of macros, multiple source files, unusual syntax, etc. For this reason I also wouldn't consider this feasible on OpenGL even though there you are guaranteed to have source available because of GLSL.

@sammyfreg
Author

I was thinking of the embedded HLSL code in DX, where all macros have been expanded, but it's true that the code can also come from unprocessed source files. I haven't tried that case, so I don't know whether you have access to the defines in the shader debug info in this situation, to preprocess the file and then modify its first line of code to early out.

In any case, outside of access to more features on the GPU or editing the source file, I can't think of other ways to achieve this. If you believe this is not reliably achievable, feel free to close this request.

Thank you for considering this feature and for your excellent tool.

@baldurk
Owner

baldurk commented Oct 1, 2020

The pre-processor defines are usually included, however I have seen the compiler omit files that don't have any executable code which can lead to shaders which can't be recompiled again from source. Even so that would still require a preprocessor which performs precisely the same as fxc or dxc, which isn't easy in itself. Even after pre-processing it could still be complex to identify and annotate every resource access - as I mentioned above it's generally not possible to force every thread to run in isolation and still get sensible results.

One other option which might be better than nothing is to reverse the problem - run the debugger for every shader thread in the whole dispatch and then use that to pick the one that modifies the target pixel. This would likely be extremely slow but as you say at least it would be only in response to a user starting that operation so perhaps that's acceptable.

I'll leave the issue open since, if nothing else, this could be implemented simply and very fast on Vulkan. Maybe in future you'll consider Vulkan and then you can get this and many other nice features ;).

@sammyfreg
Author

sammyfreg commented Oct 1, 2020

That last suggestion was what I meant. Run the shader once for each thread of the dispatch, making sure only one thread is active by adding a bit of code at the start of the entry point, and note after each dispatch whether the desired output was modified by it.

No complicated changes to source code about how we write values, only disabling every thread except the one being tested in the dispatch loop.

@baldurk
Owner

baldurk commented Oct 1, 2020

Right, the problem is that only works if all the threads work independently. Or at least independently enough that the same co-ordinates are accessed by the same threads even if the values are different. If you have a shader which e.g. loads a batch of work to do in thread 0 into groupshared memory which is then worked on by all threads, if you disable thread 0 then all the other threads might not do any work at all which would hide their results.
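
The failure mode can be illustrated with a toy CPU model of one thread group, split into two phases around a barrier. Everything here (`run_group`, the `work_count` field, the written values) is made up for illustration and does not correspond to any real shader.

```python
def run_group(active_threads, group_size=4):
    """Simulate one thread group in two phases separated by a barrier."""
    shared = {"work_count": 0}  # stand-in for groupshared memory
    output = [None] * group_size

    # Phase 1: only thread 0 loads the batch of work into shared memory.
    if 0 in active_threads:
        shared["work_count"] = group_size

    # (a group barrier would sit here in the real shader)

    # Phase 2: every active thread processes its item if there is work.
    for tid in active_threads:
        if tid < shared["work_count"]:
            output[tid] = tid * 10
    return output


full = run_group(range(4))  # all threads active: every slot is written
isolated = run_group([2])   # only thread 2 active: thread 0 never loads work
print(full)
print(isolated)
```

With thread 0 disabled, thread 2 sees no work and writes nothing, so an isolated run would wrongly conclude that thread 2 never touches its output.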

This is the same problem that the current compute shader debugging has, because it only simulates one thread at a time (for much the same reason as here - because simulating 1000 could be very slow).

@sammyfreg
Author

Yes, I realized after going to bed that there is the issue of potential cross-thread dependencies, like atomic shared values or cross-lane value exchange, that could affect if/where values are written. :/

Using lane 0 to load/store values, as you mentioned, is another common case that would break this technique.

@LeLocTai

I think the SV_DispatchThreadID.xy == Pixel.XY use case is common enough that it's worth making the currently grayed-out Debug button work for that case.
For other cases, maybe let the user specify a Python function that does the mapping? The UI could be the same as the current custom visualization shader UI.
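
To make the suggestion concrete, a user-supplied mapping function might look something like the sketch below. No such hook exists in this thread: the signature, the name `map_pixel_to_thread`, and the tile list are all assumptions, loosely modelled on the tiled-lighting case from the description where group i of an indirect dispatch shades the i-th tile in a list.

```python
def map_pixel_to_thread(pixel_xy, tile_list, tile_size=16):
    """Hypothetical user mapping from a picked pixel to compute thread ids.

    Assumes group i of the dispatch shades tile_list[i], with one thread per
    pixel inside a tile_size x tile_size tile.
    """
    px, py = pixel_xy
    tile = (px // tile_size, py // tile_size)
    if tile not in tile_list:
        return None  # no group in this dispatch covers the pixel
    group_id = (tile_list.index(tile), 0, 0)
    group_thread_id = (px % tile_size, py % tile_size, 0)
    return group_id, group_thread_id


# Tiles selected for one lighting type, in dispatch order (made-up data).
tiles = [(3, 1), (0, 0), (5, 2)]
result = map_pixel_to_thread((85, 40), tiles)
print(result)
```

In practice the tile list would have to be read back from whatever buffer the application used to build the indirect dispatch.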

@LeLocTai

I'm trying to implement this as an extension, but it looks like the picked location is not exposed. If you find the approach I mentioned less than ideal, would you mind exposing the picked location to extensions?

@baldurk
Owner

baldurk commented May 31, 2022

Sure thing, I've added a Python function to get the currently picked pixel. I'm happy to expose more things to the Python API for extension access on request.


3 participants