
Add Pixel History columns for dual source output and blend src/dest on Vulkan #3003

Closed
wants to merge 18 commits

Conversation

w-pearson
Contributor

Description

This PR adds new columns to the pixel history view for the dual source output on shaders, and also for the blend source and destination factors. It also allows enabling the pre-mod value as an extra column, since there's a chance it could be useful. All of these extra columns are hidden by default, and can be accessed by right-clicking anywhere in the pixel history view and then editing the columns.

It also fixes a few minor bugs and typos with pixel history, including the buffer not being large enough if you have a large quantity of primitives hitting the same pixel within a single drawcall. Now, up to 255 are handled correctly (at which point the scissor-based approach stops letting us distinguish pixels), and more than 255 will show the first 255 without crashing.

I also added demos and automated tests for these features.

This PR only implements it for Vulkan. On other APIs, the columns will show as "Unavailable" (it probably would be possible to implement it, but I imagine trying to modify shaders textually for OpenGL would be a massive pain, and I don't know enough about D3D to do it there). Additionally, if blending is disabled or if dual source blend is disabled, the relevant data will not be collected and the columns will show as "Unavailable" (so this change should not have any impact on performance if blending or dual source blend is not in use).

Resolves #2461.

Currently, no backend sets this, and it is not displayed.
Currently, no backend sets them, and they are not displayed.
Before, if a row was selected, the color was completely replaced, which
made the color preview column completely useless for that row.

This change also fixes some inconsistencies with how the selection hover
works on tree widgets; previously part of the row was not highlighted.
This is already implemented by all backends, but did not get displayed
beforehand. This column is not visible by default.
This has not yet been implemented by any backends, nor is the column
visible by default.
They have not yet been implemented by any backends, nor are the columns
visible by default.
The first one is simply a typo.

The second one seems to be an unnecessary value that doesn't make sense
(blend factors are ignored when blendEnable is false, and setting the
source color blend factor to the destination color seems like an
uncommon situation; the equivalent of blendEnable being false is
source = 1, dest = 0). This does not seem to be a driver workaround, as
the commit that added it (5043898) does not mention one, nor does it
appear to work around any other driver bug.

The third one is that VulkanDebugManager::PixelHistoryCopyPixel was
declared but never defined. VulkanPixelHistoryCallback::CopyImagePixel
serves the same purpose.
Before, the buffer could be overrun, which could result in anything from
garbage data to the GPU device being lost to segfaults. Now, correct
data is gathered.

We can't know in advance how many primitives will hit the targeted
pixel, so it's not possible to create the buffer until after our first
pass for each event (fortunately, we do know the number of events in
advance). It is possible that we don't need a larger buffer, though,
in which case the original one can be re-used.
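The two-pass sizing described above can be sketched as follows. This is a hypothetical Python illustration, not RenderDoc's actual code; the record size, the initial per-event guess, and the helper names are all assumptions:

```python
# Hypothetical sketch of the two-pass buffer strategy: a first pass counts
# fragments per event, then the readback buffer is only reallocated if the
# initial size was insufficient (otherwise the original buffer is re-used).

PER_FRAGMENT_INFO_SIZE = 64      # assumed size of one PerFragmentInfo record
INITIAL_FRAGS_PER_EVENT = 8      # assumed initial guess before the first pass
MAX_DISTINGUISHABLE_FRAGS = 255  # limit imposed by the scissor-based approach

def required_buffer_size(frag_counts_per_event):
    """Total bytes needed once the per-event fragment counts are known."""
    total = 0
    for count in frag_counts_per_event:
        # More than 255 overlapping primitives can't be distinguished, so
        # anything past that is clamped rather than overrunning the buffer.
        total += min(count, MAX_DISTINGUISHABLE_FRAGS) * PER_FRAGMENT_INFO_SIZE
    return total

def maybe_grow(current_size, frag_counts_per_event):
    """Return (size, reallocated): keep the old buffer when it's big enough."""
    needed = required_buffer_size(frag_counts_per_event)
    if needed <= current_size:
        return current_size, False
    return needed, True

# Example: three events, one of which has 300 overlapping primitives.
initial = 3 * INITIAL_FRAGS_PER_EVENT * PER_FRAGMENT_INFO_SIZE
size, grew = maybe_grow(initial, [2, 300, 5])
```

The key point the sketch illustrates is that the event count is fixed up front, so only the per-event fragment counts need the extra counting pass.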
Contributor Author

@w-pearson w-pearson left a comment


A few things I'm a bit unsure about for this current implementation (though there are probably other issues as this is a fairly large PR):

Comment on lines +3217 to +3227
// TODO: is this useful?
if(depthImage != VK_NULL_HANDLE)
{
  VkCopyPixelParams depthCopyParams = colourCopyParams;
  depthCopyParams.srcImage = depthImage;
  depthCopyParams.srcImageLayout = depthLayout;
  depthCopyParams.srcImageFormat = depthFormat;
  CopyImagePixel(cmd, depthCopyParams,
                 (fragsProcessed + f) * sizeof(PerFragmentInfo) + offset +
                     offsetof(struct PixelHistoryValue, depth));
}
Contributor Author

I collect depth information for all of these columns by reading the depth value each time. This seems kinda redundant, but the alternative would be to not use PixelHistoryValue/ModificationValue for this purpose, which seems equally messy (especially since ModificationValue is what has IsValid(), not PixelValue).

Contributor Author

I wasn't able to find any information on how python files should be formatted, and clang-format doesn't seem to care. If there's anything specific, let me know.

Owner

There's no particular autoformatting for python; clang-format does not support python. As long as it roughly matches what is there, that's fine.

modif_both_1 = modifs_both[0]
modif_both_2 = modifs_both[1]
# modif_both_1 should match modif_1 as both are the same triangle on the same background
assert(value_selector(modif_1.preMod.col) == value_selector(modif_both_1.preMod.col))
Contributor Author

I'm using assert here because it seems like it would be a lot easier to use it than to do a check and then throw a rdtest.TestFailureException with a meaningful message. If needed I can change it, but I fear it would make things more confusing.

Owner

You can use self.check, which will perform similarly to a check but still raise a rdtest.TestFailureException that is then properly handled. It's preferred to do an if and raise an exception wherever possible, though, because then you can format the message to display what the actual values are, which is helpful for debugging from a log.

@baldurk
Owner

baldurk commented Jul 31, 2023

I'm not going to give this a detailed review as the PR itself is too large for the moment - see the contributing guidelines - please split up the PR further, as 1000 lines changed should be an absolute maximum. For example, you could separate the core implementation and testing into different PRs, and you could further split the core implementation up into separate logical PRs where possible to reduce the amount of code to be reviewed - aim for only a few hundred lines changed at once when the PR is complex. Aside from that I think there are some high level things that should change, and a detailed review would be redundant.

Implementing fetch of dual source blending data is good, but I do not want to add any new columns for it. The only thing this change should do is obtain the dual source second factor (when referenced) and it can display that data with the rest of the shader output. The vastly overwhelming number of cases using dual source blending will only use an additional alpha, so this should be added to the shader out as "A2: x.yyy" after the existing R/G/B/A. If someone is actually using a dual source colour blend factor you could add "R2: x.yyy, G2: x.yyy, B2: x.yyy" in the shader output as well.

Since we know that dual source blending is only used with a single render target, it makes most sense to display it directly with the shader output as-if there were 5 components output instead of 4.

I agree that supporting this on GL is not feasible due to the need to patch shaders, but this should be very easily supportable on D3D11 since D3D doesn't use the somewhat obtuse 'index' system of GL/VK, and just sources dual source blending information from the second render target. No shader patching is needed in that case. If you want to have another look at it in that case that would be appreciated, but if you don't want to touch the D3D11 code at all I can do a follow on fix.

Similarly looking briefly at the shader patching for vulkan I think your approach is more complex than is needed - I think the simpler approach would just be to patch any location = 0, index = 1 outputs to location = 1 when fetching shader output values and bind two render targets to obtain both outputs. Having a system of swapping between index=1 and index=0 makes things more complicated than necessary especially with what you've found regarding needing to handle annotations being missing on index=0.

@w-pearson
Contributor Author

w-pearson commented Jul 31, 2023

I'm not going to give this a detailed review as the PR itself is too large for the moment - see the contributing guidelines - please split up the PR further, as 1000 lines changed should be an absolute maximum. For example, you could separate the core implementation and testing into different PRs, and you could further split the core implementation up into separate logical PRs where possible to reduce the amount of code to be reviewed - aim for only a few hundred lines changed at once when the PR is complex.

The commits are generally targeted at one of the API (data_types.h), the Vulkan implementation, the UI, or Vulkan testing (I think each of them touches exactly one of these, but I'd need to double-check). Everything depends on the interface, and Vulkan testing depends on the implementation for the tests to pass (but could still compile on its own). So I could create 4 PRs: one for the API on its own, and 3 containing the API commits and then the other commits. Or do you want something else?

but I do not want to add any new columns for it

Do you have an opinion on adding columns in general? In particular, this PR also adds optional columns for the blend source factor and the blend dest factor. Theoretically this could be a separate change (and I have everything related to that in separate commits already, apart from the python test for dual source blending which also checks that info). I could create separate PRs for that feature as well, but then it would be even more PRs with somewhat convoluted interdependencies. I'm a bit worried about making it hard to review because of that (and having to deal with updating several PRs to account for a change to one of them).

The only thing this change should do is obtain the dual source second factor (when referenced) and it can display that data with the rest of the shader output.

The downside of doing this is that you can't see the color itself visually, but on the other hand the main alpha value isn't displayed visually either (except when the alpha channel is the only channel enabled in the texture viewer). So this wouldn't be too different from the status quo.

I think the simpler approach would just be to patch any location = 0, index = 1 outputs to location = 1 when fetching shader output values and bind two render targets to obtain both outputs.

That probably is simpler overall. There is still some awkwardness with location = 0 being implicit, but it wouldn't be too bad.

The mildly annoying thing about this is that the Vulkan spec doesn't actually mandate that there be only one render target when using dual-source blending; instead it defines both maxFragmentOutputAttachments and maxFragmentDualSrcAttachments. In practice I think maxFragmentDualSrcAttachments is always 1, but I'm not 100% sure of this. The existing swap approach works around this by not caring about which output location is being used, and instead swapping everything and letting the existing logic handle choosing one of the render targets (based on what was selected in the UI).

EDIT: the same applies for OpenGL, with GL_MAX_DRAW_BUFFERS and GL_MAX_DUAL_SOURCE_DRAW_BUFFERS, though the wiki notes that GL_MAX_DUAL_SOURCE_DRAW_BUFFERS is practically always 1.

@w-pearson
Contributor Author

I have split this PR into 4 separate PRs:

Hopefully these are of a more manageable size; if needed I can split them further on a per-commit basis.

I have not yet updated those PRs to account for the feedback provided here, though I did rebase them.

@baldurk
Owner

baldurk commented Aug 1, 2023

I have closed those other PRs for now. Please let's finish the discussion here before doing anything else, as otherwise that will cause more problems than it solves. Especially since I expect some of this discussion will cause you to refactor or change a lot of the code you have written so far. There is no rush here and no deadline to meet; we shouldn't leap ahead without making sure that we are on the same page first.

For example, a big reason to keep PRs to a reasonable size is to make sure that it is a reasonable chunk of work to review. Opening 4 simultaneous PRs is not meaningfully different to 1 large PR in terms of that burden, and potentially makes it harder to organise. Another challenge is because review feedback given on earlier PRs could have substantial impact on later PRs if some foundational work changes, meaning lots of churn and nothing can meaningfully be done with those later PRs until the earlier ones are merged.

Instead stick with a single initial logical PR, which you can force push to this one if you like, then once that has been reviewed and merged work on the next. If you have follow up code already ready that's fine, though usually the best option for a large change like this is to either PR or at least get high level feedback as you're developing to minimise wasted work - I suspect a large amount of what you've done here will not go into the final merged PR, and communicating earlier or PR'ing smaller segments of work would have saved you time going in the wrong direction.

I would recommend that you slow down and take your time with this, that will produce better results for you and certainly it will make it easier for me as the maintainer.


The commits are generally targeted at one of the API (data_types.h), the Vulkan implementation, the UI, or Vulkan testing (I think each of them touches exactly one of these, but I'd need to double-check). Everything depends on the interface, and Vulkan testing depends on the implementation for the tests to pass (but could still compile on its own). So I could create 4 PRs: one for the API on its own, and 3 containing the API commits and then the other commits. Or do you want something else?

The division of code is somewhat arbitrary and I don't want to proscribe a super specific method since that can be counter productive. The way I would suggest to think of it is that while code development normally happens with an indirect route and you might go back and make changes after finishing a later part, once you have a logical portion.

An alternative way to look at it - PRs only have to be well formatted and compile (and not break things, though that comes up in review). Crucially there's no requirement for a PR to be a self-contained working piece of code end to end, and indeed with a larger feature that you're breaking up that might be the best way to do it. You could first submit only the shader patching helper function even if nothing calls it. Then the fetching of dual-source blend outputs just writing to dummy local variables. And so forth from there.

Do you have an opinion on adding columns in general? In particular, this PR also adds optional columns for the blend source factor and the blend dest factor. Theoretically this could be a separate change (and I have everything related to that in separate commits already, apart from the python test for dual source blending which also checks that info). I could create separate PRs for that feature as well, but then it would be even more PRs with somewhat convoluted interdependencies. I'm a bit worried about making it hard to review because of that (and having to deal with updating several PRs to account for a change to one of them).

I don't see a value to having these extra columns. The source and destination blend factors are either implicit from the fixed blend setup that is visible in the pipeline state, or they are the tex before/shader output value already displayed. I don't think there is any benefit to re-displaying them in a slightly different form.

The downside of doing this is that you can't see the color itself visually, but on the other hand the main alpha value isn't displayed visually either (except when the alpha channel is the only channel enabled in the texture viewer). So this wouldn't be too different from the status quo.

As you say this is already not always the case, and honestly I would place my bets that there is no currently running code on any GPUs today that actually uses the RGB dual source blend factors for a real purpose. Dual source alpha certainly makes sense, but using RGB is a very narrow niche and I'm not aware of any use cases. Adding significant extra scaffolding for what is possibly a theoretical use case is not justified in this case.

The mildly annoying thing about this is that the Vulkan spec doesn't actually mandate that there be only one render target when using dual-source blending; instead it defines both maxFragmentOutputAttachments and maxFragmentDualSrcAttachments. In practice I think maxFragmentDualSrcAttachments is always 1, but I'm not 100% sure of this. The existing swap approach works around this by not caring about which output location is being used, and instead swapping everything and letting the existing logic handle choosing one of the render targets (based on what was selected in the UI).

EDIT: the same applies for OpenGL, with GL_MAX_DRAW_BUFFERS and GL_MAX_DUAL_SOURCE_DRAW_BUFFERS, though the wiki notes that GL_MAX_DUAL_SOURCE_DRAW_BUFFERS is practically always 1.

Yes this is an unfortunate thing in GL (and Vulkan inherited it), but it can basically be ignored and treated as if it's always 1. The number of devices that support more than one dual-source target is not significant, and the number of applications is again most likely 0. Even if some application did want to support this, I don't think it would be worth supporting in RenderDoc due to how niche it is.

@w-pearson
Contributor Author

An alternative way to look at it - PRs only have to be well formatted and compile (and not break things, though that comes up in review). Crucially there's no requirement for a PR to be a self-contained working piece of code end to end, and indeed with a larger feature that you're breaking up that might be the best way to do it. You could first submit only the shader patching helper function even if nothing calls it. Then the fetching of dual-source blend outputs just writing to dummy local variables. And so forth from there.

I will try to keep this in mind in the future. I personally like to try and get something functional that I can play around with, so that I can make sure my assumptions are correct and the design isn't a dead-end. But that approach does also mean that I end up with a lot of code that could be submitted for review all at once, which is a larger burden for you, and it's definitely important for you to be able to review everything.

To get a better idea of what you want: assuming we did still go with additional columns, would 33f4b4b ("Use an enum for pixel history column ids instead of magic numbers") on its own be something you want as a single pull request, or would you prefer that bundled with other commits that actually add more columns? Adding more columns without getting rid of the magic numbers is impractical, but getting rid of the magic numbers without other context also seems wrong to me.

The way I would suggest to think of it is that while code development normally happens with an indirect route and you might go back and make changes after finishing a later part, once you have a logical portion.

I feel like you are missing part of the sentence here. Can you clarify what you meant by this?


I don't see a value to having these extra columns. The source and destination blend factors are either implicit from the fixed blend setup that is visible in the pipeline state, or they are the tex before/shader output value already displayed. I don't think there is any benefit to re-displaying them in a slightly different form.

To clarify: it's not the factors themselves that I'm showing, but instead the source blend factor times the source color and the dest blend factor times the dest color. Unfortunately I don't know of a specific name for these.

The source column shows what the result would be if the destination factor is set to 0 for that primitive only (with all previous primitives still being blended normally), while the destination column shows the source factor set to 0. (The blend op is always set to VK_BLEND_OP_ADD as well, so it is always the case that for the source column, R = Rs0 × Sr and for the dest column, R = Rd × Dr.)

For simple blend configurations where the source and destination factors are both 0/1/alpha, this is still not particularly useful. But if the factors are colors (not necessarily using dual source), then there is a benefit to it: you see the multiplied color, which is somewhat hard to mentally follow. Being able to see both halves of a more complicated blend operation is helpful in my opinion, though it certainly is not something that's needed by default.
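A hypothetical sketch of that math (illustrative names and values only, not RenderDoc code): with the blend op fixed to VK_BLEND_OP_ADD, the two proposed columns are the two halves of result = src × srcFactor + dst × dstFactor:

```python
# Illustrative helper: with an additive blend op, the "blend source" column
# is the source colour times its blend factor, the "blend destination"
# column is the destination colour times its blend factor, and the final
# pixel value is their per-channel sum.

def blend_terms(src, dst, src_factor, dst_factor):
    """Return (source term, destination term, blended result) per channel."""
    src_term = tuple(s * f for s, f in zip(src, src_factor))
    dst_term = tuple(d * f for d, f in zip(dst, dst_factor))
    result = tuple(a + b for a, b in zip(src_term, dst_term))
    return src_term, dst_term, result

# Simple alpha blend: both factors are scalar alpha broadcast to RGB.
a = 0.25
src_t, dst_t, out = blend_terms((1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                                (a, a, a), (1 - a, 1 - a, 1 - a))
# src_t = (0.25, 0.0, 0.0), dst_t = (0.0, 0.0, 0.75), out = (0.25, 0.0, 0.75)
```

With colour-valued factors the two intermediate terms are no longer obvious at a glance, which is the case the columns were intended to help with.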


As you say this is already not always the case, and honestly I would place my bets that there is no currently running code on any GPUs today that actually uses the RGB dual source blend factors for a real purpose. Dual source alpha certainly makes sense, but using RGB is a very narrow niche and I'm not aware of any use cases. Adding significant extra scaffolding for what is possibly a theoretical use case is not justified in this case.

OpenGL's ARB_blend_func_extended lists two potential uses, both of which seem to be used in the wild:

Example Use Cases

There are several potential uses for this functionality. A first example
is in the implementation of sub-pixel accurate font rendering algorithms.
Given a known layout of pixel elements (red, green and blue components),
coverage may be calculated independently for each element and passed
to the blender in the second source color as a per-channel opacity. To use
this mode, use the following blend functions:

glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR);

As a second example, consider a partially reflective colored glass window.
It will attenuate light passing through it, and reflect some of the light
that strikes it. Using an appropriate combination of functions, this effect
may be simulated in a single pass using only fixed-function blending
hardware. In this case, the following blend functions may be used:

glBlendFunc(GL_SRC_ALPHA, GL_SRC1_COLOR);

The sub-pixel font rendering case is used by web browsers; Firefox (WebRender) uses it. There's a long document discussing how they do text rendering, with a brief mention of dual source blending. They did implement that and it is used in some cases, but I haven't tried to trace down exactly when it's enabled (I suspect they have some checks to make sure the monitor has an expected subpixel configuration and such). I also was unable to capture it in RenderDoc (they have instructions but I couldn't get it to work), so I'm not 100% sure it's used in practice. This is also a case where, although there are 3 values + alpha for the second output, the values are not really meaningful as a color, so visually the color column wouldn't help (but blend source/dest could still be useful).
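As a rough illustration of the sub-pixel case (hypothetical helper and values, assuming the quoted GL_SRC1_COLOR / GL_ONE_MINUS_SRC1_COLOR setup):

```python
# Sketch of sub-pixel text blending: the second shader output (src1) carries
# per-channel coverage, so each of R/G/B is blended with its own opacity
# rather than a single shared alpha. Values here are purely illustrative.

def subpixel_blend(text_colour, backdrop, coverage):
    """result = text * coverage + backdrop * (1 - coverage), per channel."""
    return tuple(t * c + b * (1.0 - c)
                 for t, b, c in zip(text_colour, backdrop, coverage))

# Black text on a white backdrop, with more coverage on the red sub-pixel
# than the blue one:
out = subpixel_blend((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 0.5, 0.1))
# out ≈ (0.1, 0.5, 0.9)
```

This shows why the second output isn't meaningful as a colour: it is three independent coverage values that happen to travel in RGB channels.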

The colored glass window case does seem to exist, too. I found a paper (PDF) discussing transparency in Cyberpunk 2077 (released 2020). It includes this paragraph:

Glass Blending. Unlike smoke and other participating media, glass
tints the objects behind it, in addition to "additive blending effects".
Therefore, traditional alpha blending does not work: we would
need to sacrifice either one or the other. To address that, we utilized
dual-source blending. This allowed us (with some limitations) to
have the best of both worlds with little to no overhead.

Unfortunately, it seems it was for a talk at a 2021 SIGGRAPH session which was not recorded. So there isn't any further context beyond that. But I assume this is a case where SRC1_COLOR is used, and more importantly represents a color in the game world, so a column for it would be informative (and blend source/dest would also be informative, as multiplying colors in your head isn't easy).
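A hedged sketch of how such a tinted-glass blend could work (hypothetical helper and values, assuming the GL_SRC_ALPHA / GL_SRC1_COLOR setup quoted from the extension spec, with the tint in the second source colour):

```python
# Sketch of glass blending: an additive reflection scaled by the source
# alpha, plus the backdrop attenuated per channel by the second source
# colour (the glass tint). All names and values are illustrative.

def glass_blend(reflection, reflection_alpha, tint, backdrop):
    """result = reflection * srcAlpha + backdrop * src1Colour, per channel."""
    return tuple(r * reflection_alpha + b * t
                 for r, b, t in zip(reflection, backdrop, tint))

# A faint white reflection over a green backdrop, seen through reddish glass
# that attenuates green and blue:
out = glass_blend((1.0, 1.0, 1.0), 0.2, (0.9, 0.3, 0.3), (0.0, 1.0, 0.0))
# out ≈ (0.2, 0.5, 0.2)
```

Here, unlike the font case, the second output really does represent a colour in the scene, which is why a visual column for it would be informative.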

The Witness (released 2016) also has a section with tinted glass that attenuates the colors behind it (the bunker/greenhouse), but it doesn't seem like they ever discussed how that was implemented, and it might just use SRC_COLOR instead since I'm not sure if they actually have an additive blending component to it. I haven't captured either of these games, since they aren't mine, of course.

And of course there's whatever's happening with emulation where the emulated hardware has some other blending feature/quirk that dual source blending is used to implement/approximate. Dolphin only uses alpha, but other emulators might need to do other things (though I haven't looked for evidence of this).

@baldurk
Owner

baldurk commented Aug 2, 2023

I will try to keep this in mind in the future. I personally like to try and get something functional that I can play around with, so that I can make sure my assumptions are correct and the design isn't a dead-end. But that approach does also mean that I end up with a lot of code that could be submitted for review all at once, which is a larger burden for you, and it's definitely important for you to be able to review everything.

If you want to be able to prove out some code end-to-end then that's totally fine but you then bear the risk of having to throw away more code if something you did at the start was in an undesired direction. Generally I don't recommend getting too far ahead of what you've landed - you can take advantage of being able to land supporting/interim code upstream to keep your feature branch from getting too large.

You can also always ask about an intended design - e.g. I could have let you know early not to add any new columns, not to worry about handling multiple output attachments, and not to do anything special with the colour1 parameter (only treat these as 'extra alpha channels' on the shader out effectively).

To get a better idea of what you want: assuming we did still go with additional columns, would 33f4b4b ("Use an enum for pixel history column ids instead of magic numbers") on its own be something you want as a single pull request, or would you prefer that bundled with other commits that actually add more columns? Adding more columns without getting rid of the magic numbers is impractical, but getting rid of the magic numbers without other context also seems wrong to me.

That would be fine to include with other commits, especially the other UI changes, as they are logically bundled together and any review feedback will likely affect them as a whole. You could probably land that on its own if you wanted; a refactor to improve readability is fine if you explain the intention (I might reject it only if you didn't explain why you wanted to change it). The limit of 1000 lines is pretty arbitrary and more a rule of thumb than anything else. 1000 lines could be very easy to review or could be incredibly dense, so I set the guide conservatively so that at least there was a specific guide and it wasn't just up to my whims.

I feel like you are missing part of the sentence here. Can you clarify what you meant by this?

Looks like this is a thought I didn't finish while drafting the response. I think what I was meaning was that code development doesn't always get things right first time, and if a feature is large enough the first PR you land doesn't have to be 'final code'. Any PR will be reviewed on its own merits and any issues immediately apparent will get resolved that way. If you're developing a feature and it lands over 3 or a dozen or a hundred PRs it isn't a problem if one PR changes some code that seemed reasonable in an earlier PR. That's just how code works and past a certain scale (1000 lines is the one I picked out of a hat) you shouldn't try to perfect everything before reviewing and landing it.

As soon as you have some kind of logical chunk of code completed you can PR that - even if you haven't written the rest of the code, and it doesn't do anything on its own or isn't even referenced. Until quite recently there was stub D3D8/D3D9 code which was committed years ago on its own and kept around. I appreciate that with a change like this it's close to small enough for one PR, where it might be a harder call to make, but you would have to go to extremes before I'd start rejecting a feature like this for being split into too many PRs if you explained what the end goal was, so I'd say err on the side of looking for smaller chunks if you're starting to reach that 1000-line diff litmus test.

@w-pearson
Contributor Author

OK. For now, I think I'll open a new PR containing only b57b488. That one is probably a good starting point as it's a stand-alone fix for an issue there's already a TODO comment for. (It does become easier to trigger when PerFragmentInfo increases in size, though.) I also already have a demo+automated test for that in c115cb6, though the automated test also currently checks blendSrc/blendDst (which would not be hard to remove/re-add later if needed). Do you want that as part of the same PR, or submitted later?

@baldurk
Owner

baldurk commented Aug 5, 2023

That sounds good to me, that commit is a useful bugfix regardless so it works in isolation - looking at it, I think that explains and fixes a bug @Zorro666 noticed a few days ago while testing 1.28 before release.

Whether to include the test is up to you - I never require tests for bugfixes but it's certainly welcome. As you say it would need to be modified a bit to work with the current API surface and is a bit much for just that fix, but makes sense if it's going to be more useful once other things land. An alternative would be to add a new 100s-of-overlapping-tris draw to VK_Pixel_History that fails the depth test so no other checks need to be changed, but I'm fine if you include the test as-is with minimal modifications to work.

@baldurk
Owner

baldurk commented Sep 29, 2023

I'm going to close this PR now as I think you are still planning to split it out into separate PRs as you have been doing and having a stale PR open is unnecessary.

@baldurk baldurk closed this Sep 29, 2023
@w-pearson
Contributor Author

OK, I'll open additional split out PRs in the future once I have them ready. (I haven't taken the time to work on this in a while though, beyond locally resolving merge conflicts.)

Successfully merging this pull request may close these issues.

Pixel history should show both shader outputs when dual-source blending is in use