Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layers: Allow using uncached buffer for Debug Printf #6128

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dorian-apanel-intel
Copy link

Adds option to force using AMD_DEVICE_COHERENT_MEMORY for debug printf buffer to print messages even if VK_ERROR_DEVICE_LOST is encountered.

Forcing extension and device feature if not enabled by application.

Added workaround for atomic operations in
uncached memory being in cache anyway.

Added workaround to failing MapMemory after DEVICE_LOST (occurs on AMD): When using uncached buffer, do not unmap buffer until messages are analyzed.

This PR attempts to implement #6101 Tested on AMD and Intel (extension not public yet) using this modified vkcube: dorian-apanel-intel/Vulkan-Tools@e022693

@dorian-apanel-intel dorian-apanel-intel requested a review from a team as a code owner July 12, 2023 12:43
@CLAassistant
Copy link

CLAassistant commented Jul 12, 2023

CLA assistant check
All committers have signed the CLA.

@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@dorian-apanel-intel dorian-apanel-intel changed the title layers: gpu-av Debug Printf using uncached buffer layers: Allow using uncached buffer for Debug Printf Jul 12, 2023
@dorian-apanel-intel dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 749483d to af6ca10 Compare July 12, 2023 12:55
@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@dorian-apanel-intel dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from af6ca10 to d148df2 Compare July 12, 2023 12:57
@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

1 similar comment
@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@dorian-apanel-intel dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 2a5f0a9 to ada331e Compare July 12, 2023 13:17
@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

layers/gpu_validation/debug_printf.cpp Outdated Show resolved Hide resolved
layers/gpu_validation/debug_printf.cpp Outdated Show resolved Hide resolved
@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 5177.

Copy link
Contributor

@TonyBarbour TonyBarbour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested? Is there a test that can go with this?

Copy link
Contributor

@juan-lunarg juan-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any testing being added. Is the plan to add them later?

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12676 running.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12676 passed.

@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@dorian-apanel-intel
Copy link
Author

@juan-lunarg @Tony-LunarG

I don't see any testing being added. Is the plan to add them later?
Has this been tested? Is there a test that can go with this?

I've only tested it manually so far.
Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 5894.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12692 running.

@TonyBarbour
Copy link
Contributor

@juan-lunarg @Tony-LunarG

I don't see any testing being added. Is the plan to add them later?
Has this been tested? Is there a test that can go with this?

I've only tested it manually so far. Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

I think so, as long as the test gets a DEVICE_LOST and then successfully parses a debug printf

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12692 passed.

@juan-lunarg
Copy link
Contributor

Would it be enough to copy one test from tests\negative\debug_printf.cpp and modify it to use uncached buffer and hang?

This sounds like it would cause problems for Jenkins CI (Although if we switched to ctest maybe not... since the test would be in it's own process).

Because currently CI invokes the test executable and tries running all tests.

@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

Adds option to force using AMD_DEVICE_COHERENT_MEMORY
for debug printf buffer to print messages
even if VK_ERROR_DEVICE_LOST is encountered.

* Option added to layer json to be visible in vkconfig,
* Forcing extension and device feature if not enabled by application,
* Added workaround for atomic operations in uncached memory
being in cache anyway,
* Added workaround to failing MapMemory after
DEVICE_LOST (occurs on AMD): When using uncached buffer,
do not unmap buffer until messages are analyzed.
@dorian-apanel-intel dorian-apanel-intel force-pushed the dapanel/uncached_debug_printf_buffer branch from 99bb797 to 090d262 Compare July 17, 2023 12:22
@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@dorian-apanel-intel
Copy link
Author

@Tony-LunarG @juan-lunarg @jeremyg-lunarg

  • I've added unit test, had to pass a value using push constants, because hardcoded infinite loop was not hanging on AMD.
  • I modified layer_options.cpp file to move option from test to layer. It is not perfect, but after looking at layer_configuration.md does not describe programmatic configuration of layers #5895, I think it will need rewrite soon anyway.
  • After updating AMD drivers, Atomics WA is no longer needed there, but leaving it for Intel for now.
  • NVidia does not support required extension yet.
  • Force push was required due to printf test file rename.

I plan to add also unit test for Page Fault, but preferably in another pull request.

@ci-tester-lunarg
Copy link
Collaborator

Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 7794.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12728 running.

1 similar comment
@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12728 running.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12728 failed.

@juan-lunarg
Copy link
Contributor

Galaxy S22 testing failed

[ RUN      ] NegativeDebugPrintf.UncachedBuffer
jni/../../tests/framework/error_monitor.cpp:200: Failure
Failed
Validation Error: [ UNASSIGNED-DEBUG-PRINTF ] Object 0: handle = 0x6c1d064220, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x92394c89 | Setup Error. Detail: (Unable to allocate device memory.  Device could become unstable.)
jni/../../tests/framework/error_monitor.cpp:241: Failure
Failed
Did not receive expected error 'Here are three float values 1.000000, 3.141500, 0.000000'
[  FAILED  ] NegativeDebugPrintf.UncachedBuffer (5836 ms)

Copy link
Contributor

@juan-lunarg juan-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the test 👍🏾

I'll defer to @Tony-LunarG and @jeremyg-lunarg for the actual review.

@TonyBarbour
Copy link
Contributor

Galaxy S22 testing failed

[ RUN      ] NegativeDebugPrintf.UncachedBuffer
jni/../../tests/framework/error_monitor.cpp:200: Failure
Failed
Validation Error: [ UNASSIGNED-DEBUG-PRINTF ] Object 0: handle = 0x6c1d064220, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x92394c89 | Setup Error. Detail: (Unable to allocate device memory.  Device could become unstable.)
jni/../../tests/framework/error_monitor.cpp:241: Failure
Failed
Did not receive expected error 'Here are three float values 1.000000, 3.141500, 0.000000'
[  FAILED  ] NegativeDebugPrintf.UncachedBuffer (5836 ms)

I don't think there's much the test can do about that. I'll just add the new test to the S22 blacklist.

Copy link
Contributor

@TonyBarbour TonyBarbour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message processing loop won't exit correctly


uint32_t index = spvtools::kDebugOutputDataOffset;
while (debug_output_buffer[index]) {
while ((index < output_buffer_size) && debug_output_buffer[index]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this loop will work if the output buffer has one message but isn't full. To see this, use uncached buffer memory in the test NegativeDebugPrintf.BasicUsage and run it. This loop doesn't stop when it's supposed to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check in which vkQueueSubmit/AnalyzeAndGenerateMessages you observe it? BasicUsage does it 15 times.

Here is how I understand it:

In my first one, debug_output_buffer == 0x0000028b8e520000
Memory of that buffer is:

0x0000028B8E520000  00 00 00 00  MBZ
0x0000028B8E520004  0a 00 00 00  expect (size of all messages, written by atomic in shader)
0x0000028B8E520008  0a 00 00 00  DPFOutputRecord.size (size in dwords of this message)
0x0000028B8E52000C  00 00 00 00  DPFOutputRecord.shader_id
0x0000028B8E520010  70 00 00 00  DPFOutputRecord.instruction_position
0x0000028B8E520014  00 00 00 00  DPFOutputRecord.stage
0x0000028B8E520018  00 00 00 00  DPFOutputRecord.stage_word_1
0x0000028B8E52001C  00 00 00 00  DPFOutputRecord.stage_word_2
0x0000028B8E520020  00 00 00 00  DPFOutputRecord.stage_word_3
0x0000028B8E520024  29 00 00 00  DPFOutputRecord.format_string_id ("Here are two float values %f, %f")
0x0000028B8E520028  00 00 80 3f  DPFOutputRecord.values (float 1.0)
0x0000028B8E52002C  56 0e 49 40  DPFOutputRecord.values (float 3.14150000)
0x0000028B8E520030  00 00 00 00  DPFOutputRecord.size  (no more messages)
0x0000028B8E520034  00 00 00 00  DPFOutputRecord.
0x0000028B8E520038  00 00 00 00  DPFOutputRecord.

Index starts at 0x2. At the end of first loop iteration:
index += debug_record->size; // 0x2 + 0xa = 0xc
At second loop beginning
debug_output_buffer[index] == 0x0000028b8e520030 {0x0}
So the loop ends.

(index < output_buffer_size) means "while we are still in the buffer"
&& debug_output_buffer[index]) means "while next message is not empty"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have access to the system I was using to test this today - I'll get back to it tomorrow. The system I do have is getting VK_TIMEOUT instead of VK_ERROR_DEVICE_LOST or VK_SUCCESS at the vkQueueWaitIdle, and I had to add that to the test's expected results. You should add that to your test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've added VK_TIMEOUT to asserts in test.
I assume you used AMD for that test. it happened to me once too.
VK_TIMEOUT should be not returned from vkQueueSubmit, vkDeviceWaitIdle or vkQueueWaitIdle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem. Because debug_output_buffer[1] comes back as 0, the memset at the end of message processing doesn't clear the buffer:

memset(debug_output_buffer, 0, 4 * (debug_output_buffer[spvtools::kDebugOutputSizeOffset] + spvtools::kDebugOutputDataOffset));

And the memset clears 4 * (0 + 2) bytes which doesn't cover all of the records. Then on the next printf, the loop depends on debug_output_buffer[index] being zero when the loop is supposed to stop, but previous printfs may have left values that didn't get memset to 0. I suppose one answer would be to memset the whole buffer.

But I'm seeing another problem. If I modify your test to do two debug printfs:

          if (gl_VertexIndex == 0) {
                debugPrintfEXT("Here are three float values %f, %f, %f", 1.0, myfloat, gl_Position.x);
                debugPrintfEXT("Here's another debug printf");
                float x = constants.x[0];
                while(x > -1.f) { // infinite loop
                    x += 0.000001f;
                }
                debugPrintfEXT("Here is a value that should not be printed %f", x);
            }

I only get the second printf back, as if the first one was overwritten in the shader. Could that be a result of the atomic operation problem and debug_output_buffer[1] not getting written?
(This is all on an AMD RX6600 with driver 23.7.1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atomic add is performed by the instrumentation added to the shader via spirv-tools code, which I'm not very familiar with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so after a bit of digging, this extension was enabled before we had the MALL cache in hardware, and nobody ever went back and changed the interface to bypass MALL, so there IS caching enabled on this memory type for newer hardware. Whoops. I've filed an internal bug to fix this and to add a more comprehensive test to make sure this actually works on the off chance we make another big change to the caching hierarchy...

You can go look at the public driver source if for some reason you want to see where this is happening 🙃

I'll let you know when it's fixed and subsequently released.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tobski I assume the fix is released now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fix has been implemented but I'm not sure on release status, I'll take a look.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 9060.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12764 running.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12764 passed.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 10076.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12781 running.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12781 failed.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 10420.

@lunarpapillo
Copy link
Contributor

Last LunarG CI failure was due to a system issue. I've restarted.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12787 running.

@ci-tester-lunarg
Copy link
Collaborator

CI Vulkan-ValidationLayers build # 12787 passed.

@@ -672,6 +821,9 @@ void DebugPrintf::AllocateDebugPrintfResources(const VkCommandBuffer cmd_buffer,
buffer_info.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
VmaAllocationCreateInfo alloc_info = {};
alloc_info.requiredFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
if (use_uncached_buffer) {
alloc_info.requiredFlags |= VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want to require UNCACHED here, since theoretically in future you might get COHERENT and CACHED memory. Uncached is meant to be informative, not something you generally rely upon unless you're doing some very specific performance tuning.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand. Here I want to get device uncached to read it after device is lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants