[Vulkan] Renderdoc layer changing image memory requirements affects programs relying on external memory #3290

Closed
Mon-Ouie opened this issue Apr 2, 2024 · 8 comments
Labels
Bug A crash, misbehaviour, or other problem Need More Info More information is needed from a user to work on this issue

Comments

@Mon-Ouie

Mon-Ouie commented Apr 2, 2024

Description

Renderdoc can in some cases change the image memory requirements from what the driver reports. If an application relies on external memory to share data between processes (e.g. using VK_KHR_external_memory_fd), the mismatch can cause validation errors or crashes in the application running Renderdoc and getting incorrect requirements, e.g.:

VUID-VkMemoryDedicatedAllocateInfo-image-02964(ERROR / SPEC): msgNum: 1700101245 - Validation Error: [ VUID-VkMemoryDedicatedAllocateInfo-image-02964 ] Object 0: handle = 0xb6bee80000000073, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x65557c7d | vkAllocateMemory(): pAllocateInfo->allocationSize (37945344) needs to be equal to pAllocateInfo->pNext<VkMemoryDedicatedAllocateInfo>.image (VkImage 0xb6bee80000000073[]) VkMemoryRequirements::size (37748736). The Vulkan spec states: If image is not VK_NULL_HANDLE and the memory is not an imported Android Hardware Buffer or an imported QNX Screen buffer , VkMemoryAllocateInfo::allocationSize must equal the VkMemoryRequirements::size of the image (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkMemoryDedicatedAllocateInfo-image-02964)

I'm not sure if there is a way to disable the workarounds that change image size requirements, or if they're even supposed to run on my system. I found this check in Renderdoc, but it seems this should only affect AMD's own drivers, not Mesa? I'm not sure if there are other places in the code that alter image memory requirements, or if the driver is being misidentified. I did not previously encounter this behavior when running the same code on an NVIDIA GPU using NVIDIA's proprietary drivers.

Steps to reproduce

I encounter this issue on an AMD GPU using the Mesa RADV driver (I checked that vulkaninfo indeed reports driverID = DRIVER_ID_MESA_RADV) when using OpenXR through Monado:

  1. Start the monado service (I believe this would require some kind of hardware supported by Monado for the next steps to succeed).
  2. Run a program that tries to create an OpenXR swapchain with ENABLE_VULKAN_RENDERDOC_CAPTURE=1
  3. Program will fail with an error message such as ERROR [vk_alloc_and_bind_image_memory] (vk_create_image_from_native) vkGetImageMemoryRequirements: Requested more memory (37945344) then given (37748736).

Sample OpenXR code that I used to reproduce this:

#include <vulkan/vulkan.h>

#define XR_USE_PLATFORM_XLIB
#define XR_USE_GRAPHICS_API_VULKAN
#include <openxr/openxr.h>
#include <openxr/openxr_platform.h>

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

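int xr_check(XrInstance instance, XrResult result, const char *prefix) {
  if (XR_SUCCEEDED(result)) return 1;

  // xrResultToString needs a valid instance; fall back to the numeric code
  // when none is available (e.g. xrCreateInstance itself failed).
  char message[XR_MAX_RESULT_STRING_SIZE];
  if (instance == XR_NULL_HANDLE ||
      XR_FAILED(xrResultToString(instance, result, message)))
    snprintf(message, sizeof(message), "XrResult %d", (int)result);

  fprintf(stderr, "%s %s\n", prefix, message);
  return 0;
}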

#define XSTRINGIFY(X) #X
#define STRINGIFY(X) XSTRINGIFY(X)

#define XR_CHECK(instance, X) xr_check((instance), (X), "[" __FILE__ ":" STRINGIFY(__LINE__) "] " #X ": ")

#define countof(array) (sizeof(array)/sizeof(*(array)))

int main(int argc, char *argv[]) {
  const char* enabled_exts[] = {
    XR_KHR_VULKAN_ENABLE2_EXTENSION_NAME
  };

  const char *enabled_layers[] = {
    "XR_APILAYER_LUNARG_core_validation"
  };

  XrInstance instance;
  XrInstanceCreateInfo info = {};

  info.type = XR_TYPE_INSTANCE_CREATE_INFO;
  info.next = NULL;

  strlcpy(info.applicationInfo.applicationName, "xr-test",
          sizeof(info.applicationInfo.applicationName));
  info.applicationInfo.applicationVersion = 1;

  strlcpy(info.applicationInfo.engineName, "xr-test",
          sizeof(info.applicationInfo.engineName));
  info.applicationInfo.engineVersion = 1;

  info.applicationInfo.apiVersion = XR_CURRENT_API_VERSION;

  info.enabledExtensionCount = countof(enabled_exts);
  info.enabledExtensionNames = enabled_exts;
  info.enabledApiLayerCount = countof(enabled_layers);
  info.enabledApiLayerNames = enabled_layers;

  info.createFlags = 0;

  if (!XR_CHECK(NULL, xrCreateInstance(&info, &instance))) return 1;

  PFN_xrCreateVulkanInstanceKHR xrCreateVulkanInstanceKHR;
  XR_CHECK(
    instance,
    xrGetInstanceProcAddr(
      instance, "xrCreateVulkanInstanceKHR",
      (PFN_xrVoidFunction*)&xrCreateVulkanInstanceKHR));

  PFN_xrCreateVulkanDeviceKHR xrCreateVulkanDeviceKHR;
  XR_CHECK(
    instance,
    xrGetInstanceProcAddr(
      instance, "xrCreateVulkanDeviceKHR",
      (PFN_xrVoidFunction*)&xrCreateVulkanDeviceKHR));

  PFN_xrGetVulkanGraphicsRequirements2KHR xrGetVulkanGraphicsRequirements2KHR;
  XR_CHECK(
    instance,
    xrGetInstanceProcAddr(
      instance, "xrGetVulkanGraphicsRequirements2KHR",
      (PFN_xrVoidFunction*)&xrGetVulkanGraphicsRequirements2KHR));

  PFN_xrGetVulkanGraphicsDevice2KHR xrGetVulkanGraphicsDevice2KHR;
  XR_CHECK(
    instance,
    xrGetInstanceProcAddr(
      instance, "xrGetVulkanGraphicsDevice2KHR",
      (PFN_xrVoidFunction*)&xrGetVulkanGraphicsDevice2KHR));

  XrSystemId system_id;
  XrSystemGetInfo system_get_info = {0};
  system_get_info.type = XR_TYPE_SYSTEM_GET_INFO;
  system_get_info.formFactor = XR_FORM_FACTOR_HEAD_MOUNTED_DISPLAY;
  if (!XR_CHECK(instance, xrGetSystem(instance, &system_get_info, &system_id)))
    return 1;

  const char *layers[] = {"VK_LAYER_KHRONOS_validation"};


  VkApplicationInfo app_info = {};
  app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
  app_info.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
  app_info.engineVersion = VK_MAKE_VERSION(1, 0, 0);
  app_info.pApplicationName = "pipeline-test";
  app_info.pEngineName = "pipeline-test";
  app_info.apiVersion = VK_MAKE_API_VERSION(0, 1, 3, 0);

  VkInstanceCreateInfo vk_instance_info = {};
  vk_instance_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
  vk_instance_info.pApplicationInfo = &app_info;
  vk_instance_info.enabledLayerCount = countof(layers);
  vk_instance_info.ppEnabledLayerNames = layers;

  XrVulkanInstanceCreateInfoKHR xr_instance_info = {};
  xr_instance_info.type = XR_TYPE_VULKAN_INSTANCE_CREATE_INFO_KHR;
  xr_instance_info.vulkanCreateInfo = &vk_instance_info;
  xr_instance_info.systemId = system_id;
  xr_instance_info.pfnGetInstanceProcAddr = vkGetInstanceProcAddr;

  VkInstance vk_instance;
  VkResult vk_result;
  XR_CHECK(instance, xrCreateVulkanInstanceKHR(
             instance, &xr_instance_info, &vk_instance, &vk_result));

  XrVulkanGraphicsDeviceGetInfoKHR physical_device_info = {};
  physical_device_info.type = XR_TYPE_VULKAN_GRAPHICS_DEVICE_GET_INFO_KHR;
  physical_device_info.vulkanInstance = vk_instance;
  physical_device_info.systemId = system_id;

  VkPhysicalDevice physical_device;
  XR_CHECK(instance, xrGetVulkanGraphicsDevice2KHR(instance, &physical_device_info, &physical_device));

  VkPhysicalDeviceFeatures2 features = {VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2};

  VkDeviceQueueCreateInfo queue_info = {VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO};
  float priorities = 1.0;
  queue_info.queueCount = 1;
  queue_info.queueFamilyIndex = 0;
  queue_info.pQueuePriorities = &priorities;

  VkDeviceCreateInfo vk_device_info = {};
  vk_device_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
  vk_device_info.pNext = (void*)&features;
  vk_device_info.queueCreateInfoCount = 1;
  vk_device_info.pQueueCreateInfos = &queue_info;

  XrVulkanDeviceCreateInfoKHR xr_device_info = {};
  xr_device_info.type = XR_TYPE_VULKAN_DEVICE_CREATE_INFO_KHR;
  xr_device_info.vulkanCreateInfo = &vk_device_info;
  xr_device_info.vulkanPhysicalDevice = physical_device;
  xr_device_info.pfnGetInstanceProcAddr = vkGetInstanceProcAddr;
  xr_device_info.systemId = system_id;

  VkDevice vk_device;
  XR_CHECK(instance, xrCreateVulkanDeviceKHR(instance, &xr_device_info, &vk_device, &vk_result));

  XrGraphicsBindingVulkan2KHR graphics_binding = {};
  graphics_binding.type = XR_TYPE_GRAPHICS_BINDING_VULKAN2_KHR;
  graphics_binding.queueIndex = 0;
  graphics_binding.queueFamilyIndex = 0;
  graphics_binding.instance = vk_instance;
  graphics_binding.physicalDevice = physical_device;
  graphics_binding.device = vk_device;

  XrSessionCreateInfo session_info = {};
  session_info.type = XR_TYPE_SESSION_CREATE_INFO;
  session_info.next = &graphics_binding;
  session_info.systemId = system_id;

  XrGraphicsRequirementsVulkan2KHR requirements = {};
  requirements.type = XR_TYPE_GRAPHICS_REQUIREMENTS_VULKAN2_KHR;
  XR_CHECK(instance, xrGetVulkanGraphicsRequirements2KHR(instance, system_id, &requirements));

  XrSession session;
  XR_CHECK(instance, xrCreateSession(instance, &session_info, &session));

  XrViewConfigurationView views[2] = {};
  views[0].type = XR_TYPE_VIEW_CONFIGURATION_VIEW;
  views[1].type = XR_TYPE_VIEW_CONFIGURATION_VIEW;

  uint32_t view_count;
  XR_CHECK(instance,
           xrEnumerateViewConfigurationViews(
               instance, system_id, XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO,
               2, &view_count, views));

  uint32_t width  = views[0].recommendedImageRectWidth;
  uint32_t height = views[0].recommendedImageRectHeight;

  XrSwapchainCreateInfo swapchain_info = {};
  swapchain_info.type = XR_TYPE_SWAPCHAIN_CREATE_INFO;
  swapchain_info.arraySize = 2;
  swapchain_info.faceCount = 1;
  swapchain_info.mipCount = 1;
  swapchain_info.sampleCount = 1;
  swapchain_info.width = width;
  swapchain_info.height = height;
  swapchain_info.usageFlags = XR_SWAPCHAIN_USAGE_COLOR_ATTACHMENT_BIT |
    XR_SWAPCHAIN_USAGE_TRANSFER_SRC_BIT;
  swapchain_info.format = VK_FORMAT_R8G8B8A8_SRGB;

  XrSwapchain swapchain;
  XR_CHECK(instance, xrCreateSwapchain(session, &swapchain_info, &swapchain));

  return 0;
}

Compiled and executed with: gcc xr-swapchain-issue.c -lopenxr_loader -lvulkan -o xr-swapchain-issue && ENABLE_VULKAN_RENDERDOC_CAPTURE=1 ./xr-swapchain-issue

(The above validation error happens if you manually bypass the test for the image size in monado's code, forcing it to create an image with a size that mismatches what it expects)

Environment

  • RenderDoc version: 1.31
  • Operating System: Arch Linux, Kernel 6.8.2-arch2-1
  • Graphics API: Vulkan (client using API version 1.3)
  • GPU: AMD RX 7900 GRE
  • Driver info: Mesa 24.0.3-arch1.2
  • Monado version used to reproduce the bug: v21.0.0-4378-gb1118a62 (commit hash: b1118a62ff1f1259f294223b697ca59483e0ec5f)

For reference, discussion on Monado's bug tracker.

@baldurk
Owner

baldurk commented Apr 2, 2024

You should never enable ENABLE_VULKAN_RENDERDOC_CAPTURE manually. This is an internal implementation detail that should not be set by users - it is not documented anywhere and is not a supported way of using RenderDoc.

Does this problem reproduce if you launch your program from the RenderDoc UI as intended?

Also RenderDoc does not have any support for openxr so if this program or library requires extra API support for openxr that may be the cause of the problem. I'm not familiar with openxr at all to be able to say whether or not it requires anything specific.

With that said I'm not really clear on exactly what is going wrong. You listed a validation error, but it seems to be related to not allocating the right amount of memory for a dedicated allocation. Without context I'm not clear - is your application providing the wrong size? Or is that an allocation coming from inside RenderDoc? You also mention an error message that comes from your library monado but I'm not sure how to interpret the message as there is only one size returned from vkGetImageMemoryRequirements.

If your application is not providing the right size of allocation then you will need to fix that - you should make sure you query for the image memory requirements and allocate the appropriate memory for it.

I don't believe I can follow your steps to reproduce if it requires openxr hardware, though I am not familiar at all with openxr or this monado library so if it is usable without any special hardware I can try to reproduce it.

@baldurk baldurk added Bug A crash, misbehaviour, or other problem Need More Info More information is needed from a user to work on this issue labels Apr 2, 2024
@ChristophHaag

ChristophHaag commented Apr 2, 2024

Monado doesn't require OpenXR hardware, it defaults to a simple simulator with no hardware.

To go to a higher abstraction level: when an OpenXR application connects to monado-service (they are different processes), monado-service allocates VkImages, exports them with VK_KHR_external_memory_fd, and sends the fd over to the OpenXR application process, where the fd is imported and the OpenXR application renders to it. Every multi-process application that exports and imports with VK_KHR_external_memory_fd is probably affected.
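As a rough illustration of the "sends the fd over" step, here is a minimal sketch of passing a file descriptor between processes over a Unix domain socket with SCM_RIGHTS. The thread does not say which mechanism Monado uses, so this is an assumption about one common Linux approach; `send_fd` and `recv_fd` are hypothetical helper names, and no Vulkan calls are involved.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_control {
  char buf[CMSG_SPACE(sizeof(int))];
  struct cmsghdr align;
};

/* Send `fd` over the connected Unix socket `sock`. Returns 0 on success. */
static int send_fd(int sock, int fd) {
  assert(fd >= 0);

  char byte = 0; /* one byte of ordinary data must accompany the fd */
  struct iovec iov = {.iov_base = &byte, .iov_len = 1};
  union fd_control control;
  memset(&control, 0, sizeof(control));

  struct msghdr msg = {0};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control.buf;
  msg.msg_controllen = sizeof(control.buf);

  /* Attach the descriptor as SCM_RIGHTS ancillary data. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

  return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new fd, or -1 on failure. */
static int recv_fd(int sock) {
  char byte;
  struct iovec iov = {.iov_base = &byte, .iov_len = 1};
  union fd_control control;
  memset(&control, 0, sizeof(control));

  struct msghdr msg = {0};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control.buf;
  msg.msg_controllen = sizeof(control.buf);

  if (recvmsg(sock, &msg, 0) != 1) return -1;

  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS ||
      cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
    return -1;

  int fd;
  memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

The receiver gets its own descriptor number referring to the same underlying object. Under that assumption, any size information the importer needs must be sent separately or queried again on the importing side.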

@Mon-Ouie
Author

Mon-Ouie commented Apr 2, 2024

> Does this problem reproduce if you launch your program from the RenderDoc UI as intended?

Same output messages in the terminal where qrenderdoc was executed when running the program through there.

> Also RenderDoc does not have any support for openxr so if this program or library requires extra API support for openxr that may be the cause of the problem.

I don't think there's anything special about OpenXR as far as the Vulkan usage is concerned, it's just a regular application using IPC mechanisms from published Vulkan extensions.

> You also mention an error message that comes from your library monado but I'm not sure how to interpret the message as there is only one size returned from vkGetImageMemoryRequirements

There are two programs running Vulkan and sharing images using a file descriptor. The allocation occurs when the client application imports an image created by the server. The first size is the one returned by vkGetImageMemoryRequirements in the application importing the image; the second is the size from when the image was created on the server side (also from vkGetImageMemoryRequirements, but called on the server). In this case I'm only capturing the client process, hence the mismatch.
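To make the two sizes concrete, here is a minimal sketch in plain C of the kind of comparison described above. It does not call Vulkan and is not Monado's actual code: `import_size_check` and its parameter names are hypothetical, and the numbers used in the note below are the ones from the error message in this issue.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an import-side check.
 * `required`: size the importing process gets from its own
 *             vkGetImageMemoryRequirements query.
 * `given`:    size the exporting process allocated for the image.
 * Returns 1 if the import would be accepted, 0 otherwise. */
static int import_size_check(uint64_t required, uint64_t given) {
  assert(given != 0); /* an exported allocation always has a nonzero size */

  if (required > given) {
    fprintf(stderr, "Requested more memory (%llu) than given (%llu)\n",
            (unsigned long long)required, (unsigned long long)given);
    return 0;
  }
  return 1;
}
```

With the values from this report, `import_size_check(37945344, 37748736)` fails: the size queried in the captured client is 196608 bytes larger than the size allocated on the server.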

As far as I can tell, the error in the validation for dedicated allocation happens because Renderdoc is modifying the value returned by vkGetImageMemoryRequirements, so even though the allocation size is the one returned by vkGetImageMemoryRequirements, as seen by the application (see the code that allocates the image), it is not the size that the validation layer expected.
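The hypothesis above can be modelled without Vulkan. The sketch below assumes a call chain of application, then an upper layer, then the validation layer, then the driver; all function names are hypothetical, and the size difference is simply the one from the error message, not a statement about what RenderDoc does internally.

```c
#include <assert.h>
#include <stdint.h>

/* What the driver reports for the image (value taken from the error message). */
static uint64_t driver_size(void) { return 37748736ULL; }

/* Hypothetical upper layer that returns a larger size to the application
 * than the one it received from below. */
static uint64_t upper_layer_size(void) { return driver_size() + 196608ULL; }

/* In this model validation sits below the upper layer, so it compares a
 * dedicated allocation against the driver's size, not the size the
 * application saw. Returns 1 if the allocation would be accepted. */
static int validation_accepts_dedicated(uint64_t allocation_size) {
  assert(allocation_size != 0); /* Vulkan requires a nonzero allocationSize */
  return allocation_size == driver_size();
}
```

In this model the application allocates exactly what it was told (`upper_layer_size()`), yet `validation_accepts_dedicated` rejects it, which matches the shape of the VUID-VkMemoryDedicatedAllocateInfo-image-02964 message quoted in the description.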

> I don't believe I can follow your steps to reproduce if it requires openxr hardware, though I am not familiar at all with openxr or this monado library so if it is usable without any special hardware I can try to reproduce it.

Sorry about that, I would have shared a pure Vulkan program using those features if I had one at hand 😅

Is it possible to easily check the VkDriverInfo that Renderdoc collects? Skimming through the code, it doesn't look like Renderdoc is intended to change the return value of vkGetImageMemoryRequirements except on "old" official AMD drivers.

@baldurk
Owner

baldurk commented Apr 2, 2024

OK I can try to reproduce sometime soon then. Without any other knowledge I was going by what the original reporter said that I would need some hardware that monado supports.

I understand now what you're doing at a high level but I'm still not exactly clear what the problem is. Is the only issue that the memory requirements being reported are different when running under RenderDoc than not? That is not unusual but it should not cause any problems for a valid application - you can't expect a specific size from the vulkan API. There are guarantees about alignment etc as well as invariance, so you could require a minimum but not an exact value or a maximum, whether created normally or imported from one of the external memory extensions.

If your program is coded around expecting a particular memory size then that sounds like a problem. The vulkan spec does not guarantee that and applications are expected to query the driver for how much memory is required before binding it.

@ChristophHaag

It's entirely possible that monado's checks are stricter than necessary.

Tracing the code a bit, there is one code path where this happens: importing an image that was created elsewhere (i.e. in monado-service). https://gitlab.freedesktop.org/monado/monado/-/blob/8a4963f719ad7f1c8622e9361df52da483c68317/src/xrt/auxiliary/vk/vk_helpers.c#L818-822

Presumably max_size is the size of the originally allocated memory (in monado-service, not running under renderdoc), and the memory requirements that are checked here are queried in the openxr client side running under renderdoc. Renderdoc pads the memory requirements and the check fails.

I'm not sure what the check was originally intended to do but I don't see max_size actually used, so is this a pointless check we should remove?

@baldurk
Owner

baldurk commented Apr 14, 2024

RenderDoc doesn't pad memory requirements, but yes an application can't make assumptions about the size required for any resource beyond the limited guarantees the spec provides. That's why I was asking above about what exactly the problem is since it's not clearly explained what (if anything) is going wrong in terms of legal vulkan API behaviour vs. internal error messages.

@baldurk
Owner

baldurk commented Apr 15, 2024

I was able to set up an environment and build the dependencies to reproduce this. I ran into that internal check mentioned above, which looks to me to be invalid and I removed that.

After that I didn't encounter any problems or crashes. I did see the validation error and found the cause of that, though the application should not be able to see that error so it shouldn't cause any problems either. I've put in a workaround to prevent it from firing but generally speaking I don't recommend using validation with RenderDoc as it's subject to potential false positives and false negatives. You should run validation with your application on its own separately from RenderDoc to be sure you get accurate results.

So I've still not found any actual bug short of the validation error which as above should not be application-visible to cause any more problems than a false positive if you capture API validation messages. The validation error is technically correct but was causing no practical harm I believe, I've opened an issue within khronos to see if the requirement can be loosened and then the workaround can be removed.

I'll leave this open for now but please clearly explain what the actual bug is (if there is any bug aside from the extra validation error) and how to reproduce it.

@baldurk
Owner

baldurk commented May 15, 2024

Closing this due to lack of activity from the reporter and no further information to investigate, with no evidence of a bug at all.

If you are the reporter and this bug is still a problem for you, or you are someone finding this issue and you believe this bug is still a problem and you have more information to share, please do not comment here and instead please open a new issue. You can reference this issue if you wish, but opening a new issue prevents confusion of accidentally linking two unrelated bugs and means that each issue can be handled in a clean process.

@baldurk baldurk closed this as completed May 15, 2024