Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression found by drm: remove VA_DRM_IsRenderNodeFd() helper #692

Open
XinfengZhang opened this issue Mar 1, 2023 · 9 comments
Open

Comments

@XinfengZhang
Copy link
Contributor

XinfengZhang commented Mar 1, 2023

the case is chrome out of process decode case,
after this patch 4d4c5f06519fe4c40d6fe1b2ccff812ff3b3b402
it will break the initialization process, suppose it is because the check in drmGetNodeTypeFromFd is strict than
if (fstat(fd, &st) == 0)
return S_ISCHR(st.st_mode) && (st.st_rdev & 0x80);

ps:
int drmGetNodeTypeFromFd(int fd)
{
struct stat sbuf;
int maj, min, type;

if (fstat(fd, &sbuf))
    return -1;

maj = major(sbuf.st_rdev);
min = minor(sbuf.st_rdev);

if (maj != DRM_MAJOR || !S_ISCHR(sbuf.st_mode)) {
    errno = EINVAL;
    return -1;
}

type = drmGetMinorType(min);
if (type == -1)
    errno = ENODEV;
return type;

}
@evelikov-work

@XinfengZhang
Copy link
Contributor Author

suppose I will revert this patch to release 2.17.1 if no other concern.
also , I will analyze the issue further , because I could not reproduce the issue in my local side, may need some time to find the rootcause.

@evelikov
Copy link
Contributor

evelikov commented Mar 1, 2023

The function has been used by mesa for years, so I don't quite see how it causes a problem.

I'm not claiming it's perfect, but I don't see how. Can you provide some helpful information for me to debug?

@evelikov
Copy link
Contributor

evelikov commented Mar 1, 2023

Also note that maj != DRM_MAJOR has been replaced by drmNodeIsDRM() with version 2.4.96, back in 2018 - some 5 years ago. Which is available in Debian "oldstable" (buster) and Ubuntu 18.04 (bionic).

Which distribution is being affected?

@kumarsac
Copy link

kumarsac commented Mar 2, 2023

Here are some debug logs i put in libva and got this after revert of the patch.
by running in chrome OOPV(out of process video) tests.

Debug code. (I reverted the patch and added manually drmGetNodeTypeFromFd call to check the difference in return of both in this test scenario)

+    int node_type;
 
     if (fd < 0 || (is_render_nodes = VA_DRM_IsRenderNodeFd(fd)) < 0)
         return NULL;

+    printf("libva: vaGetDisplayDRM : VA_DRM_IsRenderNodeFd(%d) : %d\n",fd, is_render_nodes);
+    if (fd < 0 || (node_type = drmGetNodeTypeFromFd(fd)) < 0) {
+    }
+    printf("libva: vaGetDisplayDRM : drmGetNodeTypeFromFd(%d) : %d\n",fd, node_type);

output logs:
/var/log/ui/ui.20230302-044921:90:libva: vaGetDisplayDRM : VA_DRM_IsRenderNodeFd(14) : 1
/var/log/ui/ui.20230302-044921:92:libva: vaGetDisplayDRM : drmGetNodeTypeFromFd(14) : -1

both the function returns differently with same fd.
VA_DRM_IsRenderNodeFd return success as 1.
but drmGetNodeTypeFromFd return -1 for same fd.

@evelikov
Copy link
Contributor

evelikov commented Mar 2, 2023

@kumarsac are you the original reporter, or you're another person also seeing the issue? Can you help out with a few questions:

  • Distribution name and version - eg Ubuntu 18.04, or custom distribution based on Debian Bookworm
  • libdrm version and origin - eg provided by distro, or other (provide links to build recipe and source code)
  • i915 driver version and origin - eg. provided by distro with kernel v.1.2.3, or other (provide links to build recipe and source code)

In addition to that can you provide the output of:

  • stat /dev/dri/*
  • ls /sys/dev/char/*/device/drm

Thanks o/

@evelikov
Copy link
Contributor

evelikov commented Mar 2, 2023

Bear in mind the original VA_DRM_IsRenderNodeFd() is badly broken, since drmGetDeviceNameFromFd returns either NULL or the card node-name... while we assume it might return the render one.

@XinfengZhang
Copy link
Contributor Author

XinfengZhang commented Mar 3, 2023

hi @evelikov , it is a chromeOS usage, libdrm version is new one, suppose @kumarsac could provide the details.
but we debugged it yesterday, it is a permission problem
OOPV could call fstat with same fd , and return success
but could not call stat, the errno is EACCES
chrome stack is using sandbox , main gpu process should have all permission , we also print pid uid euid. an found they are same with main process initialization , but still failed. so, it should releated with permission setting in chromeOS.

yes, agree that drmGetDeviceNameFromFd will just return /dev/dri/cardx not render node. it is useless
maybe we just need:
if (fstat(fd, &st) == 0)
return S_ISCHR(st.st_mode) && ((st.st_rdev & 0xff) >> 6) == 2);

@kumarsac
Copy link

kumarsac commented Mar 3, 2023

libdrm chrome using is 2.4.110.
As @XinfengZhang mentioned its permission issue. (EPERM (-1) when we call stat(path, @SBUF )
in libva case its fstat with fd, (No issue with this).

code:

drmNodeIsDRM(int maj, int min) this api checks the path,
    char path[64];
    struct stat sbuf;

    snprintf(path, sizeof(path), "/sys/dev/char/%d:%d/device/drm",
             maj, min);
    return stat(path, &sbuf) == 0;

@evelikov
Copy link
Contributor

evelikov commented Mar 3, 2023

When you say chromeOS - which version is that? IIRC there are 3-4 variants available.

Mind you I'm not sandbox expert, so take the following with a pinch of salt.

Chromium applies different sandbox policy for GPU/Decode/Encode

If you follow all the use_XXX_specific_policies references across the codebase - you can deduce where and how, plus there is also per-vendor (AMD/Intel/etc) variance.

From a quick look these should be enabled for both AMD and Intel

Overall the various details are split quite drastically across the tree, so you might need more than just the above.

As a whole, I think most of *if not all the AMD/Intel/Virtio/ARM should be unified (modulo the driver-names), since they all work on the same standard interfaces and come from the same stack.

At the moment Intel is playing catch-up and as you can see it's costing your team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants