Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VA_DRM_GetNumCandidates changed behaviour, therefore aborts X server in VM with QXL driver. #700

Open
bernhardu opened this issue Mar 19, 2023 · 6 comments

Comments

@bernhardu
Copy link

In current Debian Testing running inside a qemu VM with the QML driver the X server can be crashed by just running gst-play-1.0 no-such-file.avi.

As far as I see VA_DRM_GetNumCandidates does compare the driver name "qxl" against a known list of drivers g_driver_name_map.
In my debugging this list contains "qxl" neither in libva version 2.17.0-1 (Debian Bookworm/testing)
nor in 2.10.0-1 (Debian Bullseye/stable).

In version 2.10.0 VA_DRM_GetNumCandidates returns VA_STATUS_ERROR_UNKNOWN (because count==0).
But in version 2.17.0 it returns unconditionally VA_STATUS_SUCCESS.

It look like this got introduced with commit 317c0fb.

Therefore va_DisplayContextGetNumCandidates does not return early but does enter va_drm_authenticate/VA_DRI2Authenticate which finally causes the X server to hit an assertion and ends therefore.
Because I guess the X server should be resistant even for such queries
I opened https://gitlab.freedesktop.org/xorg/xserver/-/issues/1534.

But libva probably want to either revert back to original behaviour of VA_DRM_GetNumCandidates,
or wants to change users of it to check for the returned num_candidates > 0 or something like this?

libva/va/drm/va_drm.c

Lines 54 to 56 in 0fa448d

status = VA_DRM_GetNumCandidates(ctx, num_candidates);
if (status != VA_STATUS_SUCCESS)
return status;

(gdb) bt
#0  VA_DRI2Authenticate (dpy=dpy@entry=0x555555625aa0, window=1338, magic=magic@entry=1) at ./va/x11/va_dri2.c:206
#1  0x00007ffff70b0722 in va_drm_authenticate_x11 (fd=fd@entry=6, magic=magic@entry=1) at ./va/drm/va_drm_auth_x11.c:140
#2  0x00007ffff70b041f in va_drm_authenticate (fd=6, magic=1) at ./va/drm/va_drm_auth.c:37
#3  0x00007ffff70b02df in va_DisplayContextGetNumCandidates (pDisplayContext=<optimized out>, num_candidates=<optimized out>) at ./va/drm/va_drm.c:73
#4  0x00007ffff725cedb in vaInitialize () from /lib/x86_64-linux-gnu/libva.so.2
#5  0x00007ffff72fa878 in vaapi_initialize (dpy=dpy@entry=0x555555624e10) at ../gst-libs/gst/vaapi/gstvaapiutils.c:113
#6  0x00007ffff7322894 in supports_vaapi (fd=6) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:77
#7  get_default_device_path (display=0x5555556221d0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:140
#8  set_device_path (device_path=<optimized out>, display=0x5555556221d0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:181
#9  gst_vaapi_display_drm_open_display (display=0x5555556221d0, name=<optimized out>) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:247
#10 0x00007ffff72ef544 in gst_vaapi_display_create (data=0x0, init_type=GST_VAAPI_DISPLAY_INIT_FROM_DISPLAY_NAME, display=0x5555556221d0) at ../gst-libs/gst/vaapi/gstvaapidisplay.c:965
#11 gst_vaapi_display_config (display=0x5555556221d0, init_type=init_type@entry=GST_VAAPI_DISPLAY_INIT_FROM_DISPLAY_NAME, init_value=init_value@entry=0x0) at ../gst-libs/gst/vaapi/gstvaapidisplay.c:1272
#12 0x00007ffff7322b40 in gst_vaapi_display_drm_new (device_path=0x0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:367
#13 0x00007ffff72b96d0 in gst_vaapi_create_test_display () at ../gst/vaapi/gstvaapipluginutil.c:929
#14 0x00007ffff72b0899 in plugin_init (plugin=0x5555555ea030) at ../gst/vaapi/gstvaapi.c:191
#15 0x00007ffff7f041a2 in gst_plugin_register_func (plugin=plugin@entry=0x5555555ea030, desc=desc@entry=0x7ffff736f680 <gst_plugin_desc>, user_data=user_data@entry=0x0) at ../gst/gstplugin.c:532
#16 0x00007ffff7f062ef in _priv_gst_plugin_load_file_for_registry (filename=filename@entry=0x555555562bcc "/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvaapi.so", registry=0x55555556d960, registry@entry=0x0, error=error@entry=0x0) at ../gst/gstplugin.c:971
#17 0x00007ffff7f06baa in gst_plugin_load_file (filename=filename@entry=0x555555562bcc "/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvaapi.so", error=error@entry=0x0) at ../gst/gstplugin.c:689
#18 0x00007ffff7f093f2 in do_plugin_load (tag=12, filename=0x555555562bcc "/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvaapi.so", l=0x55555556cc30) at ../gst/gstpluginloader.c:848
#19 handle_rx_packet (payload_len=<optimized out>, payload=0x555555562bcc "/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvaapi.so", tag=12, pack_type=<optimized out>, l=0x55555556cc30) at ../gst/gstpluginloader.c:956
#20 read_one (l=0x55555556cc30) at ../gst/gstpluginloader.c:1132
#21 exchange_packets (l=l@entry=0x55555556cc30) at ../gst/gstpluginloader.c:1160
#22 0x00007ffff7f0a708 in _gst_plugin_loader_client_run () at ../gst/gstpluginloader.c:700
#23 0x0000555555555165 in main (argc=<optimized out>, argv=0x7fffffffe3c8) at ../libs/gst/helpers/gst-plugin-scanner.c:67
@grover92000
Copy link

Do you have gstreamer-vaapi installed in your qemu VM?
I've seen this starting with libva-2.16 but it only happens in a VM and only when gstreamer-vaapi is installed.
You can patch libva or xorg-server to work around it but I think the underlying issue is in gstreamer-vaapi.

@bernhardu
Copy link
Author

Do you have gstreamer-vaapi installed in your qemu VM?

Sure I have, all lines in the backtrace with "gst-libs/gst/vaapi" are from gstreamer-vaapi.

I've seen this starting with libva-2.16 but it only happens in a VM and only when gstreamer-vaapi is installed.

I have only tested the versions currently in Debian stable and testing but yes, the commit 317c0fb is already contained in 2.16.0.

You can patch libva or xorg-server to work around it

I reported this here, because it comes to a surprise to be able to crash the whole X server by that simple command giving a file that does not even exist. And users might even have trouble to identify that this is caused by/with gstreamer-vaapi, which might get installed just by dependencies/recommends, or the VM got upgraded from an older libva where this was a working/not-crashing setup.

but I think the underlying issue is in gstreamer-vaapi.

Do you have details on this - as far as I see the behaviour change was in libva/VA_DRM_GetNumCandidates, so this might have happened on purpose, or by accident and might be though about again, as it was not failing before.
I would go the other direction, the X server should be also robust enough to survive that request, but It might be easy for libva also not to call the va_drm_authenticate request to the X server (like it was before 2.16) by doing something like this:

- if (status != VA_STATUS_SUCCESS) 
-      return status;
+ if (status != VA_STATUS_SUCCESS || num_candidates <= 0) 
+      return VA_STATUS_ERROR_UNKNOWN; 

@grover92000
Copy link

but I think the underlying issue is in gstreamer-vaapi.

Do you have details on this - as far as I see the behaviour change was in libva/VA_DRM_GetNumCandidates, so this might have happened on purpose, or by accident and might be though about again, as it was not failing before.

The details are from my testing. What I've found is the issue only exists in a VM when gstreamer-vaapi is installed as I've already stated.

gstreamer-vaapi is a plugin for gstreamer so no package should have a dependency on it and can be safely removed. gstreamer itself doesn't depend on it.

If you can provide a test case where libva causes the crash of X without gstreamer-vaapi installed then please do because I can't find one.

@bernhardu
Copy link
Author

Thanks for looking into it.

The details are from my testing. What I've found is the issue only exists in a VM when gstreamer-vaapi is installed as I've already stated.

gstreamer-vaapi is a plugin for gstreamer so no package should have a dependency on it and can be safely removed. gstreamer itself doesn't depend on it.

You are right, I have not found many dependencies, there just two packages in current Debian testing, one depends, one suggests:

# apt-cache rdepends gstreamer1.0-vaapi 
gstreamer1.0-vaapi
Reverse Depends:
  gstreamer1.0-vaapi-dbgsym
  nheko
  vokoscreen-ng

If you can provide a test case where libva causes the crash of X without gstreamer-vaapi installed then please do because I can't find one.

Following minimal test case is able to break a running X server without any gstreamer package installed:

  • QEmu started with -vga qxl -spice addr=$LOCALIP,port=5930,disable-ticketing=on
  • Watching the terminal of the VM via remote-viewer spice://$LOCALIP:5930
  • Install apt install xdm xserver-xorg jwm xterm libva-dev build-essential and reboot
  • Login to the X session
  • Build test with gcc -O0 -g test-vaInitialize.c -lva -lva-drm -o test-vaInitialize
  • Execute test DISPLAY=:0 ./test-vaInitialize

With test-vaInitialize.c having following content:

/*
gcc -O0 -g test-vaInitialize.c -lva -lva-drm -o test-vaInitialize
*/
#include <fcntl.h>
#include <va/va_drm.h>

int main()
{
  int major_version, minor_version;
  VADisplay va_dpy;
  VAStatus status;
  int fd;

  fd = open ("/dev/dri/card0", O_RDWR | O_CLOEXEC);
  va_dpy = vaGetDisplayDRM (fd);
  status = vaInitialize (va_dpy, &major_version, &minor_version);
}

I wondered how vainfo handles this and found it simply tries in this order wayland, x11 and drm.
Because the x11 path is already successful it takes that.
But I found getting X to crash is also possible with just vainfo in this way:

DISPLAY=:0 vainfo --display drm

@grover92000
Copy link

Thanks. I was able to reproduce the crash. Patching VA_DRM_GetNumCandidates() so that it returns the status it did before the 2.16.0 update should do the trick:

diff -Nuarp libva-2.18.0.orig/va/drm/va_drm_utils.c libva-2.18.0/va/drm/va_drm_utils.c
--- libva-2.18.0.orig/va/drm/va_drm_utils.c     2023-03-19 07:50:33.000000000 -0400
+++ libva-2.18.0/va/drm/va_drm_utils.c  2023-03-28 14:55:26.399408731 -0400
@@ -87,15 +87,8 @@ VA_DRM_GetNumCandidates(VADriverContextP

     free(driver_name);

-    /*
-     * If the drm driver name does not have a mapped vaapi driver name, then
-     * assume they have the same name.
-     */
-    if (count == 0)
-        count = 1;
-
     *num_candidates = count;
-    return VA_STATUS_SUCCESS;
+    return count ? VA_STATUS_SUCCESS : VA_STATUS_ERROR_UNKNOWN;
 }

 /* Returns the VA driver name for the active display */

Tested with the virtio and bochs (std) drivers in qemu.

@evelikov
Copy link
Contributor

As you can see from the upstream Xorg server/driver issue it's a bug on their end. It has been known for a while but seemingly nobody had the time/interest to fix that. Most importantly on can trigger that bug even without libva.

So I suggest closing this issue and checking with colleagues, companies or just nerd sniping (not it) to fix that for you ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants