
egl dri


The VIDEO_PLATFORM=egl-dri backend on Linux and BSDs is special in many ways, and the complexity of setting it up varies a lot depending on your wants and needs.

There are two configuration mechanisms in play. One is to use the database via the arcan_db tool to add key/value pairs to the 'arcan' appl, like:

arcan_db add_appl_kv arcan video_device /dev/dri/card1

the other is to use environment variables, like:

export ARCAN_VIDEO_DEVICE=/dev/dri/card1

If the environment variable for a specific config key is set, it will take precedence over a possible database entry. You can run Arcan without any arguments to see a list of configuration keys that the current build supports.
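
As a quick sketch of how the two interact (device paths are examples only, and durden stands in for whichever appl you run):

arcan_db add_appl_kv arcan video_device /dev/dri/card0
ARCAN_VIDEO_DEVICE=/dev/dri/card1 arcan durden

The database entry persists across runs, while the environment variable overrides it for that invocation only, so card1 would be used here.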

Some worthwhile options here are as follows:

arcan_video_device_force_compose

When present, this will disable any heuristics/attempts to use direct FBO scanout. This might clear up graphical glitches on some display/GPU combinations.

arcan_video_device_nodpms

This disables any power management controls.

arcan_video_device_wait

This idle-waits at startup until a GPU with a valid output is plugged in. It is a temporary workaround for the core problem that the output display properties influence a lot of how the rest of resource allocation and rendering will behave, and starting without any display available at all is problematic. The 'headless' arcan build works from the other angle (when you only ever have a virtual output) and the two will eventually be married into one.

arcan_video_ignore_dirty

This disables dirty update tracking and forces a recomposition/refresh on every possible frame. This is an expensive but useful troubleshooting tool to figure out if there is an engine or driver problem with dirty region propagation.
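
For a one-off troubleshooting run, the environment form is convenient. Assuming these keys map to environment variables the same way video_device does (uppercased, with the value itself mostly irrelevant since presence is what matters):

ARCAN_VIDEO_DEVICE_FORCE_COMPOSE=1 ARCAN_VIDEO_IGNORE_DIRTY=1 arcan durden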

Multi-GPU configuration

Supporting composition on multiple GPUs is currently in development / experimental / incomplete, and the "easier" solution is simply to run one arcan instance per GPU and migrate windows between the two, with some latched input binding that toggles input on/off between the two instances. Far from ideal.

The configuration for how multiple GPU devices should otherwise be handled is complex. For some setups you might need to set:

arcan_video_display_device=/path/to/drm/cardN (N = 0,1,2,...)
arcan_video_draw_device=/path/to/drm/cardN

This is needed when the hardware requires different device entries for displaying versus rendering, which might occur in some embedded and laptop cases.

Individual GPU options are suffixed with a number for secondary, tertiary and so on:

arcan_video_device_buffer_2=streams

For configurations where you want to toggle back and forth between two GPUs, you can mark one as an 'alternate':

arcan_video_device_alternate=/dev/dri/cardN

This requires some support from the WM; in durden, the displays/reset menu path will perform the necessary toggle.
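
A persistent multi-GPU setup could thus be sketched as database entries, assuming the keys map to the database by dropping the arcan_ prefix the same way video_device does (device paths are examples only):

arcan_db add_appl_kv arcan video_display_device /dev/dri/card0
arcan_db add_appl_kv arcan video_draw_device /dev/dri/card1
arcan_db add_appl_kv arcan video_device_alternate /dev/dri/card1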

General setup details

First, since it is built around the drm/kms infrastructure, it requires a recent kernel (4.4+) and a Mesa/libdrm build that exposes EGL and GL21 with support for your graphics card, preferably also with render-nodes support (for arcan_lwa and the accelerated game frameserver). These prerequisites are similar to what Wayland compositors like Weston have. MAKE SURE that you have AT LEAST one /dev/dri/card node and hopefully also matching renderD nodes. Otherwise you need to fix your kernel/driver situation first or you won't get anywhere.
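
A quick sanity check is to list the device nodes and verify that at least one card node and, preferably, matching renderD nodes show up:

ls -l /dev/dri/

Expect something along the lines of card0 (card1, ...) and renderD128 (and up).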

Second, permissions and devices. By default, the engine will just grab the first available /dev/dri/cardN entry; if your preferences differ, set the corresponding config entry (video_device). You will need permissions on that device node (some distributions map this to the graphics group) and possibly root/capability to become drmMaster (don't ask...). The reason why this isn't easier/more configurable right now is due to engine refactoring to support multi-GPU and GPU hotplugging. Note that some device node creation setups will not give you a deterministic allocation for cardN with multiple GPUs, fantastic. Some have also reported success by having the cardN node in the group arcan will be run as, and the renderD nodes in a group that untrusted clients can access.
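
To find out which group a card node belongs to and add your user to it (group and user names here are placeholders, check what ls actually reports on your distribution):

ls -l /dev/dri/card0
usermod -aG video youruser

A re-login is needed before the group change takes effect.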

IF you SUID the binary, it will perform the drmMaster dance immediately on the devices that it would try and open on a normal setup - and thereafter drop privileges.

The corresponding (linux) event backend deliberately avoids udev. Either give the user arcan is running as access to the /dev/input set of nodes (usually by being added to the input group) or have the udev setup generate a suitable folder of nodes, and refer to it using the event_scandir config key.
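
If you go the dedicated-folder route, pointing the engine at it could look like this (the folder name is hypothetical and your udev rules would have to populate it):

arcan_db add_appl_kv arcan event_scandir /dev/input/arcan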

The backend will use inotify to monitor that folder for new entries and try to probe / take control over those nodes when that happens. Note that we also need permission to run inotify on the folder for that to work. Restricting this to a whitelist is a good move in the days of devices like rubberDucky, but also because there are a ton of terrible input devices out there that generate many kHz of bad/broken samples, adding a lot of invisible processing overhead.

Keyboard maps, translation and other external notification systems are not part of the engine but rather of the running appl, though there is a support script (builtin/keyboard.lua) that most appls tend to use, and this script has a basic map format. If the engine is built with XKB support, it will try to use XKB to translate whenever possible, assuming the related environment variables (XKB_DEFAULT_LAYOUT, XKB_DEFAULT_VARIANT, XKB_DEFAULT_MODEL) are set.
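
For example, to get a Swedish pc105 layout through XKB (the values are just examples, substitute your own):

export XKB_DEFAULT_LAYOUT=se
export XKB_DEFAULT_VARIANT=nodeadkeys
export XKB_DEFAULT_MODEL=pc105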

Note, currently the synchronization strategies in this platform are rather primitive. To handle multiple displays tear-free, we currently let the VSYNC of the primary display determine buffer swap and then let other displays update and synch when ready. This means that realtime content like games and videos may be less smooth on displays other than the primary one. This limit comes from the lack of lower level APIs and driver support. The features that will eventually fix these problems are 'atomic modeset' and 'synchronization fences', but both are very much works in progress.

Note, another concession is that although there is support for virtual terminal switching, it can (and should) be disabled; there are just too many race conditions in every layer for this to work reliably for everyone and on every occasion. The reason for that is that the underlying interface is complete shit and has a number of side effects that may or may not be relevant. Most engine features are in place to support multiple parallel sessions, or, in XDG terminology, "seats", but we consider it a bad design and a bad idea, and it will receive considerably less favour, attention and priority than important use-cases (reliable suspend/resume, low energy consumption, ...).

Note, the platform currently doesn't do any display connector hotplug detection (why this isn't provided through the normal device node and ioctls but rather resorts to sysfs scraping is surprising to say the least; there might be valid reasons hiding in the drivers) and relies on an explicit rescan called from the scripting layer -- which can stall the graphics pipeline for several hundred milliseconds. Durden, for instance, permits rescan commands over the command channel named pipe -- hook that up to some other event layer and you're there. The problem is that there seem to be quite a few hard-to-catch race conditions (we are talking kernel crashes) from rapid and wild plug/unplug while-scanning operations.

Note, drm/kms hardware support varies wildly, with a lot of instabilities directly related to the running kernel version etc. Concurrent use of multiple GPUs from the same or even from different vendors rarely works, but it is a priority.

Note, if arcan is running incredibly sluggishly and taking up high CPU, check that you are not accidentally running Mesa with the llvmpipe / software fallback. In many distributions, the packages are actually split up, with individual Mesa packages for each GPU driver. In Void Linux, for instance, if you install mesa but forget mesa-intel-dri (assuming an Intel GPU), things may very well work, but with terrible performance.
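
On Void Linux, checking and fixing that would look something like this (the package name assumes an Intel GPU):

xbps-query -l | grep mesa
xbps-install -S mesa-intel-dri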

Note, on getting rid of X dependencies: MesaGL and libdrm packaging typically come with dependencies on the entire X set of libraries and everything that entails. It is possible, however, to build them without (assuming you don't want to use any X compatibility layers either). Some have reported success by cloning and building them separately with arguments to the MesaGL configure (add gallium drivers to fit your hardware):

./configure --enable-gles2 --enable-gles1 --disable-glx --enable-egl --enable-gallium-egl --with-gallium-drivers=nouveau,i915,radeonsi,swrast --enable-gallium-osmesa --with-egl-platforms=drm

In addition, the OPENGL_gl_LIBRARY entry in the CMakeCache should point to libOSMesa.so; this is not always detected by the find scripts.
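
One way to force it is to set the cache variable explicitly when configuring arcan, for example (run from a separate build directory; the library path is an example, adjust to wherever your OSMesa build ended up):

cmake -DOPENGL_gl_LIBRARY=/usr/local/lib/libOSMesa.so ../src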

NVidia-binary specifics

There is experimental support for running recent versions of the nvidia binary drivers, but expect it to have quite a few flaws when it comes to resolution switching, multiple displays, synchronization/timing and hot-plugging for a while. The normal Xorg-like restrictions apply, such as making sure there's no conflict with the Mesa-GL implementation and that the nouveau driver is blacklisted.

You will need to load the nvidia-drm module with the modeset=1 argument and set the device_buffer config key to streams as there is currently no working auto-probing.

modprobe -r nvidia-drm ; modprobe nvidia-drm modeset=1
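
To make the setting stick across reboots, a modprobe configuration entry can be used instead (run as root; the file name is arbitrary):

echo "options nvidia-drm modeset=1" > /etc/modprobe.d/nvidia-drm.conf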

You will also need to force a specific device and buffer path at the moment. Set the command-line environment:

ARCAN_VIDEO_DEVICE=/dev/dri/card0 ARCAN_VIDEO_DEVICE_BUFFER=streams

before invoking arcan so that it does not try to probe the GBM path, as that might 'work' but incredibly slowly due to the presence of gbm/mesa/nouveau.
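
If you prefer the database route, the same settings can presumably be stored as (key names assumed to follow the same pattern as video_device):

arcan_db add_appl_kv arcan video_device /dev/dri/card0
arcan_db add_appl_kv arcan video_device_buffer streams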

If no screen appears or the framerate is extremely low, chances are that there is a conflict with the Linux-provided framebuffer (efifb, vesafb); you need to both disable those on your kernel command line and set your BIOS to legacy mode (i.e. no UEFI framebuffer). This ties into a complicated issue where there is no decent handover from BIOS to bootloader to OS, and the OS might spin up a compatibility framebuffer that fights the drivers.
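
Disabling the firmware framebuffers usually amounts to adding something like the following to the kernel command line (which parameters apply depends on which compatibility framebuffer your setup spins up):

video=efifb:off video=vesafb:off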

Accelerated clients won't get to use handle/buffer passing either at the moment; we're still missing code in shmif/egl-dri to register as an "EGL External Platform interface" provider (seems like a decent workaround to the EGL spec, though).