
Engine Input


This page covers notes on input management with respect to general use, accessibility and internationalization. The opinion is that of all the problem areas we attempt to cover, input is in even worse shape than the current mess that is audio and graphics.

Although a lot of A/V related problems boil down to terrible APIs and purposefully disgusting drivers, the input problem (and its solutions) stems from a long legacy of stacking additional translation tables on top of a long chain of preexisting ones: scancode to keycode to keysym to XSym to SDLSym and back to XSym. Do we have remapping in any of these layers? What is the current keyboard layout, should the layout be allowed to switch, are the translation tables mapped in the kernel? in the input layer of the display server? is there an IME for more complex language input in effect? what about diacritic / dead-key states, are they tracked per connection? do we have multiple different keyboards attached? What is the encoding scheme in effect, and so on. On top of that, there is the other legacy angle in ancillary services (VNC, RDP, terminals) and the entire facepalm that is modern day Unicode.

That scenario was just for keyboards. Add high variability in sample rate (from a few Hz to a few thousand), noisy A/D converters in gamepads, devices being plugged in and out ten times a second from cable issues, calibration and latency in touch sensors, compensating for drift in sensors and so on. The higher the level of abstraction at which you implement some of these stages, the heavier the punishment (just look at the garbage collection workarounds hiding in Android on the matter). The final approach heavily affects end-user quality of experience, and a failure can render a device unusable.

Just consider the scenario where a developer hooked up mouse input to drag resize of a window in a naive way:

1. On Mouse Down:
   If the cursor is at a drag handle (border or icon), mark state.

2. On Motion:
   If we are in drag resize state and the cursor has moved,
   resize (or request that a client resizes) the output buffer
   and redraw the contents of the window.

CPU load, power consumption and how "smooth" the interaction feels will vary based on how large the movements the user has to make are (body mechanics) and on the sample rate of the input device. At potentially (device sample rate) resize requests per second, where each resize requires buffer renegotiation, redrawing, layout changes and so on, this can quickly saturate processing and have a detrimental effect on other activities.
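
A common mitigation is to decouple the resize from the raw sample rate. The following is a minimal Lua sketch of that idea, assuming a hypothetical on_drag_motion callback and resize_target function (neither is part of the engine API): motion deltas are accumulated, and at most one resize is issued per logical clock tick.

```lua
-- Sketch only: on_drag_motion and resize_target are hypothetical
-- placeholders for whatever the appl / support scripts actually provide.
local pending

function on_drag_motion(dx, dy)
	pending = pending or {dx = 0, dy = 0}
	pending.dx = pending.dx + dx
	pending.dy = pending.dy + dy
end

-- invoked at the appl clock rate (e.g. 25 Hz) rather than at the
-- device sample rate (possibly thousands of Hz)
function flush_drag_resize()
	if pending then
		resize_target(pending.dx, pending.dy)
		pending = nil
	end
end
```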

On top of this we have the exotic cases (accessibility) where assumptions about the number of fingers or limbs, stability of motion, ability to maintain pressure and so on are involved -- along with the devices intended to account for that, which will also vary based on context. Eye tracking or brain-machine interfaces?

Unfortunately, we lack definitive answers to many of these questions and have to deal with some portable reality. Recall the event propagation from Engine Design: event loop.

Arcan uses the following approach:

  1. The input event model attempts to aggregate as much information as is available; this includes analog sample values, digital button presses, coordinate system motion and translated device hints (keysyms, scancodes, attempted translation, higher-level output such as Unicode). The actual device node connections are platform specific, but we favor whitelisting and an explicit 'monitor a single directory' approach over the policy+library+vfs parsing approach of udev. How the monitored folder is managed is a detail left to the distributions.

  2. The input platform is statically selected at build time, when the video platform and other system-specific properties are configured. These vary in ability and maturity, with SDL being the first one used, and we are therefore somewhat biased towards its tables for legacy reasons. The platform maps device data to the model specified in (1).

  3. Frameservers can, under special conditions, be allowed to enqueue events that match the model specified in (1), if the appl explicitly enables that mask for queue transfer. This is not default behavior because of the strong security implications it has.

  4. The input platform exposes filtering mechanisms for certain devices; the appl can use these to define filter kernels and functions that reduce the number of events from noisy sources (see the analog-filtering sketch further below). It is also possible to toggle sampling of some devices on and off.

  5. The event queue packs received events and exposes them to the Lua VM as a table, packed so that the engine also knows how to unpack it later, with the addition of a higher-level label tag for specialized applications. Shmif supports reporting custom labels to assist with keybinding.

  6. The appl has an entry point where it can opt to apply support scripts that cater to specific needs: calibration, higher-level concepts (gestures) and so on; these manipulate the received table. The target_input function can then be used to explicitly forward this table (this is where the higher-level hint can be added); see the sketch after this list. The keysym.lua support script takes care of specific state tracking, utf-8 mapping and keysym/keycode translation tables. OS-specific keymaps that are needed should be translated offline into such lua scripts.

  7. A connected client or frameserver will never see, receive or directly work with the set of devices that are involved with input -- this should be, and remain, completely opaque in order to stand a fighting chance against a number of possible and subtle timing side-channel attacks.

  8. There are specialized clients that use the shmif and can be launched from within the running appl explicitly. This is used to outsource LED management, and to enable sensor-fusion style complex devices like head-mounted displays.
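
To make steps 1, 5 and 6 concrete, here is a minimal sketch of an appl-side input handler, assuming an appl named 'myappl'. The apply_keymap helper and the focused variable are placeholders for whatever keymap translation (e.g. loaded from the keysym.lua support script) and focus tracking the appl actually uses; this illustrates the flow rather than a canonical implementation.

```lua
-- Sketch only: the entry point name follows the applname_input
-- convention; apply_keymap and focused are hypothetical stand-ins.
function myappl_input(iotbl)
-- keyboard events carry the 'translated' flag along with keysym and
-- modifier data; patch in keymap and utf-8 state before forwarding
	if iotbl.translated then
		apply_keymap(iotbl)
	end

-- calibration / gesture support scripts would mutate or consume the
-- table here for other device classes (mouse, game, touch)

-- explicitly forward to the client that has input focus; clients never
-- get to enumerate or open the input devices themselves
	if valid_vid(focused, TYPE_FRAMESERVER) then
		target_input(focused, iotbl)
	end
end
```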

Should the event queues for a target become saturated, there are calls available for completely ignoring specific devices; analog events will be dropped in favor of discrete events, which in turn will be dropped in favor of system events.
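
For the filtering mentioned in step 4, a sketch along the following lines reduces the pressure that noisy analog sources put on the queues. It assumes the inputanalog_query / inputanalog_filter calls from the Lua API, that the query results carry devid/subid fields, and that the deadzone, bounds, buffer size and "average" mode shown here are purely illustrative values.

```lua
-- Sketch only: concrete filter values depend on the device and should
-- come from calibration rather than being hard-coded like this.
for _, dev in ipairs(inputanalog_query()) do
-- drop jitter below the deadzone, clamp to the expected range and
-- emit one averaged sample per small buffer of raw ones
	inputanalog_filter(dev.devid, dev.subid, 500, 0, 32767, 8, "average")
end
```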

To aid this process, there are also support scripts for hooking, recording and replaying events in the Lua layer and in the engine event layer. These features are re-used to test more complex applications and interactions (see the Debugging Facilities page).