
OpenXR - Basic integration for Meta Quest #577

Open
wants to merge 40 commits into master

Conversation

lvonasek

@lvonasek lvonasek commented Mar 10, 2024

Introduction

This PR adds OpenXR support for Meta Quest (2, 3, Pro). Using the smartphone Android version in the headset is very hard due to missing controller support and the relative mouse pointer. The intent of this PR is to add full controller support and to render the screen using OpenXR (no stereo/6DoF).

How does it work

The code detects whether it is running on a Meta/Oculus device. OpenXR is initialized only if the detection indicates it is running on an XR headset. This means the same APK can be used on mobile and on XR. This is possible thanks to hybrid app support (hybrid app == part of the app can be 2D and another part XR).
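
For illustration, the headset check could look like the following minimal C sketch (the property names and brand strings are assumptions; the actual PR may instead perform this check on the Java side via Build.MANUFACTURER):

    // Minimal sketch, assuming detection via Android system properties from native code.
    // The exact properties and strings are assumptions, not necessarily what the PR checks.
    #include <strings.h>
    #include <sys/system_properties.h>

    static int isXrHeadset(void) {
        char brand[PROP_VALUE_MAX] = {0};
        char manufacturer[PROP_VALUE_MAX] = {0};
        __system_property_get("ro.product.brand", brand);
        __system_property_get("ro.product.manufacturer", manufacturer);
        // Quest devices report "oculus" or "meta" depending on firmware generation.
        return strcasecmp(brand, "oculus") == 0 || strcasecmp(brand, "meta") == 0 ||
               strcasecmp(manufacturer, "oculus") == 0 || strcasecmp(manufacturer, "meta") == 0;
    }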

Instead of drawing on the screen, the output is rendered into an OpenGL framebuffer which is then presented as a flat screen in XR space. The mouse cursor is still relative, but it is mapped onto the controller translation, which works well even in games. Controller buttons are mapped to the most common game keys.
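
As a rough illustration of that mapping (a hypothetical sketch, not code from this PR; the function name and the pixels-per-meter scale are invented):

    // Hypothetical sketch: convert the change in controller position (in meters)
    // into relative mouse deltas. Names and the pixels-per-meter scale are assumptions.
    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    void controllerToRelativeMouse(Vec3 current, Vec3 *previous,
                                   float pixelsPerMeter,
                                   int *outDx, int *outDy) {
        *outDx = (int) lroundf((current.x - previous->x) * pixelsPerMeter);
        // Screen Y grows downwards while world Y grows upwards, hence the sign flip.
        *outDy = (int) lroundf(-(current.y - previous->y) * pixelsPerMeter);
        *previous = current;
    }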

Notes

  • Anyone who spams this discussion with unrelated stuff will be blocked without warning
  • The main intent is to support Linux programs inside the headset
  • This addresses [Feature Request] OpenXR Implementation #564
  • The same approach was used in the Winlator project (video)

@beef-ox

beef-ox commented Mar 12, 2024

Can I help you with this project?

I am a programmer with very little Java experience, but I am a Linux power user and work as a Linux sysadmin; I also have a Quest 2

I also wanted to ask/suggest possibly leveraging wxrd (https://gitlab.freedesktop.org/xrdesktop/wxrd), a lightweight Linux VR compositor based on wlroots, or alternatively xrdesktop-gnome-shell, or xrdesktop-kdeplasma. Unsure if this helps or hinders your development, but these compositors provide a 3D VR environment with free-floating app windows, and controller support

@lvonasek
Author

Can I help you with this project?

Let me get the basic integration working first. Currently it is just a black screen and it does nothing.
I will ping you once I have something working that could be improved.

I also wanted to ask/suggest possibly leveraging wxrd (https://gitlab.freedesktop.org/xrdesktop/wxrd), a lightweight Linux VR compositor based on wlroots, or alternatively xrdesktop-gnome-shell, or xrdesktop-kdeplasma. Unsure if this helps or hinders your development, but these compositors provide a 3D VR environment with free-floating app windows.

I didn't know about xrdesktop; it looks pretty wild. It would be quite a ride to get it working on a standalone headset. I imagine it would be quite challenging to make that work on Quest, but I might be wrong.

@twaik
Member

twaik commented Mar 12, 2024

I also wanted to ask/suggest possibly leveraging wxrd (https://gitlab.freedesktop.org/xrdesktop/wxrd), a lightweight Linux VR compositor based on wlroots

Termux:X11 is not related to Wayland.

@beef-ox

beef-ox commented Mar 12, 2024

@twaik

I also wanted to ask/suggest possibly leveraging wxrd (https://gitlab.freedesktop.org/xrdesktop/wxrd), a lightweight Linux VR compositor based on wlroots

Termux:X11 is not related to Wayland.

It was my understanding that Termux:X11 is an xwayland session. Weston reportedly works, and that is Wayland-based.

@lvonasek
Further projects that may prove useful in testing these efforts: mesa-zink with turnip and/or virglrenderer are Termux-viable projects which enable hardware 3D acceleration.

The xrdesktop project also has GNOME and KDE-specific builds that are X11-based (https://gitlab.freedesktop.org/xrdesktop). The wxrd window manager was created to have an extremely small footprint. In all the aforementioned cases, xrdesktop is the underlying platform, which already has motion tracking and controller support. (Hoping the "Direct Input" option for touchscreen passthrough could work to pass the controllers and head tracking to Monado without much trouble.)

@lvonasek
Author

@beef-ox It is nice to see there are so many opportunities. But until I have the basic integration working, I won't distract myself with other possibilities. The key to success is to take small steps and do them properly.

@twaik
Member

twaik commented Mar 12, 2024

It was my understanding that Termux:X11 is an xwayland session

It was, for the first few years. Termux:X11 implemented a small subset of the Wayland protocol only to make it possible to run Xwayland, but at least a year ago the project dropped it because of architectural restrictions.

Weston reportedly works, and that is Wayland-based.

Weston works on top of the X11 session. It does not need a Wayland session to work; it starts a Wayland session itself.

The wxrd window manager was created to have an extremely small footprint

wxrd requires wlroots, which requires GLES with some extensions that are not available on Android. Android vendors do not implement support for these extensions, and even if they do, they are not part of the SDK/NDK and are not guaranteed to work.
It is a no-go.

Hoping "Direct Input" option for touchscreen passthrough could work to pass the controllers and head tracking to Monado without much trouble

You have illusions about how that works. It is implemented only for touchscreen and passes only touchscreen events.

@twaik
Member

twaik commented Mar 13, 2024

Termux:X11 does not use C++, to keep the APK size as small as possible. Currently I do not intend to merge C++ code, only C.

@lvonasek
Author

Termux:X11 does not use C++, to keep the APK size as small as possible. Currently I do not intend to merge C++ code, only C.

Ok, good to know. I will move the XR code to C.

@twaik
Member

twaik commented Mar 13, 2024

There are a few more things:

  1. You are using GLES3. Currently the renderer uses GLES2, and I want to avoid mixing GLES versions in one project.
  2. It seems like you are considering using a swapchain, which will mean blitting the image from one buffer to another. This solution will have lower performance than the main code path. Do you receive a Surface or SurfaceFrame from OpenXR that will be used to draw on? I think the best solution would be to take this Surface or SurfaceFrame and pass it directly to the X server. But I am not sure how exactly this works.
  3. I can add support for a physical gamepad/controller/joystick, but I do not have a device (gamepad) for testing. I do not play games that require one. But I can buy one in case anyone sends funds for it (yeah, I bought one from AliExpress, but it was a piece of ■■■■ and my devices did not even recognize its events correctly).

@lvonasek
Author

  1. You are using GLES3. Currently the renderer uses GLES2, and I want to avoid mixing GLES versions in one project.

I believe I can move to GLES2 completely. GLES3 would only be needed if I used stereoscopic rendering via the multiview extension.

  2. It seems like you are considering using a swapchain, which will mean blitting the image from one buffer to another. This solution will have lower performance than the main code path. Do you receive a Surface or SurfaceFrame from OpenXR that will be used to draw on? I think the best solution would be to take this Surface or SurfaceFrame and pass it directly to the X server. But I am not sure how exactly this works.

The swapchain is required by OpenXR. The only way to render in OpenXR is to render into a texture and then let the headset reproject it. This architecture is very helpful in VR, as you can get a fluent experience even when rendering at a lower framerate than the headset's refresh rate. In 2D rendering it doesn't bring much benefit, but it still needs to be used.
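
For illustration, the per-frame swapchain usage with the OpenGL ES binding typically looks like this (a generic OpenXR sketch; the swapchain, image list and framebuffer are assumed to be created elsewhere, and this is not code from the PR):

    // Generic per-frame OpenXR swapchain sketch (OpenGL ES binding); not PR code.
    // `swapchain` and `images` are assumed to come from xrCreateSwapchain /
    // xrEnumerateSwapchainImages, and `fbo` from glGenFramebuffers.
    #include <GLES2/gl2.h>
    #include <EGL/egl.h>
    #define XR_USE_GRAPHICS_API_OPENGL_ES
    #include <openxr/openxr.h>
    #include <openxr/openxr_platform.h>

    void renderOneFrame(XrSwapchain swapchain, XrSwapchainImageOpenGLESKHR *images, GLuint fbo) {
        uint32_t index = 0;
        XrSwapchainImageAcquireInfo acquireInfo = {XR_TYPE_SWAPCHAIN_IMAGE_ACQUIRE_INFO};
        xrAcquireSwapchainImage(swapchain, &acquireInfo, &index);

        XrSwapchainImageWaitInfo waitInfo = {XR_TYPE_SWAPCHAIN_IMAGE_WAIT_INFO};
        waitInfo.timeout = XR_INFINITE_DURATION;
        xrWaitSwapchainImage(swapchain, &waitInfo);

        // Attach the acquired swapchain texture to our framebuffer and draw into it.
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, images[index].image, 0);
        // ... draw the X root window and cursor here ...
        glBindFramebuffer(GL_FRAMEBUFFER, 0);

        XrSwapchainImageReleaseInfo releaseInfo = {XR_TYPE_SWAPCHAIN_IMAGE_RELEASE_INFO};
        xrReleaseSwapchainImage(swapchain, &releaseInfo);
        // The released image is then referenced from a composition layer and
        // submitted with xrEndFrame (not shown).
    }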

  3. I can add support for a physical gamepad/controller/joystick, but I do not have a device (gamepad) for testing. I do not play games that require one. But I can buy one in case anyone sends funds for it (yeah, I bought one from AliExpress, but it was a piece of ■■■■ and my devices did not even recognize its events correctly).

I would like to avoid mapping the Meta Quest Touch controller thumbsticks to a joystick. The thumbsticks develop extreme noise after some time. In other XR projects I check if the stick is pushed more than 70% to the right and, if so, send a right-arrow key event. But of course we could make it optional at some point.
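
A sketch of that thresholding (the key-send helper here is hypothetical; the thumbstick value would come from xrGetActionStateVector2f):

    // Sketch of the 70% thumbstick threshold described above. sendKeyEvent() is a
    // hypothetical helper; XrVector2f is the thumbstick state from the OpenXR action.
    #include <stdbool.h>
    #include <openxr/openxr.h>

    #define STICK_THRESHOLD 0.7f

    enum ArrowKey { KEY_LEFT, KEY_RIGHT, KEY_UP, KEY_DOWN };
    extern void sendKeyEvent(enum ArrowKey key, bool pressed); /* hypothetical helper */

    static void mapThumbstickToArrows(XrVector2f stick) {
        sendKeyEvent(KEY_RIGHT, stick.x >  STICK_THRESHOLD);
        sendKeyEvent(KEY_LEFT,  stick.x < -STICK_THRESHOLD);
        sendKeyEvent(KEY_UP,    stick.y >  STICK_THRESHOLD);
        sendKeyEvent(KEY_DOWN,  stick.y < -STICK_THRESHOLD);
    }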

@twaik
Member

twaik commented Mar 14, 2024

@lvonasek I am not really sure how exactly this works. How exactly are you intending to extract frames in the activity process? Currently LorieView (which works in the context of MainActivity) does not output anything to the Surface. It simply passes this Surface to the X server process via Binder, and the X server (which works in com.termux's application sandbox) does all the magic. Of course you can use SurfaceTexture for this, but this solution will use more resources because the X root window will be rendered one more time.

@lvonasek
Author

First, I need to figure out how the rendering in this project works.

Ideally, I would call glBindFramebuffer (binding my XrFramebuffer) and render the frame into it using OpenGL. That way the frame ends up in OpenXR. In OpenXR, I then declare that I want to render it on a plane in 3D space.

It is a work in progress and I am new to this repo, so please be patient if I commit or say something stupid.

@twaik
Member

twaik commented Mar 14, 2024

First, I need to figure out how the rendering in this project works.

I explained where exactly the Surface is being used, so I can explain the rendering process too.
The renderer is pretty simple. Right after its own initialisation, the X server starts the renderer initialisation. It gets jmethodIDs of some Surface-related functions, prepares an EGL display and a primitive EGL context, does some checks (to determine whether the device supports sampling AHardwareBuffer in RGBA and BGRA formats) and creates an AHardwareBuffer for the root window (if it is supported). After initialisation the X server waits for an Activity connection. When an activity connects, it sends the Surface and related data. The renderer initialises a new EGL context based on this Surface (in ANativeWindow shape), creates the shader and textures for the root window and cursor, and allows the X server to draw there. When the server wants to draw the screen, or the cursor is changed/moved, the renderer uses the shader to draw both the root window and cursor textures on the current EGL surface and invokes eglSwapBuffers.
If the device supports sampling an AHardwareBuffer of the required type, the root window texture is created with eglGetNativeClientBufferANDROID + eglCreateImageKHR + glEGLImageTargetTexture2DOES; otherwise it is created with a simple glTexImage2D and updated with glTexSubImage2D.
The cursor texture is updated with glTexImage2D because I have not encountered animated hi-res cursors. But that can be fixed.
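
For reference, the AHardwareBuffer-to-texture path described above generally looks like this (a generic sketch of the standard EGL/GLES extension flow, not code copied from this repository):

    // Generic sketch: import an AHardwareBuffer as a GLES texture via an EGLImage so the
    // X server can keep writing into the buffer while the renderer samples it.
    // Error handling omitted; not code from this repository.
    #include <android/hardware_buffer.h>
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>

    GLuint importHardwareBuffer(EGLDisplay display, AHardwareBuffer *buffer) {
        PFNEGLGETNATIVECLIENTBUFFERANDROIDPROC getNativeClientBuffer =
            (PFNEGLGETNATIVECLIENTBUFFERANDROIDPROC) eglGetProcAddress("eglGetNativeClientBufferANDROID");
        PFNEGLCREATEIMAGEKHRPROC createImage =
            (PFNEGLCREATEIMAGEKHRPROC) eglGetProcAddress("eglCreateImageKHR");
        PFNGLEGLIMAGETARGETTEXTURE2DOESPROC imageTargetTexture =
            (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC) eglGetProcAddress("glEGLImageTargetTexture2DOES");

        EGLClientBuffer clientBuffer = getNativeClientBuffer(buffer);
        EGLImageKHR image = createImage(display, EGL_NO_CONTEXT,
                                        EGL_NATIVE_BUFFER_ANDROID, clientBuffer, NULL);

        GLuint texture;
        glGenTextures(1, &texture);
        glBindTexture(GL_TEXTURE_2D, texture);
        // Bind the EGLImage as the texture's storage; updates to the AHardwareBuffer
        // become visible to the texture without further copies.
        imageTargetTexture(GL_TEXTURE_2D, image);
        return texture;
    }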

Actually this process is pretty simple. You could reimplement the whole thing in pure Vulkan and integrate it into your OpenXR-related code.

But I am not sure why the OpenXR context is initialized with the JavaVM and a global reference to the Activity, so I am not sure if it can run completely in the X server process. I think I will understand it better if you elaborate on how exactly that works.

@lvonasek
Author

I explained where exactly the Surface is being used, so I can explain the rendering process too.
...

Thank you, this is very helpful.

But I am not sure why the OpenXR context is initialized with the JavaVM and a global reference to the Activity, so I am not sure if it can run completely in the X server process. I think I will understand it better if you elaborate on how exactly that works.

For the JavaVM, I found no information anywhere about why it is required. The activity itself is needed for the app lifecycle (listening to onWindowFocusChanged, onPause and onResume events). I will try to elaborate, but I am really not good at explaining:

AR/VR headsets have two app modes: 2D (Android apps floating in 3D space) and immersive OpenXR mode. In immersive mode the app cannot render anything using the Android API; the only way to show something on screen is OpenGL/Vulkan. Meta recently added support for hybrid apps, where you can switch between a 2D and an XR activity.

I added hybrid app support in this PR and trigger the OpenXR runtime only if the app is running on a headset. The final APK will run on regular Android and on XR headset(s). Currently it is under construction, but in the future I would like to start XR only if the X server is running (currently there is no way in the headset to go into the preferences or open the help page).

  • XrEngine takes care of OpenXR initialization. It defines which extensions are used and starts the immersive mode.
  • XrRenderer is a kind of 3D compositor. There are several layers in 3D space. The 3D space I am using is horizontally aligned with the floor level. XrRenderer gets information about the head's motion tracking as an XrPose (rotation using the IMU sensor and relative position in meters using SLAM). All the magic happens in the headset's implementation; I just declare that I want to render a geometrical shape with specific dimensions in space and map a texture in XR format onto it (the mapping can be done separately for each eye). A sketch of such a flat-screen quad layer follows after this list.
  • XrFramebuffer, in very vague words, connects an OpenGL texture to an OpenXR texture handle. At the beginning of a frame I acquire the OpenXR texture handle, bind the OpenGL framebuffer, render into it and release it. Once the OpenXR texture handle is released, I can pass it to XrRenderer, which will show it to the user (without calling eglSwapBuffers; no Surface is used). To be fair, there is xrCreateSwapchainAndroidSurfaceKHR, but I have never seen any project using it (I would give it a try if there is no better option).
  • XrInput provides the status of the headset's controllers (XrPose and buttons) and controls their vibration.
  • XrMath is just math utilities which every developer has to implement, because the OpenXR math structs don't contain any basic math operations.
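
For context, presenting a flat screen in 3D space is typically done in OpenXR with a quad composition layer. A generic sketch (not taken from this PR; the pose and dimensions are arbitrary example values):

    // Generic sketch of an OpenXR quad layer: the swapchain image is shown as a flat
    // screen floating 1.5 m above the floor and 2 m in front of the user.
    // The pose and size are arbitrary examples, not the PR's actual values.
    #include <openxr/openxr.h>

    XrCompositionLayerQuad makeScreenLayer(XrSpace space, XrSwapchain swapchain,
                                           int32_t width, int32_t height) {
        XrCompositionLayerQuad layer = {XR_TYPE_COMPOSITION_LAYER_QUAD};
        layer.space = space;                      // e.g. a LOCAL or STAGE reference space
        layer.eyeVisibility = XR_EYE_VISIBILITY_BOTH;
        layer.subImage.swapchain = swapchain;
        layer.subImage.imageRect.offset = (XrOffset2Di){0, 0};
        layer.subImage.imageRect.extent = (XrExtent2Di){width, height};
        layer.pose.orientation = (XrQuaternionf){0.0f, 0.0f, 0.0f, 1.0f};
        layer.pose.position    = (XrVector3f){0.0f, 1.5f, -2.0f};
        layer.size = (XrExtent2Df){1.6f, 1.6f * height / (float) width};
        // The layer is then submitted to xrEndFrame via XrFrameEndInfo.layers.
        return layer;
    }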

@beef-ox

beef-ox commented Mar 15, 2024

@lvonasek

With all due respect, I would rather not lose the ability to render as a 2D app on the Quest's home launcher when the x server is displaying 2D content. There should be no need to do that.

X11 is a very important and well-understood protocol. If you want to implement Quest support, I don't think you should create a custom 3D environment of your own to reproject onto, which all future users of Termux:X11 would then be forced to use instead of the Quest's multi-tasking launcher, which lets you have three 2D apps side by side; perfect for my programming workflow, for example (and many others).

The goal of Termux:X11 should be to implement as much of the X11 client protocol as possible, and as close to spec in all respects as possible. The decision whether to attempt 2D mode vs immersive mode should not depend on the device it is running on, but on whether the X server is attempting to display OpenXR content AND the hardware supports it.

I 100% agree: if the Linux environment is trying to output stereoscopic content over X11, this should indeed be displayed in immersive mode, but if not, it should be displayed as a 2D app window. Ideally, this could work like full screen, where the rendering pipeline is direct rather than going through a compositor. 2D content displays in a traditional desktop "display" as a 2D app within a WM/DE, but attempting to display XR content would switch to immersive mode to display that content.

@lvonasek
Author

With all due respect, I would rather not lose the ability to render as a 2D app on the Quest's home launcher when the x server is displaying 2D content. There should be no need to do that.

I will definitely try to make that optional.

@lvonasek
Author

Hm, I do not see the comments anywhere

@twaik
Member

twaik commented Apr 17, 2024

(screenshot with review comments attached; not reproduced here)

@twaik
Member

twaik commented Apr 17, 2024

Also, I still do not understand why XrEngineInit needs the jvm and activity pointers. What happens if you call it with activity = NULL, or with activity = NULL && vm = NULL?

@lvonasek
Author

Thank you for the screenshot. It seems to be a GitHub issue that it was not showing on my side.

  1. EditText
    It is the only way I managed to get the software keyboard output. The system keyboard on Quest doesn't follow the Android API.

  2. Brackets
    Right, I will fix the formatting.

  3. JVM and activity pointers
    If I remember correctly, the call returned XR_ERROR_HANDLE_INVALID. The docs do not specify it (see the loader-init sketch below):
    https://registry.khronos.org/OpenXR/specs/1.0/man/html/XrLoaderInitInfoAndroidKHR.html
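
For reference, the Android loader initialization that consumes those two pointers usually looks roughly like this (a generic XR_KHR_loader_init_android sketch, not the PR's code):

    // Generic sketch of XR_KHR_loader_init_android: the loader receives the JavaVM and
    // the Activity (a jobject global reference) before xrCreateInstance is called.
    // Error handling omitted; this is not the PR's code.
    #include <jni.h>
    #define XR_USE_PLATFORM_ANDROID
    #include <openxr/openxr.h>
    #include <openxr/openxr_platform.h>

    void initLoader(JavaVM *vm, jobject activity) {
        PFN_xrInitializeLoaderKHR xrInitializeLoaderKHR = NULL;
        xrGetInstanceProcAddr(XR_NULL_HANDLE, "xrInitializeLoaderKHR",
                              (PFN_xrVoidFunction *) &xrInitializeLoaderKHR);

        XrLoaderInitInfoAndroidKHR loaderInfo = {XR_TYPE_LOADER_INIT_INFO_ANDROID_KHR};
        loaderInfo.applicationVM = vm;
        loaderInfo.applicationContext = activity;
        // Passing NULL for either pointer is what reportedly produced the error above.
        xrInitializeLoaderKHR((const XrLoaderInitInfoBaseHeaderKHR *) &loaderInfo);
    }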

@twaik
Member

twaik commented Apr 17, 2024

  • It is the only way I managed to get the software keyboard output. The system keyboard on Quest doesn't follow the Android API.

And I told you exactly what you can do to investigate why the IME does not show up.

If I remember correctly, the call returned XR_ERROR_HANDLE_INVALID. The docs do not specify it:

Ok, so both must be valid pointers. The JVM pointer is not a problem for the termux-x11 X server process, but the Activity can be. Can you please investigate what functions it invokes?
You can use a ContextWrapper for this; just do not attach a real context.

@twaik
Member

twaik commented Apr 17, 2024

If we can fake or proxy the functions needed by OpenXR, we can move the rendering code completely into the X server process, and maybe the input-related code too. That would improve performance.

@lvonasek
Author

  • It is the only way I managed to get the software keyboard output. The system keyboard on Quest doesn't follow the Android API.

And I told you exactly what you can do to investigate why the IME does not show up.

The problem is not with showing the IME; the problem is reading the characters, as the software keyboard writes them directly into the UI instead of sending key events.

If I remember correctly, the call returned XR_ERROR_HANDLE_INVALID. The docs do not specify it:

Ok, so both must be valid pointers. The JVM pointer is not a problem for the termux-x11 X server process, but the Activity can be. Can you please investigate what functions it invokes? You can use a ContextWrapper for this; just do not attach a real context.

I will give it a try.

@twaik
Member

twaik commented Apr 17, 2024

The problem is not with showing the IME; the problem is reading the characters, as the software keyboard writes them directly into the UI instead of sending key events.

And again, we will not be able to reproduce normal behaviour until you make a minimal viable TextView-based view which can receive input normally.

My guess is that Quest's IME simply sends characters through InputConnection, but we will not figure it out without testing on a real device...

@lvonasek
Author

I looked into the code of EditText and TextView, but it contains a lot of internal classes. I haven't found enough time to pull it all into the project and figure out which parts are needed.

Another option would be to use this extension: https://developer.oculus.com/documentation/native/android/mobile-openxr-virtual-keyboard-sample/. It would give more control over the keyboard. Unfortunately, it doesn't support pasting from the clipboard, which is, at least from my point of view, a no-go. I will invest more into the EditText approach, but it might take longer.

I tried mocking the Context using a ContextWrapper. In the beginning it required getAssets, getClassLoader, createPackageContext, getApplicationInfo and about ten others, but then it threw some exception from com.oculus.vrapi.PackageValidator which I wasn't able to satisfy (I don't have the relevant stacktrace because it was called from JNI).

@twaik
Member

twaik commented Apr 23, 2024

My guess is that Quest's IME simply sends characters through InputConnection

...

@lvonasek
Author

I tested InputConnection but didn't have any success with it. It seems Quest is using another way.

@twaik
Member

twaik commented Apr 25, 2024

You can always put a breakpoint in your input handling code and explore the stacktrace to find the code that triggers it. It would probably be better to publish the whole stacktrace here.

@twaik
Member

twaik commented Apr 25, 2024

Or you can add Log.e("stacktrace", "text", new Throwable()); to the input handling code to print the stacktrace to the log directly.

@lvonasek
Author

OK, it is using InputConnection. I must be doing something wrong:

java.lang.Throwable
    at com.termux.x11.XrActivity.afterTextChanged(XrActivity.java:205)
    at android.widget.TextView.sendAfterTextChanged(TextView.java:10805)
    at android.widget.TextView$ChangeWatcher.afterTextChanged(TextView.java:13820)
    at android.text.SpannableStringBuilder.sendAfterTextChanged(SpannableStringBuilder.java:1278)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:578)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:508)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:38)
    at android.view.inputmethod.BaseInputConnection.replaceText(BaseInputConnection.java:941)
    at android.view.inputmethod.BaseInputConnection.setComposingText(BaseInputConnection.java:712)
    at com.android.internal.view.IInputConnectionWrapper.executeMessage(IInputConnectionWrapper.java:633)
    at com.android.internal.view.IInputConnectionWrapper$MyHandler.handleMessage(IInputConnectionWrapper.java:111)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:214)
    at android.os.Looper.loop(Looper.java:304)
    at android.app.ActivityThread.main(ActivityThread.java:7918)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1010)

@twaik
Member

twaik commented Apr 25, 2024

onCreateInputConnection can probably work without EditText, so you can move it to LorieView and get rid of EditText completely.

@twaik
Member

twaik commented Apr 25, 2024

I think you should try to replace extending EditText with extending View (making it focusable and focusable in touch mode, like LorieView), and if that works you can make LorieView return your InputConnection implementation when the listener is set.

@lvonasek
Author

I would like to keep it out of LorieView to keep XR and Android behaviour separated.

A GLSurfaceView is used; I prefer to create an XrSurfaceView and put the custom InputConnection and other XR-specific stuff there.

@ewt45

ewt45 commented Apr 26, 2024

termux-app also uses onCreateInputConnection with a custom view. Maybe you can take it as a reference:
https://github.com/termux/termux-app/blob/2f40df91e54662190befe3b981595209944348e8/terminal-view/src/main/java/com/termux/view/TerminalView.java#L269

@twaik
Member

twaik commented Apr 26, 2024

termux-app also uses onCreateInputConnection

termux-app works with the keyboard in a pretty different way...

@lvonasek
Author

Just to update: I am struggling with removing the EditText dependency from XrKeyboard. I have to concentrate on other projects for the whole of May; I will revisit this afterwards.

@lvonasek
Author

Working on the keyboard support somehow killed my motivation to continue with this project. I am not sure if I will find the energy to come back to it.
