
Allow adjustment of display time #1328

Open
cabanier opened this issue May 30, 2023 · 4 comments

Comments

@cabanier
Member

WebXR reports a display time that describes when the system expects a frame to be displayed. If the frame takes longer to calculate, the system applies timewarp to give the appearance of smoothness (aka reprojection/timewarp).

Recently, a developer who builds a game based on a streaming server reported that they also need to do timewarp, because it takes a while for the game's video stream to reach the headset. It occurred to me that if they were able to change the predicted display time to the time the frame was rendered on the server, the system compositor could do this timewarp for them.
It seems that making the display time writable would accomplish this.
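A minimal sketch of how that might look from page JavaScript, assuming predictedDisplayTime on XRFrame were made writable (it is read-only today) inside an active session's rAF loop; latestServerFrame and drawToProjectionLayer are hypothetical application helpers:

  function onXRFrame(time, frame) {
    // Hypothetical: the most recent remotely rendered image, plus the time
    // at which the server actually rendered it.
    const serverFrame = latestServerFrame();

    // The idea in this issue: tell the compositor which moment the pixels
    // correspond to, so its timewarp also corrects for the streaming delay.
    frame.predictedDisplayTime = serverFrame.renderTime; // not possible today

    drawToProjectionLayer(serverFrame.image);
    session.requestAnimationFrame(onXRFrame);
  }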

/agenda Allow adjustment of display time

@probot-label probot-label bot added the agenda Request discussion in the next telecon/FTF label May 30, 2023
@KooIaIa

KooIaIa commented May 30, 2023

I have also wanted to stream PCVR into WebXR, and this would be great! Especially with system-level Spacewarp! Imagine a future where XR compositors use Depth as the standard and how powerful this could be: streaming could be Depth + Time (maybe even CSS animation motion vectors!). It makes VR much more declarative, like 3D CSS, where content is declared in Time and Space. Could this be done with an XR Stereo-Reprojection Depth-Layer, or does this take over the entire website's WebXR time system?

If Time is used in this proposed way, does it only become possible to work in a single Time? My use case is this:

A streaming-based game doesn't need to take over 100% of your system, yet this is how all XR software works today. Games are designed like Atari-era consoles to take 100% of system resources rather than interoperating the way the Web and computers interoperate fluidly with all parts of our lives outside XR today.

My use case is that I want to stream a game into WebXR from PCVR, but I want to keep using the 80% of my webpage's system resources that isn't needed for streaming and reprojection. (WebGPU now makes this even more desirable.) Thus, I want to stream in a game from time X but still be able to load Web / WebXR content at my normal time Y.

With a WebXR Layer that is composited at Time X with Position X and Depth X - is it still possible to work in a different Time Y?

If the current proposal applies to my entire website, then to do what I am describing I think it will require adding a delay to all my XR input values from the user on the headset so that they lag behind the now-adjusted display time. Then I draw my Web content, lagged, on top. This does save on system composition resources. Ideally, though, I don't think you should need to slow the whole system down for something that isn't even using the system. In the future, I hope XR systems let apps interoperate and exist in multiple Times, Refresh Rates, and Reprojection methods like Depth Layers. This is even more true in AR, where Reality is the main "display time": in AR you don't want to have to slow down reality-time to match streaming-time!

If this proposal could be scoped such that one WebXR frame is at the adjusted time (streaming, with this proposal) and another WebXR frame is at real time (computing locally and being awesome), and they can both have their own display time and be displayed together in the same web page, that would be awesome. I would love to make a demo of this.

@KooIaIa

KooIaIa commented May 30, 2023

From the meeting:

To do reprojection, you need the following data:

  1. Position at input
  2. View rendered at that position

Reprojection = old position + the server-rendered view, projected to the current 'display time' or the system's 'real' time

Could this be done with a callback in JavaScript? Like an async WebXR request that returns a frame from a server along with the pose that generated it.

async function run_WebXR(async_pos /* a WebXR pose object */) {
  const new_view = await requestServerView(async_pos); // hypothetical round trip to the streaming server
  return [new_view, async_pos];
}

Question:

When the async WebXR object is returned, could it be rendered as an async WebXR layer along with other WebXR content?

Reasons:

Streaming content taking over the entire system is what I am trying to avoid. OpenXR patterns from 2016 shouldn't hold back what WebXR is capable of in the future.

Alternative Proposal:

WebXR shouldn't do this if it is going to take over the whole system. Instead, it would have to be done by the user with an API like WebGPU. If a user created their own reprojection system in WebGPU, like Spacewarp, with pose + view + depth + motion vectors, then they would be free to also use the normal display time and not have their entire system locked down by async WebXR rendering.

@toji
Member

toji commented May 30, 2023

To summarize my thoughts from the call today:

In order to modify the reprojection behavior we'd need more than just the predicted display time. OpenXR requires that the pose used for rendering be submitted as well, but under WebXR currently there's an assumption that all projection layer content was synchronously rendered with the poses that were provided as part of the most recent requestAnimationFrame callback.

Since the streaming case would by nature be asynchronous in delivering rendered results we could expect a pattern like this:

  • requestAnimationFrame delivers Frame[0].
  • Pose[0] from Frame[0] is sent to the server to render.
  • While waiting for the server's reply, rAF cycles N more times, providing Frame[1...N].
  • Server returns rendered results for Frame[0]
  • The next requestAnimationFrame delivers Frame[N+1], and will be provided a projection layer with the results for Frame[0]. This will break reprojection, as WebXR will intrinsically only try to correct for Pose[N+1], not Pose[0].

Thus what we really want, assuming the server is not generating synthetic poses interpolated from the reported ones (which sounds bad), is for the page to specifically say "The image I'm about to provide to you was rendered with Frame[N]". The frame will have the necessary poses and times already embedded in it, which will make life easier. In this scenario it could be useful to also communicate "I expect that it's going to take me X milliseconds to produce the rendered image", but that also feels like something that the browser could pretty quickly estimate for itself given a few async frames.
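For concreteness, a sketch of one shape this could take in page JavaScript; none of these properties exist in WebXR today, an XRFrame currently cannot be used outside its own callback (so this also assumes some retention mechanism), and sendPoseToServer, takeLatestServerResult, and drawServerImage are invented application helpers:

  const pendingFrames = new Map(); // id -> the XRFrame the server is rendering against
  let nextId = 0;

  function onXRFrame(time, frame) {
    // Ship the freshest pose to the server, remembering which frame it came from.
    const id = nextId++;
    pendingFrames.set(id, frame);
    sendPoseToServer(id, frame.getViewerPose(refSpace));

    // A rendered result may arrive several rAF cycles later.
    const result = takeLatestServerResult();
    if (result) {
      // Hypothetical: tell the compositor which historical frame these pixels
      // were rendered for, so reprojection corrects from Pose[N], not Pose[N+1].
      projectionLayer.sourceFrame = pendingFrames.get(result.id);
      drawServerImage(result.image);
      pendingFrames.delete(result.id);
    }
    session.requestAnimationFrame(onXRFrame);
  }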

Note that this only applies to projection layers. Other layer types should be able to be rendered during compositing as usual.

I'm not sure what the mechanism should be for saving and referencing the historical frames when rendering. We'd probably want to define some maximum latency so that we don't have to hold onto old frames in perpetuity just in case somebody might produce a frame against them.

@Yonet Yonet removed the agenda Request discussion in the next telecon/FTF label Jun 1, 2023
@sdwlig

sdwlig commented Jun 21, 2023

Thank you for thinking about this. We have head, hands / controller, & button state tracking working with a complex remote game that is streaming a two-layer video encoding of stereo video. I wouldn't (and we don't) think in terms of requesting a frame when requestAnimationFrame() is called. Rather, when streaming starts, we generate frames as fast as we can, streaming to the client WebXR app. On each requestAnimationFrame(), we show the latest frame. We're aiming for 36fps, with ASW getting us to a target of 72fps. But it could be 30->60fps, or even 20->60fps with two interpolated frames per rendered frame.
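A minimal sketch of that "show the latest frame" pattern; onServerFrameDecoded stands in for the application's decode callback and drawVideoFrame for an application helper that draws the streamed image with the current viewer pose:

  let newestFrame = null;

  function onServerFrameDecoded(videoFrame) {
    if (newestFrame) newestFrame.close();
    newestFrame = videoFrame; // keep only the most recently decoded frame
  }

  function onXRFrame(time, xrFrame) {
    if (newestFrame) {
      drawVideoFrame(newestFrame, xrFrame.getViewerPose(refSpace));
    }
    session.requestAnimationFrame(onXRFrame);
  }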

There is latency communicating head position etc. to the game, latency to render & encode, latency to stream the frames, and latency to decode & render. We're working to minimize all of that of course, but it will always be somewhat dynamic, and vary according to device & network conditions & capabilities. The goal is to minimize the feeling of latency, apparent jitter, and overall jankiness. ASW is an important part of this, but we do need some additions beyond the current assumptions of what went into the last frame and what should go into the next frame.

As I understand it, these are the elements of the old simple space warp and ASW:

Old SW: lastFrame * headMotionVectors * dTimeSinceLastFrame

ASW: lastFrame * headMotionVectors * frameMotionVectors*depth * dTimeSinceLastFrame

It seems that this is effectively the basis for a fragment shader that applies those transforms for each pixel, with motion vectors scaled by both depth & time. Knowing the actual ASW algorithm or shader would be helpful, and it seems that one could write a fragment shader that does just that.
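Following those formulas literally, a per-pixel sketch of the warp (written in JavaScript for readability; a real implementation would live in a fragment shader, and the sampling model here is an assumption rather than the actual ASW algorithm):

  // uv: pixel coordinate in the last rendered frame
  // headMotion: screen-space displacement from head movement since that frame
  // sample.motion, sample.depth: values read from the motion-vector and depth textures
  // dt: dTimeSinceLastFrame
  function warpedUV(uv, headMotion, sample, dt) {
    return {
      u: uv.u + (headMotion.u + sample.motion.u * sample.depth) * dt,
      v: uv.v + (headMotion.v + sample.motion.v * sample.depth) * dt,
    };
  }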

What we need, without and with pose prediction. Pose prediction (PP) is similar to ASW, but applied before rendering: given the current head motion vectors, where will the head likely be at T + roundTripLatency (RTL)?

Where dTSLF = dTimeSinceLastFrame:

No PP: lastFrame * headMotionVectors * frameMotionVectors*depth * (RTL + dTSLF)
That time offset doesn't quite make sense, but some consideration of RTL should improve results. It seems that the rotation + translation space warp should be applied to the rendered frame according to the difference between the headPose then (RTL ago) and now; then, for dTSLF, full ASW should be applied while still accounting for that headPose offset.

With pose prediction:
netHeadPose = (PP - headMotionVectors), after offsetting prediction error. This would also need a factor to account for latency mismatch: e.g. the PP computed for 80 ms being consumed at 100 ms or 60 ms.
dNetTime = currentTime - expectedPPTime

With PP: lastFrame * netHeadPose * frameMotionVectors*depth * dNetTime
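In code form, the bookkeeping for the PP case might look like this (a sketch only; poseDifference is an assumed helper that returns the residual transform between two poses):

  // predictedPose was computed RTL ago for the moment expectedPPTime.
  function netCorrection(predictedPose, expectedPPTime, currentPose, currentTime) {
    // Residual head motion the prediction failed to account for.
    const netHeadPose = poseDifference(currentPose, predictedPose);
    // How far we actually are from the predicted display moment (e.g. the PP
    // for 80 ms being consumed at 100 ms or 60 ms).
    const dNetTime = currentTime - expectedPPTime;
    return { netHeadPose, dNetTime };
  }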

When the current frame is rendered to the framebuffer, which presumably then gets composited via the ASW shader program, we need to be able to provide the netHeadPose & the dNetTime, probably in the form of PP & expectedPPTime so that the current headPose & time can be subtracted.

As for different parts of the scene being treated differently, that could be patched via the motion vector texture. But it would probably be better to be able to call a function, shader program, or sequence of shader programs to run ASW on a texture with specific parameters. Then different content could be composited by area or depth map or similar, possibly running ASW on different textures with different head pose & time offset parameters.

One key question we have is about the expected & allowed resolution(s) for the motion vector & depth textures. One reference says that they are lower resolution than the color texture. Is that a particular resolution, like 1/4, or is any resolution supported, sampled like other textures? Because we have to compress & transmit these textures, we'd like to be able to tune the tradeoff between bandwidth & quality. Supporting arbitrary motion vector & depth texture resolutions is highly desirable, perhaps with options for a compactness strategy. Also, it is unclear if & when the simpler spacewarp is active when we aren't providing motion vector & depth textures.

Additionally, we may not want to use a layer for this, but to render as directly as possible to the display panel's framebuffer. We are highly sensitive to quality losses if it is not exactly video-pixel-to-panel-pixel rendering. We are doing that now, managing IPD etc. based on pose viewports. Being able to invoke ASW as a textures -> texture operation supports this.

All of this seems to point to one of two main directions: provide offset parameters to the embedded, last-stage ASW, or expose the ASW transformation as a callable step with explicit parameters so that an application can easily pass the defaults or perform complex adjustments & compositing. If we can't get a usable solution, and the default ASW isn't good enough in our situation, we plan to start implementing our own (probably naive at first) ASW.
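For concreteness, the two directions could look roughly like this; neither API exists, and every name below is invented purely to illustrate the shapes being discussed:

  // Direction 1: pass offsets into the compositor's built-in, last-stage ASW.
  projectionLayer.reprojectionHint = {
    renderedPose: predictedPose,   // pose the streamed frame was rendered with
    renderedTime: expectedPPTime,  // time that pose was predicted for
  };

  // Direction 2: expose the ASW transform as a callable textures -> texture step.
  const warped = xrCompositor.applyASW(colorTexture, {
    motionVectors: motionTexture,
    depth: depthTexture,
    fromPose: predictedPose,
    toPose: currentPose,
    deltaTime: dNetTime,
  });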
