WASM examples broken if the user switches tabs #144

Closed
cBournhonesque opened this issue Feb 12, 2024 · 54 comments · Fixed by #352 or #371
Labels
A-Transport Related to the transport layer C-Bug Something isn't working P-Critical

Comments

@cBournhonesque
Owner

cBournhonesque commented Feb 12, 2024

I think this PR broke the examples for some reason.

  • the examples work fine on native
  • on wasm, the connection gets timed out pretty quickly for some reason

UPDATE:

  • it looks like the client tasks stop running when the page is alt-tabbed, so the connection times out if the user stays alt-tabbed for too long!
  • the second problem is that the server seems to get stuck in a weird state where it doesn't accept WebTransport connections anymore.

On client we get:

Failed to establish a connection to https://127.0.0.1:5000/: net::ERR_QUIC_PROTOCOL_ERROR.QUIC_NETWORK_IDLE_TIMEOUT (No recent network activity after 9003077us. Timeout:9s).
failed to connect to server: Error(JsValue(WebTransportError: Opening handshake failed.Error: Opening handshake failed.))

POSSIBLE SOLUTIONS:

  • run in a background thread using web workers?
@cBournhonesque cBournhonesque added C-Bug Something isn't working P-Critical A-Transport Related to the transport layer labels Feb 12, 2024
@MOZGIII

MOZGIII commented Feb 22, 2024

Let me know if you figure out the solution; if this is something that's better solved at the xwt level I'm interested in adding support.

@cBournhonesque cBournhonesque changed the title WASM examples broken WASM examples broken if the user switches tabs Feb 26, 2024
@Nul-led
Collaborator

Nul-led commented Mar 12, 2024

@cBournhonesque quick update: the browser seems to pause IO tasks when in RAM-saving mode, regardless of whether or not the client runs in a web worker, so I think the appropriate solution is to adjust the timeout.

@Nul-led
Collaborator

Nul-led commented Mar 12, 2024

(only tested on Brave, but it should be the same for other major browsers when RAM saving is active)

@Nul-led
Collaborator

Nul-led commented Mar 12, 2024

When RAM-saving mode is disabled, this issue does not seem to occur, so I'm certain that this is indeed the root cause.

@MOZGIII

MOZGIII commented Mar 12, 2024

There are tricky, sneaky ways to keep the tab alive if this is the reason, btw, but none I'd recommend implementing at the level of this crate.

@cBournhonesque
Owner Author

@Nul-led so you have confirmed that, if you disable RAM-saving mode, you can freely switch tabs and the game (including IO tasks) will continue working in the background? i.e. the issue totally disappears if you disable RAM-saving mode?

@Nul-led
Collaborator

Nul-led commented Apr 17, 2024

@cBournhonesque apparently the thread does not actually get stopped entirely, but is instead just throttled. It might be possible to detect when that happens and temporarily stop sending and receiving packets.

Disabling the RAM saver seems to work on Brave; I can't say for other browsers. Requires more testing, I guess.
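Detecting throttling, as suggested above, could be as simple as comparing the wall-clock gap between consecutive ticks with the expected tick interval. A minimal sketch, assuming the caller passes millisecond timestamps (on the web these might come from `performance.now()`); `ThrottleDetector` and its tolerance factor are hypothetical, not part of lightyear or xwt:

```rust
/// Hypothetical helper: flags a tick as throttled when the gap since the
/// previous tick is much larger than the expected tick interval.
pub struct ThrottleDetector {
    expected_interval_ms: u64,
    /// How many expected intervals may pass before we call it throttling.
    tolerance_factor: u64,
    last_tick_ms: Option<u64>,
}

impl ThrottleDetector {
    pub fn new(expected_interval_ms: u64, tolerance_factor: u64) -> Self {
        Self { expected_interval_ms, tolerance_factor, last_tick_ms: None }
    }

    /// Records a tick at `now_ms` and returns true if the gap since the
    /// previous tick exceeded `expected_interval_ms * tolerance_factor`.
    pub fn tick(&mut self, now_ms: u64) -> bool {
        let throttled = match self.last_tick_ms {
            Some(prev) => {
                now_ms.saturating_sub(prev) > self.expected_interval_ms * self.tolerance_factor
            }
            None => false, // first tick: nothing to compare against
        };
        self.last_tick_ms = Some(now_ms);
        throttled
    }
}
```

When `tick` returns true, the client could, for example, pause sending and treat the backlog in its receive buffer with suspicion; which reaction is right depends on the game.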

@simbleau
Contributor

simbleau commented Apr 17, 2024

It's very unclear what the real issue is from reading this.

Can someone confirm:

  • Who causes the disconnection? Does the server terminate the client, does the client terminate themselves, does the browser terminate the session, etc.?
  • Any confirmation of whether the server bug still happens, and why?

Regarding the "Keep Alive" strategy:

  • I don't think this is possible. Unlike WebSockets, which are browser-scoped, WebTransport is tab-scoped. That means when you switch tabs, WebTransport fully stops. The KeepAlive strategy only works for WebSockets.
  • However, could we instead modify the constraint to allow messages an infinite amount of time to return a response, to prevent timeout?

Other strategies:

  • Would automatic re-connection be possible?
  • Has anyone tried a web worker to keep it alive? This seems like it would work, since the script will run independently of the main thread, allowing it to continue executing even when the user switches tabs.

@MOZGIII

MOZGIII commented Apr 17, 2024

If you have audio playing on your tab it won't get suspended. It would actually be quite fine for a game to apply this workaround.

Automatic reconnection at the WebTransport layer is not possible, since the browser doesn't really give us control over the details of the connection that would enable it: specifically, timeouts. It is the browser that would terminate the WebTransport session, and this will happen regardless of whether the tab is paused or not, so we can't hook into it.

It is totally possible to implement reconnection at the app level, though. It would require a certain layer of logic on top of the transport, like a custom handshake to identify the connecting party, but that is possible. The lack of control over 0-RTT in the browser's API is a bit unfortunate here; if it were available, reconnecting would not be as bad latency-wise.

@simbleau
Contributor

simbleau commented Apr 17, 2024

> If you have audio playing on your tab it won't get suspended. It would actually be quite fine for a game to apply this workaround.
>
> Automatic reconnection at the WebTransport layer and all that is not possible since the browser doesn't really give us control over those details of the connection that would enable it: specifically, timeouts. It is the browser that would terminate the WebTransport session, and this will happen regardless of whether the tab is paused or not, so we can't hook into it.
>
> It is totally possible to implement the reconnect at the app level though. Would require a certain layer of logic on top of the transport, like a custom handshake to identify the connecting party - but that is possible. The lack of control over 0-RTT in the browser's API is a bit unfortunate here - but if it were there it would not be as bad latency-wise.

I do think this should be brought up to W3C or a WT working group, but regardless...

Since we're blocked on W3C and browsers, it sounds like there are only 2 reasonable solutions that could land before the heat death of the universe:

  1. Webworkers
  2. App-level reconnecting

(or both, long term)

I think you understand 2) well enough to fix it today.

Is there any chance we could add that logic to the simple-box example? Specifically, to spell it out for new users (like myself); or we could have an entirely new demo just for reconnecting.

Ideally, there would be 3 app states: Connected, Reconnecting, Disconnected. While reconnecting, show some text centered on screen that says "reconnecting...".
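The three states could be modeled as a small pure state machine that a demo then maps to its UI. A minimal sketch; `ConnectionState`, `ConnectionEvent` and the retry limit are hypothetical names for illustration, not lightyear's API:

```rust
/// Hypothetical client connection states for a reconnect demo.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionState {
    Connected,
    Reconnecting { attempt: u32 },
    Disconnected,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionEvent {
    ConnectionLost,
    ReconnectSucceeded,
    ReconnectFailed,
}

impl ConnectionState {
    /// Pure transition function; gives up after `max_attempts` failed attempts.
    pub fn next(self, event: ConnectionEvent, max_attempts: u32) -> Self {
        match (self, event) {
            (ConnectionState::Connected, ConnectionEvent::ConnectionLost) => {
                ConnectionState::Reconnecting { attempt: 1 }
            }
            (ConnectionState::Reconnecting { .. }, ConnectionEvent::ReconnectSucceeded) => {
                ConnectionState::Connected
            }
            (ConnectionState::Reconnecting { attempt }, ConnectionEvent::ReconnectFailed)
                if attempt < max_attempts =>
            {
                ConnectionState::Reconnecting { attempt: attempt + 1 }
            }
            (ConnectionState::Reconnecting { .. }, ConnectionEvent::ReconnectFailed) => {
                ConnectionState::Disconnected
            }
            // Any other combination leaves the state unchanged.
            (state, _) => state,
        }
    }

    /// Text to show centered on screen, if any.
    pub fn overlay_text(self) -> Option<&'static str> {
        match self {
            ConnectionState::Connected => None,
            ConnectionState::Reconnecting { .. } => Some("reconnecting..."),
            ConnectionState::Disconnected => Some("disconnected"),
        }
    }
}
```

Keeping the transitions pure makes them easy to test; in a bevy app the current value would likely live in a resource or app state and drive the overlay.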

@MOZGIII

MOZGIII commented Apr 17, 2024

The real solution is adding audio to the game... :D

@simbleau
Contributor

simbleau commented Apr 17, 2024

> The real solution is adding audio to the game... :D

Maybe this is a joke, but I don't consider this a serious solution.

I'll play devil's advocate though... what if the user mutes their browser tab? Would it still work?

@cBournhonesque
Owner Author

@simbleau I don't really understand the problem clearly myself.
My current understanding from reading the messages above is:

  • this is less of an issue with WebSocket because the websocket session is browser-wide, but the tab can still be throttled which could cause issues
  • for WebTransport the session is per-tab, and gets suspended/ended by the browser when the client switches tabs; so setting an infinite timeout on the server side doesn't fix the issue?

As for the reconnecting logic:
I've started adding more networking-related state to the library, primarily so that we have more runtime-control over the networking configuration (so that a disconnected client can select a different server, etc.): https://github.com/cBournhonesque/lightyear/blob/main/lightyear/src/client/networking.rs#L279
This could be adapted to support reconnections. So the idea would be that when a user switches tabs, the server times them out; but when they reconnect, the server recognizes that it's the same ClientId and resumes their position in the game?

@MOZGIII

MOZGIII commented Apr 17, 2024

> Maybe this is a joke, but I don't consider this a serious solution.

It is, though. There are npm packages that play an audio stream of barely audible noise precisely to do just this.

> I'll play devil's advocate though ... what if the user mutes their browser tab? Would it still work?

No, it won't. Users have to comply with the workaround if they want to remain connected, and if not, well, it is always up to them. Browsers don't have a good way to keep a tab alive. There's https://www.w3.org/TR/screen-wake-lock/ but it is for a different purpose.

The easiest way for the user to keep the tab active is if it plays audio. The less easy way is for them to add the origin to the list of websites that never go inactive, and the most difficult way is to disable the whole Chromium / Firefox feature, which is nonetheless doable.

That said, there's also background sync, so maybe you don't actually need the WebTransport session... This does not seem like a portable solution fit for this kind of crate, though. Maybe for a more comprehensive networking solution specialized in web apps/games.

@simbleau
Contributor

A small, important clarification:

  • @MOZGIII said it is the browser that disconnects the WebTransport session when you switch tabs, not the server.

Re: WebSockets - yes, that's right. bevy_rtc doesn't have this issue because it uses WebRTC with signaling built over WebSockets. Those WebSockets never go idle because, regardless of whether the client app is frozen, the server continues to send KeepAlive packets over the WebSocket.

Re: reconnecting - it's unclear to me, too. I'll lean on you two to figure this out. I'm guessing that when you connect, there's a refresh token the client can be given for "fast reconnecting". However, I'd be fine with a total teardown/reconnect, as long as there's some way to reconnect...

@simbleau
Contributor

What about web workers?

@simbleau
Contributor

simbleau commented Apr 17, 2024

>> Maybe this is a joke, but I don't consider this a serious solution.
>
> It is though. There are npm packages that play an audio stream of barely audible noise precisely to do just this.

I call that a hack, not a solution.

Perhaps we need to file a case with the W3C, actually, to address this.

Because even for games, that's a shitty "solution". I mute tabs often, especially games. Communicating the technical problem and putting the onus on users to circumvent it is technically embarrassing, and difficult for, e.g., children and children's games.

@simbleau
Contributor

Filed w3c/webtransport#600

@MOZGIII

MOZGIII commented Apr 17, 2024

> I call that a hack, not a solution.

It is absolutely a hack. As you said, the W3C has to deal with it; there would probably be a new Wake Lock Web API for this. This is a lot of work, however, and definitely not something that is available today, so the workarounds and hacks are still meaningful to discuss here.
After all, if a hack solves the issue, it is usually classified as a "good enough" solution and most people can move on to the next thing.

UPD:

> Filed w3c/webtransport#600

This is great, let's see what they say! I have doubts they'll give us something, as this is a Chromium thing and is standardized afaik.

I've been going through the source to figure out where it's implemented; so far I found this - might be a good place for others to explore too.

@MOZGIII

MOZGIII commented Apr 17, 2024

> Re: reconnecting - it's unclear to me, too. I lean on you two to figure this out. I'm guessing when you connect there's a refresh token the client can be told about for "fast reconnecting," However I'd be fine with a total teardown/re-connect. As long as there's some way to reconnect...

I was talking about 0-RTT QUIC handshakes - they allow establishing a new QUIC connection that reuses some of the key material from a previously established (but now closed) QUIC connection, saving a few exchanges in the handshake. This is not resuming the old connection, though - it is creating a new connection, so it is a re-connection.

With re-connection, it all depends on how the application handles the new connection. If it has a persistent identifier for the client and correlates the context with that identifier rather than with the connection (so that connections carry no context besides a reference to the persistent identifier), it is trivial to implement reconnections, assuming the app supports "connecting mid-game" or otherwise allows newly connected clients into whatever is going on. Games usually do this through initial world-state replication on connection, but in this case additional support for replicating updates for the previously connected persistent identity (just over a new connection) would be required.
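A minimal sketch of keying context by a persistent identity rather than by the connection. `PlayerId` (persistent, e.g. something carried in a connect token), `ConnectionId` (transient) and `Server` are hypothetical names, not lightyear or xwt types; a disconnect drops only the connection mapping, so a later connection with the same id resumes the old context:

```rust
use std::collections::HashMap;

pub type PlayerId = u64; // hypothetical persistent identity
pub type ConnectionId = u32; // hypothetical transient transport connection

#[derive(Debug, Default, Clone, PartialEq)]
pub struct PlayerContext {
    pub position: (f32, f32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum JoinKind {
    New,
    Resumed,
}

#[derive(Default)]
pub struct Server {
    players: HashMap<PlayerId, PlayerContext>,
    connections: HashMap<ConnectionId, PlayerId>,
}

impl Server {
    /// Binds a new connection to a persistent identity.
    pub fn on_connect(&mut self, conn: ConnectionId, player: PlayerId) -> JoinKind {
        self.connections.insert(conn, player);
        if self.players.contains_key(&player) {
            JoinKind::Resumed
        } else {
            self.players.insert(player, PlayerContext::default());
            JoinKind::New
        }
    }

    /// Only the transport mapping goes away; the player's context survives.
    pub fn on_disconnect(&mut self, conn: ConnectionId) {
        self.connections.remove(&conn);
    }

    /// Looks up the context through the connection's persistent identity.
    pub fn context_mut(&mut self, conn: ConnectionId) -> Option<&mut PlayerContext> {
        let player = *self.connections.get(&conn)?;
        self.players.get_mut(&player)
    }
}
```

On `JoinKind::Resumed`, the server would still need to replicate the current world state to the new connection, as described above.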

So, the application-level support for seamless reconnection would likely be a "real" solution, as it would not rely on transient state like WebTransport session to be intact in the first place.

I would say, though, that this is a job either for a specific application/game to implement, or for a really high-level networking framework that takes opinionated control over more things than lightyear in particular currently does.

That said, the solution would most likely have to be transport-agnostic, as this is in no way a WebTransport-specific issue: transport state is typically transient.


QUIC (the HTTP3/WebTransport underlying protocol) has keep-alive for idle connections as well. See https://datatracker.ietf.org/doc/html/rfc9308#name-session-resumption-versus-k
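The general rule behind keep-alives is that something must be sent more often than the peer's idle timeout. A minimal, transport-agnostic sketch of that scheduling logic, assuming millisecond timestamps; `KeepAlive` is hypothetical and does not correspond to a browser or QUIC-library API:

```rust
/// Hypothetical keep-alive scheduler: send a keep-alive whenever nothing has
/// been sent for `interval_ms`, which must be shorter than the idle timeout.
pub struct KeepAlive {
    interval_ms: u64,
    last_sent_ms: u64,
}

impl KeepAlive {
    pub fn new(interval_ms: u64, idle_timeout_ms: u64, now_ms: u64) -> Self {
        assert!(
            interval_ms < idle_timeout_ms,
            "keep-alive interval must be shorter than the idle timeout"
        );
        Self { interval_ms, last_sent_ms: now_ms }
    }

    /// Call whenever any packet (data or keep-alive) is sent.
    pub fn on_sent(&mut self, now_ms: u64) {
        self.last_sent_ms = now_ms;
    }

    /// True if a keep-alive should be sent now.
    pub fn should_send(&self, now_ms: u64) -> bool {
        now_ms.saturating_sub(self.last_sent_ms) >= self.interval_ms
    }
}
```

This only helps while the code calling `should_send` keeps running; if the tab freezes that code, no keep-alive goes out.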

Web API for WebTransport may just expose the configuration parameters for idle connections management - but overall this is still worse than the solution above, albeit less of a hack than playing audio.

Note that this, however, would not solve the issue - well, at least maybe not entirely.
If the app code is frozen, the queues won't be drained and will overfill. The browser will either crash the WebTransport [[Session]] or evict older datagrams from the queue, meaning the protocol will be disrupted and the app code will have to recover from this. Either way, that will likely mean resetting the replication state and requesting either the whole world state, as for the initial connection, or a list of state updates since the last known world state. The latter only works well if you have a deterministic game, or when desyncs are not a particularly bad problem - like for online cooperative archvis apps, where there's no need for authoritative conflict resolution like in games.
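Choosing between the two recovery paths requires the client to notice that updates were lost. A minimal sketch, assuming each server update carries a monotonically increasing sequence number (a hypothetical protocol detail, with wraparound ignored for brevity); small gaps ask for a delta, large gaps fall back to a full snapshot:

```rust
/// What the client should request after receiving a numbered update.
#[derive(Debug, PartialEq, Eq)]
pub enum Resync {
    /// Update applied (or was a stale duplicate): nothing to request.
    None,
    /// A small gap: ask for the updates after this sequence number.
    DeltaSince(u32),
    /// Too much was lost: ask for the whole world state again.
    FullSnapshot,
}

/// Hypothetical tracker of the last contiguous update the client applied.
pub struct UpdateTracker {
    last_applied: u32,
    max_delta_gap: u32,
}

impl UpdateTracker {
    /// `baseline` is the sequence number of the initial world snapshot.
    pub fn new(baseline: u32, max_delta_gap: u32) -> Self {
        Self { last_applied: baseline, max_delta_gap }
    }

    pub fn on_update(&mut self, seq: u32) -> Resync {
        if seq <= self.last_applied {
            return Resync::None; // duplicate or stale update
        }
        let missing = seq - self.last_applied - 1;
        if missing == 0 {
            self.last_applied = seq; // contiguous: apply it
            Resync::None
        } else if missing <= self.max_delta_gap {
            Resync::DeltaSince(self.last_applied)
        } else {
            Resync::FullSnapshot
        }
    }

    /// Call once a delta or snapshot brought the client up to `seq`.
    pub fn resynced(&mut self, seq: u32) {
        self.last_applied = seq;
    }
}
```

The `max_delta_gap` threshold is a policy knob; per the comment above, a game that cannot tolerate desyncs might always pick the full snapshot.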

@MOZGIII

MOZGIII commented Apr 17, 2024

> @MOZGIII said it is the browser that disconnects the WebTransport session when you switch tabs, not the server.

There are a number of scenarios that could happen here. I have not investigated it in practice, but it is true that the browser causes the connection to disconnect, potentially by not enabling the keep-alive settings. But this is unclear, and it might be that the server actually sends the GOAWAY frame.

@MOZGIII

MOZGIII commented Apr 17, 2024

> What about web workers?

Thinking about this - if you can extract the whole networked game state maintenance loop into the Web Worker together with the WebTransport - sure, that would work (well, except WebWorkers are deactivated too at certain times, so maybe a ServiceWorker instead, but this can be determined later down the line). That way you can ensure the data the server communicates is not lost and processed to the best of the client's ability while the rendering is unavailable. But moving only WebTransport out would cause the same issue I described at #144 (comment) (second part).

@MOZGIII

MOZGIII commented Apr 17, 2024

At w3c/webtransport#600 they are saying it's an implementation bug, which is what I very much suspected, hence my attempts to find the tab deactivation code in the Chromium source. From what I recall from reading the WebTransport spec, though, it shouldn't be an issue with tab deactivation. What is most likely the issue is that the client and server can't agree on the idle timeouts, which may or may not be caused by the Chrome side; given the lack of settings to tweak the idle timeout in the spec, it could be.
That said, double-check your server side - you could just enable the idle connection keep alive from the server side.

Unfortunately, there is still a problem of data loss that has to be solved (world state reinit or state diff sync), because the datagrams will be dropped from the recv queue if the app can't keep up with them, and the frozen app definitely can't.

@simbleau
Contributor

simbleau commented Apr 18, 2024

Ok so, we need confirmation from a filed Chromium issue that this is a bug. Otherwise we aren't sure whether it's a lightyear/xwt bug. There will be people, myself included, who wouldn't experiment with or adopt lightyear today if this is a design choice of WebTransport that won't be fixed.

Secondly, I propose we document the workaround: disable RAM-saving mode, with an issue to track the Chromium bug.

Lastly, anyone want to add a reconnection example? I think it would be helpful in any case.

@Nul-led
Collaborator

Nul-led commented Apr 26, 2024

Would actually be really easy to confirm. If this behavior happens with WebSocket Transport too, then we have our culprit i think :P

@simbleau
Contributor

I haven't actually tried anything more than the examples for lightyear. I'm waiting on #253 to really dive into using WT. Hopefully someone can confirm who has experience with Lightyear.

@MOZGIII

MOZGIII commented Apr 28, 2024

> Would actually be really easy to confirm. If this behavior happens with WebSocket Transport too, then we have our culprit i think :P

I now have examples for xwt itself - so another way would be to run those and check if they also demonstrate the same behaviour.

@Nul-led
Collaborator

Nul-led commented May 14, 2024

@MOZGIII it works

@Nul-led
Collaborator

Nul-led commented May 14, 2024

@cBournhonesque so RAF is indeed the culprit...

@simbleau
Contributor

> @cBournhonesque so RAF is indeed the culprit...

What is RAF?

@Nul-led
Collaborator

Nul-led commented May 14, 2024

>> @cBournhonesque so RAF is indeed the culprit...
>
> What is RAF?

@simbleau requestAnimationFrame, aka the browser's frame scheduler

@simbleau
Contributor

Do we have a hypothetical solution or just have identified the problem?

@cBournhonesque
Owner Author

cBournhonesque commented May 14, 2024

I tried using wasm_bindgen_futures instead of BevyIoTaskPool (#352) since the xwt example seems to be doing that: https://github.com/MOZGIII/xwt/blob/master/examples/microapp/client/src/main.rs#L14
but I still get disconnected on tab changes...

I don't really get the RAF part, but it might be because bevy still stops running when we switch tabs, which means that we stop sending/receiving keepalive packets, because the netcode logic runs inside bevy.
When we come back to the tab, the bevy system with netcode runs again, sees that the last packet received was >10 sec ago, and triggers a timeout.
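A minimal sketch of the kind of receive-side check described here, assuming millisecond timestamps and the 10 s timeout from the comment; `TimeoutCheck` is hypothetical and not lightyear's netcode code. It shows why a frozen tab times out: in this sketch `on_packet` is only called when the system runs, so packets that arrive while the tab is frozen do not refresh `last_recv_ms` until they are processed:

```rust
/// Hypothetical receive-side timeout: the connection is considered dead when
/// no packet has been processed for longer than `timeout_ms`.
pub struct TimeoutCheck {
    timeout_ms: u64,
    last_recv_ms: u64,
}

impl TimeoutCheck {
    pub fn new(timeout_ms: u64, now_ms: u64) -> Self {
        Self { timeout_ms, last_recv_ms: now_ms }
    }

    /// Called by the (possibly frozen) system when it processes a packet.
    pub fn on_packet(&mut self, now_ms: u64) {
        self.last_recv_ms = now_ms;
    }

    pub fn timed_out(&self, now_ms: u64) -> bool {
        now_ms.saturating_sub(self.last_recv_ms) > self.timeout_ms
    }
}
```

For example, with the last packet processed at 1 s and the tab hidden until 31 s, the first check after resuming sees 30 s of silence and times out.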

Potential solutions:

  1. play audio (still needs to be tried) to force bevy systems to keep running?

  2. put more of the netcode logic outside of the bevy systems and inside the wasm_bindgen_futures::spawn_local task. For example, we would keep sending keepalives and keep receiving packets (which stay buffered in an unbounded channel). When the bevy task restarts, it reads all the messages that have been buffered in the channel.

     This doesn't even seem to work, because then the client would have to process 1000s of frames' worth of updates when we open the tab again.
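A minimal sketch of option 2's buffering pattern, using a std unbounded channel. A native thread stands in for the `wasm_bindgen_futures::spawn_local` task (an assumption made so the sketch runs anywhere; std threads are not available on `wasm32-unknown-unknown`), and `spawn_io_task`/`drain_buffered` are hypothetical names:

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{self, JoinHandle};

/// Stand-in for the IO task: keeps receiving "packets" while the game loop is
/// paused and pushes them into an unbounded channel.
pub fn spawn_io_task(packet_count: usize) -> (JoinHandle<()>, Receiver<Vec<u8>>) {
    let (tx, rx) = channel(); // std mpsc channels are unbounded
    let handle = thread::spawn(move || {
        for i in 0..packet_count {
            tx.send(vec![(i % 256) as u8]).expect("receiver dropped");
        }
    });
    (handle, rx)
}

/// What the game system would do on its next tick: take everything that
/// piled up, without blocking.
pub fn drain_buffered(rx: &Receiver<Vec<u8>>) -> Vec<Vec<u8>> {
    let mut packets = Vec::new();
    while let Ok(packet) = rx.try_recv() {
        packets.push(packet);
    }
    packets
}
```

A single `drain_buffered` call returning thousands of packets at once is exactly the catch-up cost raised in the objection above.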

@simbleau
Contributor

simbleau commented May 14, 2024

> the bevy system with netcode runs again, sees that the last packet received was >10sec ago and triggers a timeout.

... So this is a software timeout? I feel like we've asked that before and the answer was less clear than it is now. That's exactly why we filed the issue under W3C/WebTransport, since it was believed the behavior was from the browser's WebTransport runtime.

This feels really silly now.

Could we just disable the timeout in the bevy system? At the very least it seems reasonable for it to be configurable.

@cBournhonesque
Owner Author

cBournhonesque commented May 14, 2024

It is already configurable:

pub client_timeout_secs: i32,
(for when the client generates the ConnectToken in Authentication::Manual, it's a bit confusing..)
and
pub client_timeout_secs: i32,

It's just that having a very high value (20+ seconds) doesn't seem ideal. If a client disconnects suddenly (closes the tab), you would have to wait 20 seconds before the server is aware of the disconnection.

I also created an issue on bevy to potentially make the scheduler keep running bevy systems even if the tab is in the background: bevyengine/bevy#13368

@MOZGIII

MOZGIII commented May 14, 2024

Another possibility is bevy might be doing something to actively put itself (its wasm instance) on hold on tab switches.

@MOZGIII

MOZGIII commented May 14, 2024

Can anyone make a simple / minimal guide on how to reproduce this issue?

@cBournhonesque
Owner Author

cBournhonesque commented May 14, 2024

I'm trying to make a simple example (without networking): bevyengine/bevy#13370
(instructions to run wasm examples are in the readme)
You can look at the counter log to see if the systems were running when the tab was in the background or not.

@Nul-led
Collaborator

Nul-led commented May 14, 2024

@MOZGIII rAF just holds indefinitely while the tab is inactive. That's known behavior, so that's been determined to be the issue.

@MOZGIII

MOZGIII commented May 14, 2024

Well, yes, for RAF that's expected. But why does it still break when the code is run using wasm_bindgen_futures?

Was there a miscommunication or confusion here of some sort?

@MOZGIII

MOZGIII commented May 14, 2024

Ah, I read the issue. I am not sure you'd want that - to run bevy systems in the background... Might be better to extract the systems that need to run while in background into their own threads (or Promises, but not bevy tasks). That's what my architectural approach to this would be, at least.

Anyhow, if you need to run bevy systems specifically it could be solved by using/compositing multiple schedulers - in a way that you run some systems on RAF and some with fixed intervals. That would also make bevy tasks function. This could be something that's offered by bevy out of the box - but I'd recommend first experimenting with this locally, as whatever bevy upstream implements might still be suboptimal for lightyear's use case...
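Running some systems at fixed intervals, independently of how often frames arrive, is commonly done with a fixed-timestep accumulator. A minimal, engine-agnostic sketch, assuming millisecond timestamps; `FixedStepper` and its catch-up cap are hypothetical and are not bevy's scheduler. Whatever drives `frame` (RAF or a timer) must itself keep firing, and browsers may throttle timers in background tabs too:

```rust
/// Hypothetical fixed-interval stepper for "networking" systems, decoupled
/// from the rate at which frames (e.g. RAF callbacks) arrive.
pub struct FixedStepper {
    step_ms: u64,
    max_steps_per_frame: u64,
    accumulator_ms: u64,
    last_frame_ms: Option<u64>,
}

impl FixedStepper {
    pub fn new(step_ms: u64, max_steps_per_frame: u64) -> Self {
        assert!(step_ms > 0, "step must be positive");
        Self { step_ms, max_steps_per_frame, accumulator_ms: 0, last_frame_ms: None }
    }

    /// Returns how many fixed steps to run for a frame at `now_ms`.
    pub fn frame(&mut self, now_ms: u64) -> u64 {
        let elapsed = self.last_frame_ms.map_or(0, |prev| now_ms.saturating_sub(prev));
        self.last_frame_ms = Some(now_ms);
        self.accumulator_ms += elapsed;

        let due = self.accumulator_ms / self.step_ms;
        let steps = due.min(self.max_steps_per_frame);
        if due > steps {
            // Long stall (e.g. hidden tab): drop the backlog instead of
            // running thousands of catch-up steps in one frame.
            self.accumulator_ms = 0;
        } else {
            self.accumulator_ms -= steps * self.step_ms;
        }
        steps
    }
}
```

Dropping the backlog is one design choice; it trades simulation accuracy for responsiveness after a stall.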

@Nul-led
Collaborator

Nul-led commented May 14, 2024

@MOZGIII I generally agree with this sentiment.
currently waiting for a reply on https://discord.com/channels/691052431525675048/750833140746158140/1240028202739437588

@cBournhonesque
Owner Author

Sorry I'm a bit slow... is this a good summary?

Potential solutions:
A) add audio as a quick way to get unblocked. Bevy systems will still run unthrottled.

B) set the netcode timeout to a very long time as a quick way to get unblocked. The io tasks shouldn't timeout anymore since they still run in the background when spawned via wasm_bindgen_futures, if I understand correctly? or we can put the io tasks in a WebWorker if they are still throttled.

The issue is that the bevy systems will still be throttled on the client so:

  • the server would keep sending updates for every entity all the time, since we keep sending updates until we receive an ack.
  • when the main thread is unthrottled, the client would have to process all the received packets at once. (+ the buffer would overflow so some packets would be lost, etc.)
    Basically what @MOZGIII said here

C)

  • keep bevy systems running in the main thread, which is throttled
  • spawn the io-related tasks in a WebWorker (which would run in the background in an unthrottled manner even if the user switches tabs). The task would:
    • send packets to the server. In practice there would be no packets from the game, since the game is throttled/paused. So instead we can just try to keep sending keep-alives. This is the hard part.
    • receive packets from the io (webtransport) and store them in a buffer (bounded-channel).

Same issues as in B).

D) Handle disconnection/reconnection in your game.

  • disconnect the client but without despawning its controlled entities
  • when the client returns to the tab (which can be detected via winit events), reconnect the client. We replicate the entire world to the client again. The game needs to detect that the new connection corresponds to the same client (maybe by keeping track of the ClientId in the ConnectToken).

It's already possible to disconnect/reconnect; so I guess this would be the best solution?

E) have some other way to force bevy systems to still run in an unthrottled manner.
Relevant issue: bevyengine/bevy#13368
Looks like it would probably be by putting the entire bevy app inside a web worker?

@MOZGIII

MOZGIII commented May 15, 2024

I am currently thinking that having a separate, non-bevy world and ECS for the game logic that is network-replicated is a good idea. It is definitely an option to add to the list above, because that thing can in theory run in a WebWorker and handle not only the packet buffering, but also the full processing of the packets.

The issue with this is that WebWorker-to-window communication can be prohibitively slow in terms of latency: in the 10s of milliseconds just to send a message. This is not great for any game. It might be OK for some, but even there users could easily notice that the game is not very responsive. For other games that would be a hard blocker, I mean waay worse than freezes on tab switches.


So, for this crate, I'd suggest either building a portable core that can be used in any way depending on the app's needs, or supporting one of the in-WebWorker or in-window ways of running the networking, or explicitly both.
What I mean is this is likely an important decision to select the target setup and optimize with that in mind.

@simbleau
Contributor

simbleau commented May 16, 2024

> It is already configurable:
>
> pub client_timeout_secs: i32,
>
> (for when the client generates the ConnectToken in Authentication::Manual, it's a bit confusing..)
> and
>
> pub client_timeout_secs: i32,
>
> It's just that having a very high value (20+ seconds) doesn't seem ideal. If a client disconnects suddenly (closes the tab), you would have to wait 20 seconds before the server is aware of the disconnection.
>
> I also created an issue on bevy to potentially make the scheduler keep running bevy systems even if the tab is in the background: bevyengine/bevy#13368

Actually, there is a Bevy API for window focus.
https://docs.rs/bevy/latest/bevy/window/struct.WindowFocused.html

It's unclear if WindowFocused would be a triggered event on tab switches. It should be, but I'll be honest - Web doesn't always work as expected in bevy.

We could try something with this API, but maybe here's a better idea:

I'd assume the best solution is to disconnect if and only if the last message was received over ~20 seconds ago but we have actively polled the message queue within that time. Otherwise, the timeout would be much longer (5 minutes). This would mean we'd need two timers on both the client and the server: one for active listening (the true timeout) and one for inactive listening (the user tabbed away).

We could make this more robust by potentially sending a message from the client when they tab away. There's nothing stopping us from adding a Startup system to trigger a closure when the user tabs away, e.g.

// Requires web-sys features: Window, Document, Event, EventTarget, VisibilityState, console
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;
use web_sys::{window, Event, VisibilityState};

#[wasm_bindgen(start)]
pub fn start() -> Result<(), JsValue> {
    // Get the document object
    let document = window().unwrap().document().unwrap();

    // The closure takes its own handle (a cheap clone of the JS reference),
    // because `document` is still needed below to register the listener
    let doc = document.clone();

    // Define the callback function
    let callback = Closure::wrap(Box::new(move |_event: Event| {
        if doc.visibility_state() == VisibilityState::Hidden {
            // The tab has become hidden (e.g. the user switched tabs)
            web_sys::console::log_1(&"Tab is unfocused".into());
        } else {
            // The tab is visible again
            web_sys::console::log_1(&"Tab is focused".into());
        }
    }) as Box<dyn FnMut(Event)>);

    // Add the event listener for the 'visibilitychange' event
    document.add_event_listener_with_callback("visibilitychange", callback.as_ref().unchecked_ref())?;

    // Leak the closure so it stays alive for the lifetime of the page
    callback.forget();

    Ok(())
}

@simbleau
Contributor

simbleau commented May 16, 2024

Sorry, going to summarize my incoherent thoughts above in a much more digestible manner here:

Steps:

  • Add a Startup system which will add an event listener to the visibilitychange DOM event, which will run a closure to:
    • If Tab unfocused:
      • Send a message to the server that the tab has become inactive.
      • Change the timeout to a longer timeout
    • If Tab focused:
      • Change the timeout to a shorter timeout.
  • The server will respond to these tab focus messages, to switch its timeout for that user to a longer/shorter timeout.
  • When the client comes back, it may experience a buffer overflow. Allow configuration, or implement default behavior such that if a buffer overflow ever occurs, it disconnects the client automatically with the reason "buffer overflow".
  • User can add application layer logic to attempt re-connection automatically on such an event.
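The server side of these steps could keep, per client, the time of the last packet and whether the tab was reported hidden, and apply the matching timeout (~20 s active, 5 minutes inactive, as suggested earlier). A minimal sketch; `ClientTimeouts` and `TabMessage` are hypothetical, not lightyear types, and the buffer-overflow step is left out:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TabMessage {
    Hidden,
    Visible,
}

/// Hypothetical server-side bookkeeping for the two-timeout idea.
pub struct ClientTimeouts {
    active_timeout_ms: u64,
    inactive_timeout_ms: u64,
    /// client id -> (time of last packet, tab reported hidden?)
    clients: HashMap<u64, (u64, bool)>,
}

impl ClientTimeouts {
    pub fn new(active_timeout_ms: u64, inactive_timeout_ms: u64) -> Self {
        Self { active_timeout_ms, inactive_timeout_ms, clients: HashMap::new() }
    }

    /// Any packet refreshes the timer but keeps the hidden flag.
    pub fn on_packet(&mut self, client: u64, now_ms: u64) {
        self.clients.entry(client).or_insert((now_ms, false)).0 = now_ms;
    }

    /// A focus message is itself a packet, and also switches the timeout.
    pub fn on_tab_message(&mut self, client: u64, msg: TabMessage, now_ms: u64) {
        self.clients.insert(client, (now_ms, msg == TabMessage::Hidden));
    }

    /// Clients whose applicable timeout has expired, sorted by id.
    pub fn timed_out(&self, now_ms: u64) -> Vec<u64> {
        let mut out = Vec::new();
        for (&id, &(last_ms, hidden)) in &self.clients {
            let limit = if hidden { self.inactive_timeout_ms } else { self.active_timeout_ms };
            if now_ms.saturating_sub(last_ms) > limit {
                out.push(id);
            }
        }
        out.sort_unstable();
        out
    }
}
```

Because the "hidden" message comes from the client, a misbehaving client could claim it forever; the long timeout still bounds how long its resources are held.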

Web workers won't do, simply put. It's a tremendous overhead. It has merit in some applications, but should be a very last resort, and is likely a complex task.

Likewise, I don't see any bevy changes happening anytime soon, especially one with the complexity of running systems in web workers.

@MOZGIII

MOZGIII commented May 16, 2024

> may experience a buffer overflow

Small correction on the nature of WebTransport:

  • If we use streams then we'll get a stream error if the browser's internal receive queue has overflown. If not - the xwt code will be able to spit up stream chunks without a buffer overflow over multiple reads.
  • If datagrams are used, the datagrams will be dropped when the browser's internal queue fills up (not sure in which order, but I'd assume the oldest will be evicted), and the xwt code will be able to return the ones the browser has still kept.
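The datagram behaviour assumed here (bounded queue, oldest evicted first) can be illustrated with a small ring buffer. A minimal sketch; `DropOldestQueue` is hypothetical and models the assumed eviction order, not confirmed browser internals:

```rust
use std::collections::VecDeque;

/// Hypothetical bounded receive queue that evicts the oldest entry when full.
pub struct DropOldestQueue<T> {
    buf: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> DropOldestQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { buf: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front(); // evict the oldest entry
            self.dropped += 1;
        }
        self.buf.push_back(item);
    }

    /// Returns whatever survived, oldest first.
    pub fn drain_all(&mut self) -> Vec<T> {
        self.buf.drain(..).collect()
    }

    /// How many entries were evicted so far.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

After a long freeze the app only sees the newest datagrams, so anything that depended on the evicted ones needs the kind of resync discussed earlier in the thread.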

Also, I don't think you can change the client (browser) timeout, can you?


I don't think it makes sense to add another timer - just allow reconnecting asap, with the new session evicting the old session. A simple algorithm sketch would be for the client to generate a random number every time it starts, and present that number to the server upon connecting and reconnecting. If the server sees, upon connection, that the same number is already used by another presently connected session, it would just assume that session is dead and would drop soon due to timeout, force-kill it, and then transfer the server-side resources of the old session to the newly connecting session. This effectively uses the new connection as the signal to force-disconnect the old session, instead of relying on timeouts.
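A minimal sketch of that eviction rule, assuming the client-chosen random number arrives as a `u64` "nonce" and the server labels each transport session with a `u32`; `SessionRegistry` is hypothetical. One subtlety worth handling is that the evicted session's close event may arrive after the new session registered:

```rust
use std::collections::HashMap;

/// Hypothetical registry for the "new session evicts old session" idea.
#[derive(Default)]
pub struct SessionRegistry {
    by_nonce: HashMap<u64, u32>,
}

impl SessionRegistry {
    /// Registers `session` for `nonce`. Returns the previous session for the
    /// same nonce, which the caller should force-close and whose server-side
    /// resources should move to `session`.
    pub fn register(&mut self, nonce: u64, session: u32) -> Option<u32> {
        self.by_nonce.insert(nonce, session)
    }

    /// Called when a session closes. A late close of an already evicted
    /// session must not unregister its replacement.
    pub fn on_closed(&mut self, nonce: u64, session: u32) {
        if self.by_nonce.get(&nonce) == Some(&session) {
            self.by_nonce.remove(&nonce);
        }
    }

    pub fn current(&self, nonce: u64) -> Option<u32> {
        self.by_nonce.get(&nonce).copied()
    }
}
```

A bare client-chosen number can be replayed by anyone who learns it, so in practice it would probably be bound to an authenticated identity (for example, something carried in a connect token).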

The keep-alive interval and idle timeout on the server side can then just be set to reasonable, high values for normal operation, without the need to switch them at runtime. Not sure if those are even tweakable on the client side; pretty sure there's no way: https://w3c.github.io/webtransport/#dictdef-webtransportoptions

@MOZGIII

MOZGIII commented May 16, 2024

> Likewise, I don't see any bevy changes happening anytime soon, especially one with the complexity of running systems in web workers.

Regarding this: bevy is actually very modular, so it would be relatively easy to implement a custom event loop for the web if need be. The easiest way is not to build it from scratch, but to vendor and patch the winit one a bit to test things out, if that proves to be the issue.


From the source, it looks like the WindowOccluded event would be fired when the visibilitychange listener triggers, so it might be possible to just read that; however, that requires the precondition of bevy systems ticking in the first place.

Projects
None yet
4 participants