Voice audio issues with 2023.3 and new orchestrator #144

Open
jackjansen opened this issue Mar 15, 2024 · 31 comments

@jackjansen
Collaborator

We're seeing all sorts of different issues with voice audio:

  • Delays that are too long (600ms for uncompressed audio over TCP)
    • And this does not always seem to be reflected in the latency_ms in the stats: output...
  • Sometimes no audio at all (possibly due to orchestrator desync?)
  • Lots of error messages about missing audio (maybe correlated to the above?)

We need to investigate what is going on.

And let's use this issue to document whatever we find.

@jackjansen
Collaborator Author

If it turns out that #144 has been fixed we should also check these points again. Fingers crossed that they'll also be fixed.

@jackjansen
Collaborator Author

Confirmed fixed.

@jackjansen
Collaborator Author

Re-opened this. The second and third bullet points were indeed fixed, but the first one is not fixed yet.

@jackjansen jackjansen reopened this Apr 26, 2024
@ashutosh3308
Contributor

ashutosh3308 commented Apr 26, 2024

We tested two things this week and last week:

  • First issue: whether all participants can hear each other. We tested this last week and found that all participants could hear each other, so this issue has been resolved.
  • Second issue: the reported delay (via the stats) versus the actual delay (measured by saying "Hello").
    • Ashu tested this with fiddlehead and his laptop. Two scenarios were tested, Direct TCP and Socket I/O, both with simple_avatar and Without_HMD. A simple headphone was connected to fiddlehead, and the laptop used its built-in microphone and speaker.
    • For both Socket_IO and Direct_TCP, the difference between actual and reported latency was approx. 200 ms.
    • The reported latency for both Socket_IO and Direct_TCP was approx. 150ms and the actual latency was approx. 370ms. Note that the latency for Direct_TCP was 15 ms less than for Socket_IO.

@ashutosh3308
Contributor

Jack and Ashu tested on Mac to verify whether this additional latency could be a Windows issue.

  • On Mac, the Direct TCP scenario was tested with simple_avatar, without HMDs, with a simple headphone, and with the laptop's built-in microphone and speaker.
  • The reported latency for Direct_TCP was approx. 150ms and the actual latency was approx. 360ms.

@jackjansen
Collaborator Author

For reference: @jvdrhoof is also seeing some unexpected latencies, sometimes, with a completely different setup: point clouds over socketio or webrtc. Sometimes (but not always) he is getting latencies in the 500ms range, with all machines (both clients and the orchestrator) on local networks.

This may be a red herring, but it may mean that we are looking at a problem that has nothing to do with audio but is actually in a completely different place.

@jackjansen
Collaborator Author

(and, completely beside the point: @ashutosh3308 please reserve backticks, as in `print("this is code")`, for code and verbatim things like numbers (`3.0` or `2 ms`), host names (`flauwte.local`) and such. Use _italics_ with underscores for emphasis, or if you want something to stand out. And use **strong emphasis** with double stars only very sparingly.)

@jackjansen
Collaborator Author

On the branch, repeated the flauwte-to-vrtiny measurement, with TCP transport.
Measured latency (clap plus echo) was 370ms. Except once, where it was about 600ms.

Reported latency on vrtiny was 120-180ms.

Vrtiny time was -17ms uncertainty 15ms.
Flauwte time was -41ms uncertainty 47ms

@jackjansen
Collaborator Author

Repeated the exact same experiment, with very similar outcomes.

@jackjansen
Collaborator Author

Did another one today. Same setup. Delay was variable, and I noticed that as measured latency went up, reported latency also went up, and vice versa.

It almost seems as if the measured latency is approximately twice the reported latency. But this seems ridiculous....

@jackjansen
Collaborator Author

Another observation: on the receiver side, if we compare latency_ms to voice receiver_queue_ms the difference is almost always 60ms.

On the sender side, in AsyncVoiceReader, we have record_latency_ms=28ms and output_queue_ms=30ms.

Assuming a millisecond or so for transmission all the reported numbers seem to match: record_latency+output_queue+transmission+receiver_queue == total_latency
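
The sum above can be checked with a few lines. This is only a sketch of the arithmetic: the function name is made up for illustration, and the 1 ms transmission time is the assumption stated above, not a measured value.

```python
def non_receiver_latency_ms(record_latency_ms, output_queue_ms, transmission_ms):
    """Latency accumulated before a packet reaches the receiver queue."""
    return record_latency_ms + output_queue_ms + transmission_ms

# Sender-side numbers from AsyncVoiceReader, plus the assumed ~1 ms transmission.
expected_gap_ms = non_receiver_latency_ms(28, 30, 1)

# Observed on the receiver: latency_ms minus receiver_queue_ms is almost always 60 ms.
observed_gap_ms = 60

print(f"expected {expected_gap_ms} ms, observed {observed_gap_ms} ms, "
      f"unexplained {observed_gap_ms - expected_gap_ms} ms")
```

So the reported numbers are internally consistent to within about a millisecond, which suggests the missing latency sits outside the stages that are being reported.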

@jackjansen
Collaborator Author

Disabled the spatializer. Makes no difference: still about 200ms extra latency perceived versus measured.

@jackjansen
Collaborator Author

jackjansen commented May 2, 2024

But now it seems that the "DSP Buffer Size" we set in Project Settings -> Audio sticks. And it was set to "best performance" previously.

I have set it to "Best Latency". Difference between perceived and measured latency goes down to 140ms.
dspBufferSize is now 256, and it was 960 previously. So 700 samples difference, so at 48000 sample frequency that is about 15ms.

So this does not explain the 60ms decrease in perceived latency. And it doesn't explain where the other 140ms go.
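
For reference, the buffer arithmetic as a small sketch. The helper name and the `num_buffers` parameter are hypothetical; the thread only establishes the two buffer sizes and the 48000 Hz sample rate, so `num_buffers` defaults to 1.

```python
def buffer_duration_ms(buffer_samples, sample_rate_hz=48000, num_buffers=1):
    """Duration in milliseconds of num_buffers buffers of buffer_samples samples each."""
    return 1000.0 * buffer_samples * num_buffers / sample_rate_hz

old_ms = buffer_duration_ms(960)  # previous dspBufferSize
new_ms = buffer_duration_ms(256)  # dspBufferSize with "Best Latency"
saving_ms = old_ms - new_ms

print(f"{old_ms:.1f} ms -> {new_ms:.1f} ms, saving {saving_ms:.1f} ms")
```

A single buffer accounts for roughly 15 ms. If the audio backend keeps more than one such buffer queued (an assumption, not something measured here), the saving would be a multiple of that.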

Need to test Mac-to-Mac, to ensure we're not also looking at some Windows problem.

@jackjansen
Copy link
Collaborator Author

Experimented with audioFps=100, in other words: a single packet is now 10ms instead of 20ms.

Perceived latency goes down to 180ms. Measured latency goes down to 60ms. So measured is a bit closer to perceived, but not all that much (still 120ms difference).

@jackjansen
Collaborator Author

Go the other way, audioFps=10. This should greatly increase the latency, but that latency difference should show up in the reported latency.

Record latency=60, output queue=100-160, receiver queue=200, latency=300-350. So those numbers seem to match, somewhat.

Perceived latency is around 500. so we are still looking at a discrepancy of around 150ms.
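
As a quick reference for these audioFps experiments, a sketch of how packet duration follows from the setting. The helper names are made up, the 48000 Hz sample rate is taken from the DSP buffer comment earlier, and audioFps=50 is inferred from the 20ms packets mentioned above.

```python
def packet_duration_ms(audio_fps):
    """Duration of one audio packet in milliseconds."""
    return 1000.0 / audio_fps

def samples_per_packet(audio_fps, sample_rate_hz=48000):
    """Number of mono samples carried by one packet."""
    return sample_rate_hz // audio_fps

for fps in (10, 50, 100):
    print(f"audioFps={fps}: {packet_duration_ms(fps):.0f} ms, "
          f"{samples_per_packet(fps)} samples per packet")
```

Every queue in the pipeline holds whole packets, so the reported queue latencies should scale with packet duration. The discrepancy of 120-150ms stays roughly constant across these settings, which argues against it being packet-size related.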

@jackjansen
Collaborator Author

Repeated at home, Mac->Windows (sap to beelzebub).
Measured latency 800ms, reported latency around 500ms. So a difference of 300ms, much higher than yesterday.

I did notice that both machines had dspBufferSize=1024.

@jackjansen
Collaborator Author

Set both machines to Project Settings -> Audio -> DSP Buffer Size -> Best latency. Makes very little difference.

@jackjansen
Collaborator Author

Set both machines to mono, not spatialised, no virtual effects, no audio suspension.
Makes very little difference.

@jackjansen
Collaborator Author

Next experiment: Mac to Mac (using beignet as the receiver).
Reported latency almost rock-solid at 510ms, almost exactly the sum of the components.
Measured latency 660ms.

@jackjansen
Collaborator Author

Unfortunately, all that these experiments show is that the various theories we had are all wrong:

  • It is not a Windows driver problem.
  • It is not a spatialiser problem.

@jackjansen
Collaborator Author

Tried with a built player (Mac->Mac). Again doesn't make a lot of difference: reported 350ms, measured 430ms.

So there is a lower discrepancy between reported and measured, 80ms, but it's not a difference to write home about.

@jackjansen
Collaborator Author

Tried with a built player on Windows too (beelzebub).

Reported latency 135ms, measured latency 400ms. Discrepancy 165ms.

@jackjansen
Collaborator Author

Tried a different way of measuring: don't do an audio-only capture but record a video. On the voice receiver machine select the VoicePipelineOther so we can see its VU-meter in the inspector.

Load the video into Logic, so we can examine the waveform and the video with the VU meter at the same time.

The VU meter is pretty much frame accurate...

So something in Unity knows when it is playing out the audio......

@jackjansen
Collaborator Author

Added a "DIY VU meter", by computing the RMS in the buffer. Recorded it with a slo-mo camera.
Sometimes it gives similar readings to the Unity VU meter, sometimes not.
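
For reference, the RMS computation behind such a meter looks roughly like this. It is a sketch in Python rather than the actual Unity code; the function names and the -120 dB floor are illustrative choices, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def rms(samples):
    """Root-mean-square level of a buffer of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_dbfs(samples, floor_db=-120.0):
    """RMS level in dB relative to full scale, clamped to floor_db for silence."""
    level = rms(samples)
    if level <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(level))

print(rms([0.5, -0.5, 0.5, -0.5]), rms_dbfs([0.0, 0.0]))
```

Note that a meter like this reflects the buffer the code happens to be looking at, which is not necessarily the buffer being played out at that moment.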

@jackjansen
Collaborator Author

The DIY VU meter has been improved (sort of) by using GetSpectrumData. Now it gives the same readings as the Unity VU meter.

@jackjansen
Collaborator Author

But I accidentally found something very interesting, after a session had been running for a long time: even with both input queue sizes set to 2 packets, the reported (and perceived) latency goes up and up and up......

Seems like packets are being held up somewhere in the network or in AsyncTCPReceiver?

@jackjansen
Collaborator Author

That was a red herring. At least, it wasn't related to what we're looking for. The queue lengths were long, and the first queue was non-dropping. That may be something we want to address, but it's not related to the extra 100-200ms latency we perceive.
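
To illustrate why a non-dropping queue makes latency creep up over a long session, here is a toy simulation. It is not the pipeline code: the function, the tick model and the stall pattern are all made up for illustration. One packet arrives per tick, and the consumer takes one packet per tick except that it misses every `stall_every`-th tick.

```python
from collections import deque

def simulate_queue_depth(ticks, stall_every, max_packets=None):
    """Return the queue depth after each tick.

    max_packets=None models a non-dropping queue; otherwise the queue
    drops its oldest packet when a new one arrives and it is full.
    """
    queue = deque(maxlen=max_packets)
    depths = []
    for tick in range(1, ticks + 1):
        queue.append(tick)  # a full bounded deque discards its oldest entry here
        if tick % stall_every != 0 and queue:
            queue.popleft()  # consumer keeps up, except on stalled ticks
        depths.append(len(queue))
    return depths

non_dropping = simulate_queue_depth(100, stall_every=10)
dropping = simulate_queue_depth(100, stall_every=10, max_packets=2)
print(non_dropping[-1], max(dropping))
```

Each missed tick permanently adds one packet of latency to the non-dropping queue, whereas the dropping queue stays bounded at the cost of occasionally losing a packet.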

@jackjansen
Collaborator Author

Decided to go back to Unity 2021, just to see whether maybe this problem has been present for a long time but we just didn't notice it.

First try was branch ancient/deplyment/pilot0, but this doesn't work anymore: it requires the scenario storage at the orchestrator, which isn't there anymore.

@jackjansen
Collaborator Author

Tried with the newly created branch ancient/deployment/unity2021, which is the situation of mid-February, just before the switch to the new orchestrator and Unity 2022.

This branch has exactly the same problem as the current situation: the perceived audio latency is about 200ms larger than the reported audio latency.

I'm now starting to think that we never actually measured audio latency, and always just looked at the reported number.

@jackjansen
Collaborator Author

@ireneviola also isn't sure we ever really measured this.

@jackjansen
Collaborator Author

Ok. We know where the problem is (on the Unity playout side), and we know how to work around it (by adjusting the audio playout clock to be slightly ahead of where we actually want it).

But I'm not going to fix it now. All the groundwork is in place, and the structure of the voice pipeline and P_Player and P_Player_self are ready for it, but I'm going to merge now and fix later.
