Add insert of concealed packets #3308

spscream · 2023-12-13T18:35:54Z

this changes were created to address #3297 for our environment.
I'm not sure if it is fully correct, but it mostly fixes our problems.

januscla · 2023-12-13T18:35:58Z

Thanks for your contribution, @spscream! Please make sure you sign our CLA, as it's a required step before we can merge this.

spscream · 2023-12-13T19:16:27Z

src/plugins/janus_audiobridge.c

+
+							janus_mutex_lock(&participant->qmutex);
+
+							for(int i=1; i < lost_packets; i++) {


I think to get rid of cycle and produce plc data which frame_size will be lost_packets * OPUS_SAMPLES and shouldn't exceed BUFFER_SAMPLES size.
In our production I've seen losses of 20+ packets in a row, so plc for 20 packet won't fit BUFFER_SAMPLES and I'm not sure how to behave in this case.

It's actually the whole thing that perplexes me. The way it's done right now, you're basically breaking FEC, as it would never be done. You're preserving the use_fec = true bit, but it will be useless, since later on we'll try to get redundant information from a packet that doesn't have any (the PLC data you've added). I think PLC and FEC should be handled separately: if the gap is 1, then do what was done before; if it's higher than 1, do PLC (new code, without use_fec=true). Or do you have a reason for implementing it this way?

As a side not, the LOG_ERR there is probably overly verbose, and would clog the logs any time there's bursts of lost packets, which may happen often. A LOG_VERB or even LOG_HUGE would probably be a better choice (but that's something that can be discussed, it may still be useful to have info on the logs when this happens).

This code generates plc for gap sequence, which can't be recovered using fec. Last gap member is generated using fec of arrived rtp packet, if it's possible(opus_decode will return plc in case fec data isn't available), for example:

gap of 1 will be covered using fec, since condition i < lost_packets for cycle wont match

gap of 2 will generate 1 plc and 1 will be recovered from fec information using rtp payload

I'm affraid of the following problem now - frame size can be differ from packet to packet and decoder should adapt to changes of frame size accordingly since encoder may change frame_size. Looks like chrome encoder changes frame size in case of losses - opus_packet_get_nb_frames returns 10ms frame size periodically on bursts of losses for rtp packet after gap. Chromium realization of fec decode uses opus_packet_get_nb_frames to determine fec frame_size https://chromium.googlesource.com/external/webrtc/+/refs/heads/main/modules/audio_coding/codecs/opus/opus_interface.cc#681).

I left LOG_ERR for debug purpose, i'm going to change it to LOG_VERB or LOG_HUGE for production.

lminiero

Thanks for the PR! I've added a few notes inline with some considerations that we can discuss. I know @atoppi had some comments of his own too, so I'll leave it to him to add more (and possibly correct me if I was wrong somewhere).

lminiero · 2023-12-14T12:13:07Z

src/plugins/janus_audiobridge.c

@@ -1699,6 +1699,7 @@ typedef struct janus_audiobridge_participant {
 	uint16_t expected_seq;		/* Expected sequence number */
 	uint16_t probation; 		/* Used to determine new ssrc validity */
 	uint32_t last_timestamp;	/* Last in seq timestamp */
+	uint16_t last_seq;	/* Last sequence number */


Code style: indentation of comment looks broken here.

lminiero · 2023-12-14T12:14:47Z

src/plugins/janus_audiobridge.c

@@ -2135,7 +2136,7 @@ static int janus_audiobridge_resample(int16_t *input, int input_num, int input_r
 #define JITTER_BUFFER_MIN_PACKETS 2
 #define JITTER_BUFFER_MAX_PACKETS 40
 #define JITTER_BUFFER_CHECK_USECS 1*G_USEC_PER_SEC
-#define QUEUE_IN_MAX_PACKETS 4
+#define QUEUE_IN_MAX_PACKETS 20


I personally don't like this: it will cause a permanent delay increase for audio of this participant after a burst of lost packets caused this to grow because of PLC, ven way after the problem was solved. I'd argue that a burst of lost packets should indeed cause artifacts (like a speed up), and that if not 4, then a way smaller number should be more than enough to absorb shorter bursts.

I've expiremented with it and low value caused pops/cracks for small bursts.

lminiero · 2023-12-14T12:19:06Z

src/plugins/janus_audiobridge.c

+
+							janus_mutex_lock(&participant->qmutex);
+
+							for(int i=1; i < lost_packets; i++) {


It's actually the whole thing that perplexes me. The way it's done right now, you're basically breaking FEC, as it would never be done. You're preserving the use_fec = true bit, but it will be useless, since later on we'll try to get redundant information from a packet that doesn't have any (the PLC data you've added). I think PLC and FEC should be handled separately: if the gap is 1, then do what was done before; if it's higher than 1, do PLC (new code, without use_fec=true). Or do you have a reason for implementing it this way?

As a side not, the LOG_ERR there is probably overly verbose, and would clog the logs any time there's bursts of lost packets, which may happen often. A LOG_VERB or even LOG_HUGE would probably be a better choice (but that's something that can be discussed, it may still be useful to have info on the logs when this happens).

lminiero · 2023-12-14T12:21:18Z

src/plugins/janus_audiobridge.c

+								pkt->data = g_malloc0(OPUS_SAMPLES * (participant->stereo ? 2 : 1) * sizeof(opus_int16));
+								pkt->ssrc = 0;
+								pkt->timestamp = participant->last_timestamp + 960 * i;
+								pkt->seq_number = participant->last_seq + 1 * i;


I'm not 100% sure the RTP info here is correct, since last_timestamp and last_seq may come from out of order packets. But it's probably not that relevant, since it has a limited use anyway in the code.

I thought jitter buffer is reordering out of order packets and duplicates? At least I've expiremented with out of order and duplicates on output and everything still worked fine

lminiero · 2023-12-14T12:21:52Z

src/plugins/janus_audiobridge.c

 						/* This is a redundant packet, so we can't parse any extension info */
 						pkt->silence = FALSE;
-						/* Decode the lost packet using fec=1 */
+


I'm not sure why you completely changed the way we handle FEC? The existing FEC management should remain the same, if not broken. See comments above.

FEC management which were implemented earlier didn't work in case of gap > 1, I didn't change too much, just adapted it to work using latest arrived rtp packet info instead of expected value based on last decoded packet.

lminiero · 2023-12-14T12:22:08Z

src/plugins/janus_audiobridge.c

 						pkt->length = opus_decode(participant->decoder, payload, plen, (opus_int16 *)pkt->data, output_samples, 1);
+
+						JANUS_LOG(LOG_ERR, "[%d] packet fec decoded [%d] pkt->length, timestamp: [%d]\n",
+								pkt->seq_number, pkt->length, pkt->timestamp);


See above on LOG_ERR usage.

lminiero · 2023-12-14T12:23:18Z

src/plugins/janus_audiobridge.c

@@ -8738,7 +8768,7 @@ static void *janus_audiobridge_participant_thread(void *data) {
 					locked = TRUE;
 					/* Do not let queue-in grow too much */
 					guint count = g_list_length(participant->inbuf);
-					if(count > QUEUE_IN_MAX_PACKETS) {
+					if((int) count > (QUEUE_IN_MAX_PACKETS + lost_packets)) {


Once you increase QUEUE_IN_MAX_PACKETS, then there's no reason to make it even less strict like you're doing now IMHO. It will make the internal mixer queue for the participant grow beyond control and cause huge and uncoverable latencies.

I tried to set it to default 4 and got pops and cracks. On screen is 20-30% of drops set on output channel.

With 20 QUEUE_IN_MAX_PACKETS I don't have such pops/cracks and I don't get over the limit.

lminiero · 2023-12-18T15:41:35Z

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

atoppi · 2023-12-18T16:35:32Z

Thanks @spscream for the effort!

My objection to the implementation is related to the timing of PLC packets generation.

FEC needs packet N+1 to decode packet N (so it will naturally introduce/leverage a playout delay), meanwhile PLC is only calculated from already decoded samples. According to this assumption, the generation of a PLC packet should be triggered in a condition where the bridge encoder "needs something from a participant but there's nothing available", and not after the arrival of a random packet not adjacent to the previous one.

Reusing FEC approach will inevitably lead to issues like unwanted delay, discarding of good packets, uncontrollable queue growth etc.

I still have to think about a better approach in detail, however one idea is to trigger a PLC whenever a participant's queue-in is empty (e.g. detected participant underflow). That means that a loss burst has exhausted both the jitter buffer and the participant's queue (decoded samples), hence the mixer has no to other chance but introduce concealment for the participant.

spscream · 2023-12-18T16:46:43Z

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

yes, I tried it and sound cracks

spscream · 2023-12-18T16:52:42Z

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

yes, I tried it and sound cracks

I can insert n packets of maximum size instead of number of loss packets, but plc should cover time period of lost bursts accordingly to docs. frame_size parameter of opus_decode has the following description:

Number of samples per channel of available space in pcm. If this is less than the maximum packet duration (120ms; 5760 for 48kHz), this function will not be capable of decoding some packets. In the case of PLC (data==NULL) or FEC (decode_fec=1), then frame_size needs to be exactly the duration of audio that is missing, otherwise the decoder will not be in the optimal state to decode the next incoming packet. For the PLC and FEC cases, frame_size must be a multiple of 2.5 ms.

spscream · 2023-12-18T17:02:28Z

I still have to think about a better approach in detail, however one idea is to trigger a PLC whenever a participant's queue-in is empty (e.g. detected participant underflow). That means that a loss burst has exhausted both the jitter buffer and the participant's queue (decoded samples), hence the mixer has no to other chance but introduce concealment for the participant.

This buffer underrun can explain why we still have this cracks/pops for our production users. But I afraid I can't rewrite it to such aproach myself or at least not this year.

lminiero · 2023-12-20T11:35:15Z

I agree with @atoppi that it's the road we should explore, even though it does come with its own challenges: for one, the mixer is the one who'd read from that queue, and it uses a different thread, meaning I'd prefer it not to be the one that should perform that PLC decode. As such, we'll have to better think of how to do that in the participant thread instead. This is something we can definitely discuss together after the vacations, no need to hurry and no need for you to actually implement it yourself either, if we manage to replicate the problem and come up with a potential fix.

spscream · 2023-12-20T12:39:23Z

@atoppi jitter_buffer_get has JITTER_BUFFER_MISSING and JITTER_BUFFER_INSERTION ret codes, so we can use it as a signal to insert plc, you can take it to considerations.

By the way I also discovered possible incorrect usage of JITTER_BUFFER_SET_MARGIN in current implementation - its value should be the same unit as the frame_size (e.g. JITTER_BUFFER_MIN_PACKETS * frame_size), since its used as timestamp under the hood, not as count of packets.

lminiero · 2023-12-20T12:44:05Z

its value should be the same unit as the frame_size

I don't think so. Besides, we're changing what actually ends in the buffer: not the payload anymore, but a pointer to a duplicate of the janus_plugin_rtp we got from the core.

spscream · 2023-12-20T13:30:21Z

its value should be the same unit as the frame_size

I don't think so. Besides, we're changing what actually ends in the buffer: not the payload anymore, but a pointer to a duplicate of the janus_plugin_rtp we got from the core.

I get the following situation with buffering if JITTER_BUFFER_SET_MARGIN is 2, the jitter buffer doesn't mantain its minimal buffer size:

[WARN] [18844549281349] jitter buffer avail: 0
[WARN] [18844549281349] jitter_buffer_get result 1
[WARN] [18844549281771] jitter buffer avail: 0
[WARN] [18844549281771] jitter_buffer_get result 1
[WARN] [18844549281829] jitter buffer avail: 2
[WARN] [18844549281829] jitter_buffer_get result 0
[WARN] [18844549300041] jitter buffer avail: 0
[WARN] [18844549300041] jitter_buffer_get result 1
[WARN] [18844549300112] jitter buffer avail: 2
[WARN] [18844549300112] jitter_buffer_get result 0
[WARN] [18844549302419] jitter buffer avail: 0
[WARN] [18844549302419] jitter_buffer_get result 1
[WARN] [18844549321048] jitter buffer avail: 2
[WARN] [18844549321048] jitter_buffer_get result 0
[WARN] [18844549321144] jitter buffer avail: 0
[WARN] [18844549321144] jitter_buffer_get result 1
[WARN] [18844549323703] jitter buffer avail: 0
[WARN] [18844549323703] jitter_buffer_get result 1

if I multiply JITTER_BUFFER_MIN_PACKETS by frame_size for JITTER_BUFFER_SET_MARGIN:

[WARN] [18844964419785] jitter_buffer_get result 0
[WARN] [18844964438318] jitter buffer avail: 3
[WARN] [18844964438318] jitter_buffer_get result 0
[WARN] [18844964459607] jitter buffer avail: 3
[WARN] [18844964459607] jitter_buffer_get result 0
[WARN] [18844964478355] jitter buffer avail: 3
[WARN] [18844964478355] jitter_buffer_get result 0
[WARN] [18844964499529] jitter buffer avail: 3
[WARN] [18844964499529] jitter_buffer_get result 0
[WARN] [18844964518172] jitter buffer avail: 3
[WARN] [18844964518172] jitter_buffer_get result 0
[WARN] [18844964539348] jitter buffer avail: 3
[WARN] [18844964539348] jitter_buffer_get result 0
[WARN] [18844964557793] jitter buffer avail: 3
[WARN] [18844964557793] jitter_buffer_get result 0
[WARN] [18844964578942] jitter buffer avail: 3
[WARN] [18844964578942] jitter_buffer_get result 0
[WARN] [18844964600042] jitter buffer avail: 3
[WARN] [18844964600042] jitter_buffer_get result 0
[WARN] [18844964618636] jitter buffer avail: 3
[WARN] [18844964618636] jitter_buffer_get result 0
[WARN] [18844964639613] jitter buffer avail: 3
[WARN] [18844964639613] jitter_buffer_get result 0

atoppi · 2024-01-08T10:14:08Z

By the way I also discovered possible incorrect usage of JITTER_BUFFER_SET_MARGIN in current implementation - its value should be the same unit as the frame_size (e.g. JITTER_BUFFER_MIN_PACKETS * frame_size), since its used as timestamp under the hood, not as count of packets.

@spscream yeah you're probably right, the comments in the speexdsp code are probably outdated/misleading.
Internally the macro sets a buffer_margin variable that is used as a RTP timestamp.

spscream · 2024-01-09T12:20:07Z

I've tried another approach here and it works in our case even with huge loses: master...spscream:janus-gateway:fix/am/audiobridge-cracks-fix-rnnoise#diff-476cfd26c102e4b3311b3b8e8f848bfeba283c1712a5f3f117254e0593a986f4R8959

@atoppi could you please look on it? If it is ok, i could refactor it and make another pr for you. Idea is the following:

if i got ret == JITTER_BUFFER_MISSING || ret == JITTER_BUFFER_INSERTION from jitter_buffer_get I insert concealed packets
if loss gap is huge I'm stopping to insert plc

I also tried to insert fec, but it seams it also leads to cracks

atoppi · 2024-01-10T10:07:21Z

@spscream the diff is huge and includes unrelated changes, can you please provide a more readable patch?

spscream · 2024-01-15T12:22:22Z

@atoppi f5361e7#diff-476cfd26c102e4b3311b3b8e8f848bfeba283c1712a5f3f117254e0593a986f4R8745 sure, here is commit with added changes

spscream · 2024-02-06T13:46:44Z

@atoppi hi! any thoughts?

spscream · 2024-02-06T13:47:41Z

If it is ok, I could make a new pr with my latest changes if it will be more convenient

atoppi · 2024-02-06T14:32:18Z

@spscream If you're able to propose a cleaner version of the patch it would greatly help thanks.
I'll try to review this in the next few days.

Add insert of concealed packets

80086ea

spscream commented Dec 13, 2023

View reviewed changes

spscream marked this pull request as draft December 13, 2023 20:44

Set output_samples to last decoded packet duration

2b003ab

lminiero reviewed Dec 14, 2023

View reviewed changes

spscream closed this Apr 2, 2024

spscream mentioned this pull request Apr 2, 2024

Add packet concealment for opus decoder in audiobridge #3349

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add insert of concealed packets #3308

Add insert of concealed packets #3308

spscream commented Dec 13, 2023

januscla commented Dec 13, 2023

spscream Dec 13, 2023 •

edited

lminiero Dec 14, 2023

spscream Dec 14, 2023 •

edited

lminiero left a comment

lminiero Dec 14, 2023

lminiero Dec 14, 2023

spscream Dec 14, 2023

lminiero Dec 14, 2023

lminiero Dec 14, 2023

spscream Dec 14, 2023

lminiero Dec 14, 2023

spscream Dec 14, 2023

lminiero Dec 14, 2023

lminiero Dec 14, 2023

spscream Dec 14, 2023

lminiero commented Dec 18, 2023 •

edited

atoppi commented Dec 18, 2023 •

edited

spscream commented Dec 18, 2023

spscream commented Dec 18, 2023

spscream commented Dec 18, 2023

lminiero commented Dec 20, 2023

spscream commented Dec 20, 2023

lminiero commented Dec 20, 2023

spscream commented Dec 20, 2023

atoppi commented Jan 8, 2024

spscream commented Jan 9, 2024

atoppi commented Jan 10, 2024

spscream commented Jan 15, 2024

spscream commented Feb 6, 2024

spscream commented Feb 6, 2024

atoppi commented Feb 6, 2024 •

edited


		janus_mutex_lock(&participant->qmutex);

		for(int i=1; i < lost_packets; i++) {

Add insert of concealed packets #3308

Add insert of concealed packets #3308

Conversation

spscream commented Dec 13, 2023

januscla commented Dec 13, 2023

spscream Dec 13, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spscream Dec 14, 2023 • edited

Choose a reason for hiding this comment

lminiero left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lminiero commented Dec 18, 2023 • edited

atoppi commented Dec 18, 2023 • edited

spscream commented Dec 18, 2023

spscream commented Dec 18, 2023

spscream commented Dec 18, 2023

lminiero commented Dec 20, 2023

spscream commented Dec 20, 2023

lminiero commented Dec 20, 2023

spscream commented Dec 20, 2023

atoppi commented Jan 8, 2024

spscream commented Jan 9, 2024

atoppi commented Jan 10, 2024

spscream commented Jan 15, 2024

spscream commented Feb 6, 2024

spscream commented Feb 6, 2024

atoppi commented Feb 6, 2024 • edited

spscream Dec 13, 2023 •

edited

spscream Dec 14, 2023 •

edited

lminiero commented Dec 18, 2023 •

edited

atoppi commented Dec 18, 2023 •

edited

atoppi commented Feb 6, 2024 •

edited