Fix playback of AAC in TS files #722

lawadr · 2023-10-12T00:26:45Z

When the channel config is 0 in the ADTS header, delay the submission of the format to the output until a Program Config Element (PCE) is found and appended to the format's initialization data.

This reproduces the initialization data payload that is submitted by the same AAC stream when inside other containers such as MP4 or MKV.

Fixes issue #695

google-cla · 2023-10-12T00:26:50Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

rohitjoins

Thank you for the contribution and fixing the bug in the library.

Please clarify some of the minor comments added to the PR and I should be able to merge once they are resolved.

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

rohitjoins · 2023-11-22T15:21:03Z

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

+      }
+
+      // Calculate total PCE size including initial PCE tag and a zero length comment.
+      int pceSize = (pceBuffer.getPosition() + extraBits + 7) / 8 + 1;


Why do we assume a zero length comment here? As per the spec, I cannot see to find such an assumption.

Well, we don't need to copy across the comment so there's no point buffering it up. If we had to copy the comment we'd need to increase AAC_PCE_MAX_SIZE from 49 to 305, which will increase the pceBuffer size.

Having said that, if we increase AAC_PCE_MAX_SIZE to 50 we'd get the byte representing the comment size. We could read that to more robustly check if the PCE fits into the sample in the below conditional. I'll make this change.

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

rohitjoins · 2023-11-22T15:23:56Z

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

@@ -66,6 +67,9 @@ public final class AdtsReader implements ElementaryStreamReader {
  private static final byte[] ID3_IDENTIFIER = {'I', 'D', '3'};
  private static final int VERSION_UNSET = -1;

+  private static final int AAC_PCE_MIN_SIZE = 6;
+  private static final int AAC_PCE_MAX_SIZE = 49;


This values are not defined in the spec, I'm assuming you're calculating these based on min and max of values/fields present in the spec right?

Yes.

The AAC_PCE_MIN_SIZE is the theoretical minimum size a PCE block can be, based on the possible values of fields in the spec. I'm using that to check against the sample size to determine whether the sample could even hold a PCE block. This is for safety. However, it also has the side effect of making sure we never allocate a pceBuffer of less than 6 bytes in size, which happens to be the exact amount we parse before calculating the actual PCE size which we test against the actual sample size.

The AAC_PCE_MAX_SIZE is the theoretical maximum size, minus the 1 byte comment size and possible 255 bytes of comment that follows. I'm using this to limit the amount of data we're buffering up in pceBuffer.

rohitjoins · 2023-11-23T01:06:13Z

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

+
+        System.arraycopy(oldConfig, 0, newConfig, 0, oldConfig.length);
+        pceBuffer.setPosition(3 /* PCE tag */);
+        pceBuffer.readBits(newConfig, oldConfig.length, numPceBits);


While copying the pceBuffer we are copying only until numPceBits but when defining the new configSize we use configSize += (numPceBits + 7) / 8 + 1, why do we need to add zero length comment to the size when we don't copy it?

Should this be simply configSize += (numPceBits + 7) / 8 ?

@lawadr Please see this comment which was added later.

I guess so that we are appending a valid PCE payload to the config. If we leave out the zero comment size byte then something that parses this as a PCE underneath could fail.

Anyway, a better reason is that if you remux the TS exhibiting this issue to an MP4 or MKV, where this whole config (including the PCE part that this PR appends) is stored somewhere in the container's header/metadata, you'll see that this change matches up 1:1 with it. It results in a TS file submitting the exact same config as its original MP4/MKV file.

If you check the original bug report:

TS: {0x11, 0x80} MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}

That last byte would coincide with the comment size byte.

With this PR:

TS: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00} MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}

This example is for that particular media, and it may be the case that the value of commentLength for it is zero. We cannot generalise and write code based on one example right?

The last byte there is definitely the comment size. It's in the ISO 14496-3 (2009) spec. This whole series of bytes that makes up the config is described in Table 1.15 – Syntax of AudioSpecificConfig(). For the AAC-LC object type (which parseAdtsHeader forces the stream to be treated as), the bits after the first 13 are described in Table 4.1 – Syntax of GASpecificConfig(). As you can see, if the channel configuration is 0, the PCE is output, as described in Table 4.2 – Syntax of program_config_element() (identical to the PCE block as defined in the MPEG-2 spec), which includes a pascal string comment at the end.

Are you asking if we should copy across a comment if it does exist and therefore doesn't have a size of 0? There's no harm in doing so, though that will requiring making our pceBuffer over 6x larger to accommodate a potential 255 byte comment.

No, but if we are already reading the commentLength should we at least copy that across?

Or because we don't pass the comments, we should advertise commentLength as 0?

Exactly that. If we copy across the comment length and it isn't 0, then a parser of this data may read that many bytes past the end of the config. In other words, if we don't want to bother copying the comment, we must set the size to 0.

rohitjoins · 2023-11-24T02:53:47Z

Hi @lawadr,

Can we also please add a test for this in AdtsReaderTest.java?

lawadr · 2023-11-26T04:00:40Z

Hi @lawadr,

Can we also please add a test for this in AdtsReaderTest.java?

No problem. I've just pushed a couple.

It needed a second test because the first would succeed with the old behaviour (pre-this PR) as we can't easily assert the submitted format's initializationData. The second ensures that the format is withheld until the PCE is found, so it's a test for failure. This second test would "succeed" with the old code where as it shouldn't now.

When the channel config is 0 in the ADTS header, delay the submission of the format to the output until a Program Config Element (PCE) is found and appended to the format's initialization data. This reproduces the initialization data payload that is submitted by the same AAC stream when inside other containers such as MP4 or MKV.

rohitjoins · 2023-11-27T16:47:11Z

I'm going to send this for internal review now. You may see some more commits being added as I make changes in response to review feedback. Please refrain from pushing any more substantive changes as it will complicate the internal review - thanks!

lawadr · 2023-11-29T21:08:31Z

libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java

+          int offset = data.getPosition();
+          int limit = offset + min(sampleSize, AAC_PCE_MAX_SIZE);
+          ParsableBitArray pceBuffer =
+              new ParsableBitArray(Arrays.copyOfRange(data.getData(), offset, limit));


How does this work when data doesn't contain the entire PCE? If you don't use continueRead to buffer the PCE up then you need some sort of byte-by-byte state tracking inside readAacProgramConfigElement, which you don't have.

What do you mean by entire PCE?

Instead of calling continueRead which moved the data position further ahead to read pceBuffer, we are now copying the pceBuffer from the data without changing its position. Once PCE data is read and appended to the initializationData of format, setReadingState will be triggered which reads the sample and moves the position in data as we would have done when this method was not in place.

We're now copying the data only once here instead of doing it twice in the previous approach. Please explain if you have something else in mind with this comment.

Also instead of relying on IllegalStateException when format is not set for the case when channelConfig is 0 and not enough bytes in PCE, we now throw ParserException as PCE data should be present in this case.

I understand why you've done that as it does simplify the code. It just doesn't work.

Take this test for example:

@Test public void aacPceSplit() throws ParserException { byte[] first = Arrays.copyOf(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1); byte[] second = Arrays.copyOfRange(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1, AAC_PCE_TEST_DATA.length); data = new ParsableByteArray(first); feed(); data = new ParsableByteArray(second); feed(); assertSampleCounts(0, 1); adtsOutput.assertSample(0, AAC_PCE_ADTS_CONTENT, 0, C.BUFFER_FLAG_KEY_FRAME, null); }

This test passes enough data to cover just the ADTS header and the first byte of the PCE, then follows it up with the rest of the PCE. This doesn't work with your change.

Without storing the entire PCE in a buffer (using continueRead), you've lost the ability to break up the PCE into multiple calls to consume.

Imagine getting rid of continueRead from the STATE_READING_ADTS_HEADER case. It would break every time an ADTS header staddles the boundary of multiple consume calls. It's the same issue here but with the PCE.

Thank you for the explanation. Makes sense to me now. I'll also add this test case now.

Do you agree with the change that once readAacProgramConfigElement is called and not enough bits are present or the PCE tag is not present, it is better to throw ParserException instead of silently not setting the format and relying on failing later because of this?

@rohitjoins

Firstly, ignore my prior post as it's full of inaccuracies.

Also based on the spec, the channel configuration is part of the fixed adts header and will be same in all the headers, so if one adts header says channel configuration is 0, the rest of the headers would do the same and it should be followed by PCE data in the raw blocks.

This isn't true. Once a PCE is given inside a raw_data_block, it applies from that point onwards until another PCE is defined inside another raw_data_block. It says so in the specification:

"The configuration indicated by the PCE takes effect at the raw_data_block() containing the
PCE. The number of front, side and back channels as specified in the PCE must be present
in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a
new PCE is transmitted."

Typically, as streams do not tend to change their channel setups, you'll only see one PCE at the very start. This is what I see in my sample. The first block starts with a PCE, while the following blocks jump straight into the SCEs and CPEs.

Can you please shed more light on how this can be legitimately reached? I think it is safe to throw an exception even in this case. Also I couldn't find anything about PCE just present once per file, I think in that case if we start from the middle we're gonna crash anyways right?

So this brings us to TS files, specifically as part of an HLS playlist. If a PCE is present at the start of the first AAC block in a particular TS file, that TS file can be played in isolation and used as the starting point of the HLS.

You're right that if a PCE has not been encountered then samples cannot be decoded and we should throw. However, the issue is that we are determining whether to enter readAacProgramConfigElement based on the hasOutputFormat member of the AdtsReader class. The problem with that is if you play an HLS from the start and skip far enough ahead to another TS file, a new AdtsReader will be constructed and hasOutputFormat will be initialised to false again, even though we'll be reusing the previously created TrackOutput that has already received the format from before. In other words, I think we need to base whether to go into readAacProgramConfigElement on whether the currentOutput has previously received a format.

Thank you for the explanation. Can you please add a commit which makes the changes you suggested above?

I guess in that case we will be entering readAacProgramConfigElement only when we expect PCE data to be present and can throw an error when non-PCE tag (!= 5) is encountered.

Taking a closer look and discussing it internally, it seems you can't really check if currentOutput has a set Format or not.

The configuration indicated by the PCE takes effect at the raw_data_block() containing the PCE. The number of front, side and back channels as specified in the PCE must be present in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a new PCE is transmitted.

The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n number of raw blocks. Ts container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.

To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.

Taking a closer look and discussing it internally, it seems you can't really check if currentOutput has a set Format or not.

Yes. This is a problem.

The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n number of raw blocks. Ts container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.

It does have a header and a number of raw blocks. You can see that in adts_frame in the spec. There's also something defined above adts_frame though, and that's adts_sequence which loosely defines a series of frames. I can see nothing that indicates a PCE only applies to the end of its adts_frame instead of adts_sequence.

To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.

I've got a test HLS here that exhibits this behaviour. It's 1.2gb though so I'm not sure how to send it across.

If you have ffmpeg you can create it yourself in no time:

Download an MP4 of Blender's Sprite Fright short: https://dl.acm.org/doi/10.1145/3550339.3556000

ffmpeg -i <input.mp4> -c:v copy -c:a aac -aac_pce 1 -f hls -hls_list_size 0 <output.m3u8>

This HLS plays fine in VLC, for example. You can play it from the beginning, skip forward and backwards. You can even start playback from a position that has no PCE using the --start-time= command line argument. I guess it either always checks the first TS file in the playlist to find the PCE or it works out the channel layout implicitly (see 8.5.2.3 Implicit channel mapping in the spec)?

If we throw whenever we can't find a PCE then you can only play the stream from the start and can never skip forward or backwards. If we don't throw (or somehow check whether currentOutput already has a format) then you can skip forward and backwards as long as the stream was played from the start. You still won't be able to start part way through the stream unless it's reencoded to replicate the PCE in every frame, we parse the stream from the start until the PCE is found, or we implicitly work out the PCE as described in the spec.

rohitjoins · 2023-12-21T02:59:51Z

Hi @lawadr,

I created the HLS file using the command stated above. Here are my observations:

when -aac_pce 1 is used, PCE data is only appended to the first ADTS frame

using ffprobe for first ts file, produces output without any error:

but doing the same for the second ts file produced gives an error:
similarly you can use mediainfo on these ts files and can see that the channel layout is only shown for the first file but not the second file.
if you remove the -aac_pce 1 when encoding all ts files contains the channel layout information

Found a similar issue reported on FFmpeg: https://trac.ffmpeg.org/ticket/6974

Is there a use case where media files are encoded using the command above?

I'll try and discuss internally if it is fine to not throw an error when a PCE data is already read before as it is may not be mandatory to send PCE data in each frame. Although that will still mean starting the stream from middle (not start) would still fail the playback.

lawadr · 2023-12-27T17:07:52Z

Is there a use case where media files are encoded using the command above?

Not really, because the example command above is to force a PCE for a standard 2.1 audio stream for testing/reproduction purposes. Typically you would see a PCE when the standard 7 channel types enumerated in the ADTS header aren't sufficient. In the original sample stream I provided, the channel setup was:

1x front SCE
1x front CPE
2x back CPE
1x LFE

I've no idea why the second CPE isn't a side CPE (and therefore standard 7.1) as I didn't author the sample, but nevertheless, the likes of VLC handles it fine, and with the fix in this PR so does ExoPlayer+MediaCodec.

Although that will still mean starting the stream from middle (not start) would still fail the playback.

Yes. This would unfortunately be the case. Perhaps this could be for later enhancement.

For my use case, being able to play from the beginning of the stream is vital. Being able to jump 10s back when you miss something is almost as vital. Being able to start from the middle isn't the most important thing as you can just start from the beginning and skip ahead to where you were.

tianyif assigned rohitjoins Oct 12, 2023

rohitjoins force-pushed the aac-ts-pce-fix branch from 1141d35 to 9fc4438 Compare November 22, 2023 15:04

rohitjoins reviewed Nov 22, 2023

View reviewed changes

rohitjoins added the pending comments label Nov 22, 2023

rohitjoins reviewed Nov 23, 2023

View reviewed changes

rohitjoins added should merge and removed pending comments labels Nov 24, 2023

rohitjoins force-pushed the aac-ts-pce-fix branch 2 times, most recently from acdc660 to 70d4eb5 Compare November 27, 2023 16:40

lawadr and others added 6 commits November 27, 2023 16:41

Format with google-java-format

0617ee2

Fix based on feedback

333e4fe

Add tests

cba2acd

Format and comment changes

cd39812

Add release note

ff515d3

rohitjoins force-pushed the aac-ts-pce-fix branch from 70d4eb5 to ff515d3 Compare November 27, 2023 16:45

rohitjoins force-pushed the aac-ts-pce-fix branch from b2c1100 to 9bb0004 Compare November 28, 2023 17:32

Add nullness check

2c3eac4

rohitjoins force-pushed the aac-ts-pce-fix branch 5 times, most recently from 83cc9e2 to 6d1951a Compare November 29, 2023 18:37

lawadr commented Nov 29, 2023

View reviewed changes

rohitjoins force-pushed the aac-ts-pce-fix branch 2 times, most recently from 6a1649d to 3adabe8 Compare November 29, 2023 22:33

rohitjoins force-pushed the aac-ts-pce-fix branch from 3adabe8 to 5a446c2 Compare November 30, 2023 00:45

Changes based on internal review

1d573e0

rohitjoins force-pushed the aac-ts-pce-fix branch from 5a446c2 to 1d573e0 Compare November 30, 2023 00:55

rohitjoins added need more info and removed should merge labels Dec 4, 2023

google-oss-bot added the stale label Dec 15, 2023

androidx deleted a comment from google-oss-bot Dec 15, 2023

rohitjoins added pending comments and removed need more info stale labels Dec 15, 2023

lawadr mentioned this pull request Feb 5, 2024

AAC with channel config 0 fails to play from TS files #695

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix playback of AAC in TS files #722

Fix playback of AAC in TS files #722

lawadr commented Oct 12, 2023

google-cla bot commented Oct 12, 2023

rohitjoins left a comment

rohitjoins Nov 22, 2023

lawadr Nov 23, 2023

rohitjoins Nov 22, 2023

lawadr Nov 23, 2023

rohitjoins Nov 23, 2023

rohitjoins Nov 23, 2023

lawadr Nov 23, 2023 •

edited

rohitjoins Nov 23, 2023 •

edited

lawadr Nov 23, 2023

rohitjoins Nov 23, 2023

rohitjoins Nov 23, 2023

lawadr Nov 23, 2023

rohitjoins commented Nov 24, 2023

lawadr commented Nov 26, 2023

rohitjoins commented Nov 27, 2023

lawadr Nov 29, 2023

rohitjoins Nov 29, 2023

rohitjoins Nov 29, 2023

lawadr Nov 29, 2023 •

edited

rohitjoins Nov 29, 2023 •

edited

lawadr Dec 5, 2023 •

edited

rohitjoins Dec 5, 2023

rohitjoins Dec 5, 2023

rohitjoins Dec 6, 2023

lawadr Dec 8, 2023

rohitjoins commented Dec 21, 2023

lawadr commented Dec 27, 2023

Fix playback of AAC in TS files #722

Are you sure you want to change the base?

Fix playback of AAC in TS files #722

Conversation

lawadr commented Oct 12, 2023

google-cla bot commented Oct 12, 2023

rohitjoins left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lawadr Nov 23, 2023 • edited

Choose a reason for hiding this comment

rohitjoins Nov 23, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rohitjoins commented Nov 24, 2023

lawadr commented Nov 26, 2023

rohitjoins commented Nov 27, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lawadr Nov 29, 2023 • edited

Choose a reason for hiding this comment

rohitjoins Nov 29, 2023 • edited

Choose a reason for hiding this comment

lawadr Dec 5, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rohitjoins commented Dec 21, 2023

lawadr commented Dec 27, 2023

lawadr Nov 23, 2023 •

edited

rohitjoins Nov 23, 2023 •

edited

lawadr Nov 29, 2023 •

edited

rohitjoins Nov 29, 2023 •

edited

lawadr Dec 5, 2023 •

edited