Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix playback of AAC in TS files #722

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lawadr
Copy link
Contributor

@lawadr lawadr commented Oct 12, 2023

When the channel config is 0 in the ADTS header, delay the submission of the format to the output until a Program Config Element (PCE) is found and appended to the format's initialization data.

This reproduces the initialization data payload that is submitted by the same AAC stream when inside other containers such as MP4 or MKV.

Fixes issue #695

@google-cla
Copy link

google-cla bot commented Oct 12, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Contributor

@rohitjoins rohitjoins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution and fixing the bug in the library.

Please clarify some of the minor comments added to the PR and I should be able to merge once they are resolved.

}

// Calculate total PCE size including initial PCE tag and a zero length comment.
int pceSize = (pceBuffer.getPosition() + extraBits + 7) / 8 + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we assume a zero length comment here? As per the spec, I cannot see to find such an assumption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't need to copy across the comment so there's no point buffering it up. If we had to copy the comment we'd need to increase AAC_PCE_MAX_SIZE from 49 to 305, which will increase the pceBuffer size.

Having said that, if we increase AAC_PCE_MAX_SIZE to 50 we'd get the byte representing the comment size. We could read that to more robustly check if the PCE fits into the sample in the below conditional. I'll make this change.

@@ -66,6 +67,9 @@ public final class AdtsReader implements ElementaryStreamReader {
private static final byte[] ID3_IDENTIFIER = {'I', 'D', '3'};
private static final int VERSION_UNSET = -1;

private static final int AAC_PCE_MIN_SIZE = 6;
private static final int AAC_PCE_MAX_SIZE = 49;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This values are not defined in the spec, I'm assuming you're calculating these based on min and max of values/fields present in the spec right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

The AAC_PCE_MIN_SIZE is the theoretical minimum size a PCE block can be, based on the possible values of fields in the spec. I'm using that to check against the sample size to determine whether the sample could even hold a PCE block. This is for safety. However, it also has the side effect of making sure we never allocate a pceBuffer of less than 6 bytes in size, which happens to be the exact amount we parse before calculating the actual PCE size which we test against the actual sample size.

The AAC_PCE_MAX_SIZE is the theoretical maximum size, minus the 1 byte comment size and possible 255 bytes of comment that follows. I'm using this to limit the amount of data we're buffering up in pceBuffer.


System.arraycopy(oldConfig, 0, newConfig, 0, oldConfig.length);
pceBuffer.setPosition(3 /* PCE tag */);
pceBuffer.readBits(newConfig, oldConfig.length, numPceBits);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While copying the pceBuffer we are copying only until numPceBits but when defining the new configSize we use configSize += (numPceBits + 7) / 8 + 1, why do we need to add zero length comment to the size when we don't copy it?

Should this be simply configSize += (numPceBits + 7) / 8 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lawadr Please see this comment which was added later.

Copy link
Contributor Author

@lawadr lawadr Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so that we are appending a valid PCE payload to the config. If we leave out the zero comment size byte then something that parses this as a PCE underneath could fail.

Anyway, a better reason is that if you remux the TS exhibiting this issue to an MP4 or MKV, where this whole config (including the PCE part that this PR appends) is stored somewhere in the container's header/metadata, you'll see that this change matches up 1:1 with it. It results in a TS file submitting the exact same config as its original MP4/MKV file.

If you check the original bug report:

TS:  {0x11, 0x80}
MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}

That last byte would coincide with the comment size byte.

With this PR:

TS:  {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}
MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is for that particular media, and it may be the case that the value of commentLength for it is zero. We cannot generalise and write code based on one example right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last byte there is definitely the comment size. It's in the ISO 14496-3 (2009) spec. This whole series of bytes that makes up the config is described in Table 1.15 – Syntax of AudioSpecificConfig(). For the AAC-LC object type (which parseAdtsHeader forces the stream to be treated as), the bits after the first 13 are described in Table 4.1 – Syntax of GASpecificConfig(). As you can see, if the channel configuration is 0, the PCE is output, as described in Table 4.2 – Syntax of program_config_element() (identical to the PCE block as defined in the MPEG-2 spec), which includes a pascal string comment at the end.

Are you asking if we should copy across a comment if it does exist and therefore doesn't have a size of 0? There's no harm in doing so, though that will requiring making our pceBuffer over 6x larger to accommodate a potential 255 byte comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but if we are already reading the commentLength should we at least copy that across?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or because we don't pass the comments, we should advertise commentLength as 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly that. If we copy across the comment length and it isn't 0, then a parser of this data may read that many bytes past the end of the config. In other words, if we don't want to bother copying the comment, we must set the size to 0.

@rohitjoins
Copy link
Contributor

Hi @lawadr,

Can we also please add a test for this in AdtsReaderTest.java?

@lawadr
Copy link
Contributor Author

lawadr commented Nov 26, 2023

Hi @lawadr,

Can we also please add a test for this in AdtsReaderTest.java?

No problem. I've just pushed a couple.

It needed a second test because the first would succeed with the old behaviour (pre-this PR) as we can't easily assert the submitted format's initializationData. The second ensures that the format is withheld until the PCE is found, so it's a test for failure. This second test would "succeed" with the old code where as it shouldn't now.

@rohitjoins rohitjoins force-pushed the aac-ts-pce-fix branch 2 times, most recently from acdc660 to 70d4eb5 Compare November 27, 2023 16:40
lawadr and others added 6 commits November 27, 2023 16:41
When the channel config is 0 in the ADTS header, delay the submission of
the format to the output until a Program Config Element (PCE) is found
and appended to the format's initialization data.

This reproduces the initialization data payload that is submitted by the
same AAC stream when inside other containers such as MP4 or MKV.
@rohitjoins
Copy link
Contributor

I'm going to send this for internal review now. You may see some more commits being added as I make changes in response to review feedback. Please refrain from pushing any more substantive changes as it will complicate the internal review - thanks!

@rohitjoins rohitjoins force-pushed the aac-ts-pce-fix branch 5 times, most recently from 83cc9e2 to 6d1951a Compare November 29, 2023 18:37
int offset = data.getPosition();
int limit = offset + min(sampleSize, AAC_PCE_MAX_SIZE);
ParsableBitArray pceBuffer =
new ParsableBitArray(Arrays.copyOfRange(data.getData(), offset, limit));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work when data doesn't contain the entire PCE? If you don't use continueRead to buffer the PCE up then you need some sort of byte-by-byte state tracking inside readAacProgramConfigElement, which you don't have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by entire PCE?

Instead of calling continueRead which moved the data position further ahead to read pceBuffer, we are now copying the pceBuffer from the data without changing its position. Once PCE data is read and appended to the initializationData of format, setReadingState will be triggered which reads the sample and moves the position in data as we would have done when this method was not in place.

We're now copying the data only once here instead of doing it twice in the previous approach. Please explain if you have something else in mind with this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also instead of relying on IllegalStateException when format is not set for the case when channelConfig is 0 and not enough bytes in PCE, we now throw ParserException as PCE data should be present in this case.

Copy link
Contributor Author

@lawadr lawadr Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you've done that as it does simplify the code. It just doesn't work.

Take this test for example:

  @Test
  public void aacPceSplit() throws ParserException {
    byte[] first = Arrays.copyOf(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1);
    byte[] second = Arrays.copyOfRange(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1, AAC_PCE_TEST_DATA.length);

    data = new ParsableByteArray(first);
    feed();
    data = new ParsableByteArray(second);
    feed();

    assertSampleCounts(0, 1);
    adtsOutput.assertSample(0, AAC_PCE_ADTS_CONTENT, 0, C.BUFFER_FLAG_KEY_FRAME, null);
  }

This test passes enough data to cover just the ADTS header and the first byte of the PCE, then follows it up with the rest of the PCE. This doesn't work with your change.

Without storing the entire PCE in a buffer (using continueRead), you've lost the ability to break up the PCE into multiple calls to consume.

Imagine getting rid of continueRead from the STATE_READING_ADTS_HEADER case. It would break every time an ADTS header staddles the boundary of multiple consume calls. It's the same issue here but with the PCE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. Makes sense to me now. I'll also add this test case now.

Do you agree with the change that once readAacProgramConfigElement is called and not enough bits are present or the PCE tag is not present, it is better to throw ParserException instead of silently not setting the format and relying on failing later because of this?

Copy link
Contributor Author

@lawadr lawadr Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohitjoins

Firstly, ignore my prior post as it's full of inaccuracies.

Also based on the spec, the channel configuration is part of the fixed adts header and will be same in all the headers, so if one adts header says channel configuration is 0, the rest of the headers would do the same and it should be followed by PCE data in the raw blocks.

This isn't true. Once a PCE is given inside a raw_data_block, it applies from that point onwards until another PCE is defined inside another raw_data_block. It says so in the specification:

"The configuration indicated by the PCE takes effect at the raw_data_block() containing the
PCE. The number of front, side and back channels as specified in the PCE must be present
in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a
new PCE is transmitted."

Typically, as streams do not tend to change their channel setups, you'll only see one PCE at the very start. This is what I see in my sample. The first block starts with a PCE, while the following blocks jump straight into the SCEs and CPEs.

Can you please shed more light on how this can be legitimately reached? I think it is safe to throw an exception even in this case. Also I couldn't find anything about PCE just present once per file, I think in that case if we start from the middle we're gonna crash anyways right?

So this brings us to TS files, specifically as part of an HLS playlist. If a PCE is present at the start of the first AAC block in a particular TS file, that TS file can be played in isolation and used as the starting point of the HLS.

You're right that if a PCE has not been encountered then samples cannot be decoded and we should throw. However, the issue is that we are determining whether to enter readAacProgramConfigElement based on the hasOutputFormat member of the AdtsReader class. The problem with that is if you play an HLS from the start and skip far enough ahead to another TS file, a new AdtsReader will be constructed and hasOutputFormat will be initialised to false again, even though we'll be reusing the previously created TrackOutput that has already received the format from before. In other words, I think we need to base whether to go into readAacProgramConfigElement on whether the currentOutput has previously received a format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. Can you please add a commit which makes the changes you suggested above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in that case we will be entering readAacProgramConfigElement only when we expect PCE data to be present and can throw an error when non-PCE tag (!= 5) is encountered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a closer look and discussing it internally, it seems you can't really check if currentOutput has a set Format or not.

The configuration indicated by the PCE takes effect at the raw_data_block() containing the PCE. The number of front, side and back channels as specified in the PCE must be present in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a new PCE is transmitted.

The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n number of raw blocks. Ts container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.

To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a closer look and discussing it internally, it seems you can't really check if currentOutput has a set Format or not.

Yes. This is a problem.

The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n number of raw blocks. Ts container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.

It does have a header and a number of raw blocks. You can see that in adts_frame in the spec. There's also something defined above adts_frame though, and that's adts_sequence which loosely defines a series of frames. I can see nothing that indicates a PCE only applies to the end of its adts_frame instead of adts_sequence.

To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.

I've got a test HLS here that exhibits this behaviour. It's 1.2gb though so I'm not sure how to send it across.

If you have ffmpeg you can create it yourself in no time:

  1. Download an MP4 of Blender's Sprite Fright short: https://dl.acm.org/doi/10.1145/3550339.3556000
  2. ffmpeg -i <input.mp4> -c:v copy -c:a aac -aac_pce 1 -f hls -hls_list_size 0 <output.m3u8>

This HLS plays fine in VLC, for example. You can play it from the beginning, skip forward and backwards. You can even start playback from a position that has no PCE using the --start-time= command line argument. I guess it either always checks the first TS file in the playlist to find the PCE or it works out the channel layout implicitly (see 8.5.2.3 Implicit channel mapping in the spec)?

If we throw whenever we can't find a PCE then you can only play the stream from the start and can never skip forward or backwards. If we don't throw (or somehow check whether currentOutput already has a format) then you can skip forward and backwards as long as the stream was played from the start. You still won't be able to start part way through the stream unless it's reencoded to replicate the PCE in every frame, we parse the stream from the start until the PCE is found, or we implicitly work out the PCE as described in the spec.

@rohitjoins rohitjoins force-pushed the aac-ts-pce-fix branch 2 times, most recently from 6a1649d to 3adabe8 Compare November 29, 2023 22:33
@rohitjoins
Copy link
Contributor

Hi @lawadr,

I created the HLS file using the command stated above. Here are my observations:

when -aac_pce 1 is used, PCE data is only appended to the first ADTS frame

  • using ffprobe for first ts file, produces output without any error:
    Screenshot 2023-12-21 at 02 49 02
    but doing the same for the second ts file produced gives an error:
    Screenshot 2023-12-21 at 02 51 19
  • similarly you can use mediainfo on these ts files and can see that the channel layout is only shown for the first file but not the second file.
  • if you remove the -aac_pce 1 when encoding all ts files contains the channel layout information

Found a similar issue reported on FFmpeg: https://trac.ffmpeg.org/ticket/6974

Is there a use case where media files are encoded using the command above?

I'll try and discuss internally if it is fine to not throw an error when a PCE data is already read before as it is may not be mandatory to send PCE data in each frame. Although that will still mean starting the stream from middle (not start) would still fail the playback.

@lawadr
Copy link
Contributor Author

lawadr commented Dec 27, 2023

Is there a use case where media files are encoded using the command above?

Not really, because the example command above is to force a PCE for a standard 2.1 audio stream for testing/reproduction purposes. Typically you would see a PCE when the standard 7 channel types enumerated in the ADTS header aren't sufficient. In the original sample stream I provided, the channel setup was:

  • 1x front SCE
  • 1x front CPE
  • 2x back CPE
  • 1x LFE

I've no idea why the second CPE isn't a side CPE (and therefore standard 7.1) as I didn't author the sample, but nevertheless, the likes of VLC handles it fine, and with the fix in this PR so does ExoPlayer+MediaCodec.

Although that will still mean starting the stream from middle (not start) would still fail the playback.

Yes. This would unfortunately be the case. Perhaps this could be for later enhancement.

For my use case, being able to play from the beginning of the stream is vital. Being able to jump 10s back when you miss something is almost as vital. Being able to start from the middle isn't the most important thing as you can just start from the beginning and skip ahead to where you were.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants