Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timestamps across concatenated TSes #906

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

dsparano
Copy link
Contributor

Problem
When a playlist of TS assets is played, at the point of stream change the following TS may have one or more frame timestamps repeated. See this example with the last 5 frames of the previous TS and the first 5 frames of the next one.
Current main:

onQueueInputBuffer timeUs=1000004000000 size=10647
onQueueInputBuffer timeUs=1000003920000 size=5095
onQueueInputBuffer timeUs=1000003880000 size=1630
onQueueInputBuffer timeUs=1000003960000 size=1728
onQueueInputBuffer timeUs=1000004040000 size=3313
onQueueInputBuffer timeUs=1000004000000 size=951 KEY
onQueueInputBuffer timeUs=1000004080000 size=45
onQueueInputBuffer timeUs=1000004040000 size=43
onQueueInputBuffer timeUs=1000004120000 size=132
onQueueInputBuffer timeUs=1000004160000 size=241

Correct sequence should be:

onQueueInputBuffer timeUs=1000004000000 size=10647
onQueueInputBuffer timeUs=1000003920000 size=5095
onQueueInputBuffer timeUs=1000003880000 size=1630
onQueueInputBuffer timeUs=1000003960000 size=1728
onQueueInputBuffer timeUs=1000004040000 size=3313
onQueueInputBuffer timeUs=1000004080000 size=951 KEY
onQueueInputBuffer timeUs=1000004160000 size=45
onQueueInputBuffer timeUs=1000004120000 size=43
onQueueInputBuffer timeUs=1000004200000 size=132
onQueueInputBuffer timeUs=1000004240000 size=241

Time stamps us 1000004000000 and 1000004040000 get repeated. This has an impact in applications where timestamp accuracy is required.

Root cause
From our analysis the root cause seems to be the calculated asset duration. The duration of a Transport Stream asset gets calculated as difference between the very last PCR and the very first PCR encountered in the stream. However this seems to bring some issues:

  1. The calculation is likely frame inaccurate, because there is no guarantee that the first and last PCRs happen on the very first and the very last frame. As far as we can read, ISO/IEC 13818-1 does not seem to mandate PCRs being repeated for each Access Unit. For ex, it's easy to find TS assets where the last seen PCR is in the frame before the last one, ditto for the first one.
  2. The calculation misses the duration of the very first frame, otherwise for ex. for a two frames video the duration would end up being just one frame duration.

Proposed solution
As a mitigation we propose here a change to use DTS, in pretty much the same way. Since the DTS is monotonic it allows to easily calculate both the frame duration (point 2) and the asset duration. Specifically: frame duration as difference between two consecutive DTSes (2nd and 1st) and asset duration as difference between last DTS and first DTS, and corrected by adding one frame duration.

In the unlikely event the DTS is not available, this PR would still fall back to using the PCR as in current main.

@rohitjoins rohitjoins self-assigned this Dec 20, 2023
@dsparano
Copy link
Contributor Author

Hi @rohitjoins, is there any additional info I can provide to help you reviewing this PR?

@rohitjoins
Copy link
Contributor

Hi @dsparano, I haven't got time yet to look at this yet. Will review this soon and let you know. Thank you for the patience.

@rohitjoins
Copy link
Contributor

Hi - would it be possible to open a new PR from an individual-owned fork? We can't push changes to organization-owned forks like this one unless we have collaborator access. If that's not possible then we can still merge this PR but it will result in an 'evil' merge. See more info here: https://github.com/androidx/media/blob/release/CONTRIBUTING.md#push-access-to-pr-branches

@dsparano
Copy link
Contributor Author

@rohitjoins I've given you write access on our forked repo, please let me know if you cannot write.

Copy link
Contributor

@rohitjoins rohitjoins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the TsExtractorTest tests are failing after this change. The duration for the sample files in the dump files shows quite an increase. Can you also look into that?

* end. This class can only be used once to read duration from a given stream, and the usage of the
* <p>This reader extracts the duration by reading DTS values of the first PES PID packets at the
* start and at the end of the stream, calculating the difference, and converting that into stream
* duration. Should DTS not be available this reader falls back to using PCR in the same way and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

back to using PCR in the same way

I think you should leave the explanation from before about how it is done. Same way will not be known to reader of this documentation in the future unless you're seeing the diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the description, although this is now a private method

firstDtsValue = C.TIME_UNSET;
secondDtsValue = C.TIME_UNSET;
lastDtsValue = C.TIME_UNSET;
firstDtsPosition = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use C.POSITION_UNSET instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, however C.POSITION_UNSET is deprecated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for suggesting a deprecated constant. Please use C.INDEX_UNSET as documented on the deprecated constant C.POSITION_UNSET.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prob, changed

return finishReadDuration(input);
}

public @Extractor.ReadResult int computeDuration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add documentation to this new method as done for readDuration as this is a public method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as per below comment

return finishReadDuration(input);
}

public @Extractor.ReadResult int computeDuration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name computeDuration makes little sense when we already have another called readDuration. Use a better name here and may be also change the method name for readDuration to something like readDurationFromPcr and readDurationFromDts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. So I've changed the two methods as readDurationFromPcr and readDurationFromDts, however having added the fallback these two can now be private and the readDuration is now the public one handling the fallback.

return durationReader.readDuration(input, seekPosition, pcrPid);
int pesPid = getFirstPesReaderPid();
if (canReadDuration && !durationReader.isDurationReadFinished() && pesPid > 0) {
return durationReader.computeDuration(input, seekPosition, pesPid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the fallback logic to use readDuration as before when DTS data is not available? Also why is it more accurate to use DTS data to calculate the duration rather than the PCR apart from the use case mentioned in the issue description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the fallback in readDuration. The problem with the PCR is what I mentioned in the PR description, will elaborate more in there.

@rohitjoins
Copy link
Contributor

Repeating again, please check the failing test cases in TsExtractorTest. Dump files shows a change in duration for the sample files, how do we explain the change and was it more accurate to derive the duration from PCR or DTS?

@dsparano
Copy link
Contributor Author

Re the TsDurationReaderTest.

I've checked the test TS bbb_2500ms.ts using the DVB Inspector analyser:

  • First PCR: 0x1206420 (18900000) => 0:00:00.700000 appears at the very first frame
  • Last PCR: 0x5265C00 (86400000) => 0:00:03.200000 appears at the penultimate frame
    difference is 2.5 secs.

However:

  • Frame count: 58
  • Fps: 24
  • As a result the real duration is 24/58 secs = 2.416666 secs

To complicate things the DTS and PTS values are a bit strange:

  • First encoded frame:
    pts: 0x21822 (137250) => 0:00:01.5250
    dts: 0x1EC30 (126000) => 0:00:01.4000
  • Second encoded frame:
    pts: 0x24414 (148500) => 0:00:01.6500
    dts: 0x21822 (137250) => 0:00:01.5250
  • Last encoded frame:
    pts: 0x57864 (358500) => 0:00:03.9833
    dts: 0x569BE (354750) => 0:00:03.9416

(Note: the last encoded frame has the highest PTS so it it the last also in presentation order)

The PTS/DTS are on a 90KHz tick so given that the stream is 24fps this gives a 90000/24 = 3750 ticks/frame.

What happens with this encode is that the last-first DTS diff is 354750-126000 = 228750 ticks equal to 61 frames, plus the last (or first depending how you see it) this gives 62 frames, which seems wrong.

Also, this PR assumes equally spaced DTS (which seems a fair assumption) while the diff second-first DTS here is 137250-126000 = 11250 which is 3 frames duration.

Long story short I'm not entirely sure this is a fully conformant stream, or one meaningful to be used for testing.

@dsparano
Copy link
Contributor Author

On the other hand if we take the test file sample_h264.ts:

  • First PCR: 0x1206420 (18900000) => 0:00:00.700000 appears at the very first frame

  • Last PCR: 0x2932E00 (43200000) => 0:00:01.600000 appears at two frames behind the last one (i.e. index 27)
    difference is 0.9 secs

  • Frame count: 30

  • Fps: 30

  • As a result the real duration is 30/30 secs = 1 sec

Timestamps:

  • First encoded frame:
    pts: 0x203A0 (132000) => 0:00:01.4666
    dts: 0x1EC30 (126000) => 0:00:01.4000
  • Second encoded frame:
    pts: 0x20F58 (135000) => 0:00:01.5000
    dts: 0x1F7E8 (129000) => 0:00:01.4333
  • Last encoded frame:
    pts: 0x35778 (219000) => 0:00:02.4333
    dts: 0x34008 (213000) => 0:00:02.3666

Therefore:
DTS_1 - DTS_0 = 129000 - 126000 = 3000
DTS_29 - DTS_0 = 213000 - 126000 = 87000

Duration based on DTS: 87000 + 3000 = 90000 which is 1 sec, which is correct.

@dsparano
Copy link
Contributor Author

Not sure how the original bbb_2500ms.ts was encoded, anyway I just did:

ffmpeg -i https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_stereo.avi -t 00:00:02.500 bbb_2500ms.ts

and the resulting asset is totally equivalent with the current one, but with correct frame count and timestamps, so I've added it to this PR.

I've also added .gitattributes files where needed to avoid false negatives on tests running in Windows, due to the different line endings.

I'm currently looking at the tests, many dumps need updating because of the different durations we now get. There are a few failings I'm currently analysing.

@FongMi
Copy link

FongMi commented Jan 24, 2024

Will this fix reduce the chance of "Unexpected audio track timestamp discontinuity" being thrown?
test link
need to set mineType to hls
about watch 30 sec will see error

@dsparano
Copy link
Contributor Author

@FongMi I'm struggling to find a device supporting audio mpeg-L2, any suggestion?

@FongMi
Copy link

FongMi commented Jan 24, 2024

@FongMi I'm struggling to find a device supporting audio mpeg-L2, any suggestion?

Sorry, I don't have suggestion, I use Xiaomi 13...
Screenshot_2024-01-24-23-21-54-114_com.fongmi.android.tv.jpg

@dsparano
Copy link
Contributor Author

@FongMi could play your content on a Samsung S21, the Unexpected audio track timestamp discontinuity is still there :( it seems more an encoder issue to me as there are jumps in audio presentation timestamps.

@dsparano dsparano force-pushed the dsparano-exo127 branch 2 times, most recently from bfb73f4 to 5096fba Compare January 24, 2024 16:58
@dsparano
Copy link
Contributor Author

@rohitjoins I've done a few updates and refreshed many dump files. Tests are all passing now, if you could please check the changes.

@rohitjoins
Copy link
Contributor

Hi @dsparano,

Thank you for addressing the comments. Please be patient as this is a big change. Will get this reviewed internally if I don't find anything to flag before that.

Note: I'm going on vacation for 3 weeks soon, so expect a delay.

@dsparano
Copy link
Contributor Author

Hi @rohitjoins have you had the chance to make progress with this review?

@dsparano
Copy link
Contributor Author

Hi @rohitjoins I've rebased and updated a few dump files, can you please resume your review?

@rohitjoins
Copy link
Contributor

Hi @dsparano,

I'll take a look again on this PR and will let you know if there are any further concerns to merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants