Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subtitles/dfxp: timecode: examine div tags for timecodes #1136

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

as
Copy link

@as as commented May 13, 2020

The current dfxp parser does not examine the contents of <div tags for timing information. This causes it to generate an empty manifest for some valid dfxp files. It is valid to have timing information in the <div, and I can provide more details and examples on request, but here are two files:

FILE A: http://flv.io/ABC142908_id.dfxp.xml
FILE B: http://flv.io/ABC142908_en-US.dfxp.xml

FILE B currently generates an empty manifest. This change fixes this.

I have made changes to two functions that examine timing information:

  • dfxp_get_duration, which traverses the entire XML to compute the ultimate duration of the subtitle's content.
  • dfxp_parse_frames, which traverses the XML in a similar fashion. It now looks at <p tags and reverts to the last-visited <div if there exists no timing information in the <p tag.

Additionally, there is set of helper functions to remove duplicate code, as reconciling my changes to work inline resulted in cruft. The resulting changes allow FILE B to be parsed, as well as combinations of FILE A and FILE B.

@erankor
Copy link
Contributor

erankor commented May 14, 2020

I'm sorry, but this pull is currently in a state that is very far from what I'm willing to merge.
I'll list some of the issues I'm seeing, but in general - I think this code has to be thoroughly revised to match the coding standards of this module.

Style:

  • missing static on internal functions
  • missing spaces before/after operators
  • * next to the var name instead of type name
  • blocks without {}'s
  • _t suffix for struct typedefs
  • dfxp_parse_timestamp0 is returning the result in two different ways
  • commit history is dirty with unrelated changes and their reverts

Efficiency:

  • no need to store the duration dfxp_timestamp
  • no need to look for duration if we already got end time (or the opposite, whichever we try to fetch first...)
  • dfxp_get_duration is now calculating the start time even though it doesn't care about it

Bug:

  • last_div is not reset when walking up in the tree - meaning - it may fall back to using a div that is not an ancestor of the current p element.

@as
Copy link
Author

as commented May 19, 2020

Thanks for the feedback, please take another look now, @erankor . I updated the pull request with respect to the feedback provided.

The style you provided is not consistent everywhere in this module even before this PR (e.x., * next to the var name instead of type name). I did not fix that, but the changes here in this PR should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants