TTML text parsing issues with new lines #77

Open
shlompy opened this issue Jan 24, 2023 · 1 comment

shlompy commented Jan 24, 2023

Hi.

I'm trying to parse the following TTML snippet:

```xml
<?xml version="1.0" encoding="UTF-8"?><tt xmlns:smpte="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:space="default" xml:lang="eng"><head>
    <metadata>
      <ttm:title/>
    </metadata>
    <styling>
<style xml:id="style.center.outline" xmlns:tts="http://www.w3.org/ns/ttml#style" tts:fontFamily="Arial" tts:fontSize="100%" tts:fontStyle="normal" tts:fontWeight="normal" tts:backgroundColor="transparent" tts:color="white" tts:textOutline="black 2px" tts:textAlign="center"/>
    </styling>
    <layout>
      <region xml:id="r0" tts:displayAlign="after" tts:origin="10% 75%" tts:extent="80% 20%"/>
    </layout>
  </head><body>
  <div>
  <p style="style.center.outline" begin="00:22:31.000" region="r0" xml:id="p264" end="00:22:33.720" ><span tts:direction="ltr">Got you!<br/>Steady on.</span></p>
  </div></body></tt>
```


It seems that the subtitle text is parsed without a line break.
The text is unmarshalled as XML chardata:

```go
type TTMLInItem struct {
	Text string `xml:",chardata"`
	// ...
}
```

This results in the following string: "Got you!Steady on."
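The behaviour can be reproduced with `encoding/xml` alone. In this minimal sketch, `span` and `chardataOf` are hypothetical names standing in for `TTMLInItem`; only the `,chardata` field matters. Because `<br/>` matches no struct field, the decoder skips it and concatenates the text runs on either side of it.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// span is a hypothetical, trimmed-down stand-in for TTMLInItem:
// only the ",chardata" field matters for reproducing the issue.
type span struct {
	Text string `xml:",chardata"`
}

// chardataOf is a hypothetical helper that unmarshals doc into span and
// returns whatever encoding/xml accumulated in the chardata field.
func chardataOf(doc string) (string, error) {
	var s span
	// <br/> matches no field of span, so the decoder skips it; only the
	// character data before and after it is accumulated into Text.
	if err := xml.Unmarshal([]byte(doc), &s); err != nil {
		return "", err
	}
	return s.Text, nil
}

func main() {
	text, err := chardataOf(`<span>Got you!<br/>Steady on.</span>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "Got you!Steady on."
}
```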

`ttml.go` contains the following comment:

```go
// New line decoded as a line break. This can happen if there's a "br" tag within the text since
// since the go xml unmarshaler will unmarshal a "br" tag as a line break if the field has the
// chardata xml tag.
```

However, the Go XML unmarshaler doesn't seem to convert the `br` tag into a newline.
Perhaps this used to be true in older Go versions? (I'm using Go 1.18.5.)
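One way to check this is to print the raw token stream that `encoding/xml` produces for the `<span>` from the snippet. In this sketch, `tokenKinds` is a hypothetical helper, not part of the library. The self-closing `<br/>` shows up as a start/end element pair and never as character data containing a newline, which suggests the decoder itself does not turn `br` into a line break; that conversion would have to be done by the caller.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenKinds is a hypothetical helper that describes each token
// encoding/xml returns for doc, in order.
func tokenKinds(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			out = append(out, "start:"+t.Name.Local)
		case xml.EndElement:
			out = append(out, "end:"+t.Name.Local)
		case xml.CharData:
			out = append(out, fmt.Sprintf("text:%q", string(t)))
		}
	}
}

func main() {
	kinds, err := tokenKinds(`<span>Got you!<br/>Steady on.</span>`)
	if err != nil {
		panic(err)
	}
	// Prints: start:span, text:"Got you!", start:br, end:br,
	// text:"Steady on.", end:span (one per line).
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```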

asticode (Owner) commented

The problem is that this library apparently doesn't properly handle `<br/>` inside `<span>` tags.
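A possible direction for a fix, sketched under stated assumptions: instead of a `,chardata` field, the element's content can be decoded token by token in a custom `UnmarshalXML`, writing `\n` whenever a `br` start element appears at any nesting depth. `lineText`, `paragraph` and `parseParagraph` are hypothetical names for illustration only; this is not go-astisub's actual API or implementation.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// lineText is a hypothetical field type (not part of go-astisub) whose
// UnmarshalXML keeps each <br/> as "\n" instead of dropping it.
type lineText string

// UnmarshalXML concatenates all character data inside the element,
// including text in nested elements such as <span>, and writes "\n"
// for every <br/> it meets at any depth.
func (l *lineText) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var b strings.Builder
	depth := 0 // nesting depth below the element being decoded
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "br" {
				b.WriteString("\n")
			}
			depth++
		case xml.EndElement:
			if depth == 0 {
				// Closing tag of the element itself: we are done.
				*l = lineText(b.String())
				return nil
			}
			depth--
		case xml.CharData:
			b.Write(t)
		}
	}
}

// paragraph is a hypothetical stand-in for a TTML <p> holding one <span>.
type paragraph struct {
	Text lineText `xml:"span"`
}

// parseParagraph is a hypothetical helper returning the span text of doc.
func parseParagraph(doc string) (string, error) {
	var p paragraph
	if err := xml.Unmarshal([]byte(doc), &p); err != nil {
		return "", err
	}
	return string(p.Text), nil
}

func main() {
	text, err := parseParagraph(`<p><span>Got you!<br/>Steady on.</span></p>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "Got you!\nSteady on."
}
```

Walking tokens is what lets `br` survive, since a `,chardata` field only ever receives text nodes. A real fix would also need to keep per-span styling and apply the `xml:space` whitespace rules, both of which this sketch ignores.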

PRs are welcome.

Cheers
