translate_text corrupts HTML #72

Open
pbtsrc opened this issue Apr 28, 2023 · 3 comments

pbtsrc commented Apr 28, 2023

text=

<html>
<body>
  <div>
    <a href="01.html">Chapter I. Margaret Makes Herself at Home</a>
  </div>
  <div>
    <a href="02.html">Chapter II. Stephen's Life Goes On</a>
  </div>
</body>
</html>

Calling translate_text(text, source_lang='EN', target_lang='DE', tag_handling='html') on the above text returns this:

<html>
<body>
 <div>
  <a href="01.html">Kapitel I. Margaret macht es sich gemüt</a>lich  </div>
 <div>
  <a href="02.html">Kapitel II. Stephens Leben geht</a>weiter  </div>
</body>
</html>

As you can see, the tail of each link text (lich, weiter) ends up outside the closing </a> tag.
With tag_handling='xml', everything works as expected:

<html>
<body>
  <div>
    <a href="01.html">Kapitel I. Margaret macht es sich gemütlich</a>
  </div>
  <div>
    <a href="02.html">Kapitel II. Stephens Leben geht weiter</a>
  </div>
</body>
</html>

Replacing <div> with <p> also avoids the issue.
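
To make this kind of corruption easy to spot in a test, here is a minimal sketch that uses only the Python standard library. It lists every text chunk together with the tag path it sits in and reports text that lies directly inside a <div> rather than inside a child element. The names TextByPath, text_by_path and leaked_text are hypothetical helpers, not part of any translation library, and the sketch assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser


class TextByPath(HTMLParser):
    """Collect each non-blank text chunk together with its enclosing tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.chunks = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(("/".join(self.stack), text))


def text_by_path(markup):
    parser = TextByPath()
    parser.feed(markup)
    parser.close()
    return parser.chunks


def leaked_text(markup, parent="div"):
    """Return text chunks whose innermost enclosing tag is `parent`."""
    return [text for path, text in text_by_path(markup)
            if path.split("/")[-1] == parent]


# One <div> from the tag_handling='html' output quoted above.
broken = '<div>\n  <a href="01.html">Kapitel I. Margaret macht es sich gemüt</a>lich  </div>'
print(leaked_text(broken))  # ['lich']
```

Run against the full tag_handling='html' output above, this reports lich and weiter; against the tag_handling='xml' output it reports nothing.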

pbtsrc commented Apr 28, 2023

Another example.
text=

<p>1-<i>London, Paris</i></p>

translate_text returns:

<p>1-London<i>, Paris</i></p>

London has been moved outside the <i> element. The result is the same with tag_handling='html' and tag_handling='xml'.
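
Since London and Paris are spelled the same in German, a regression check for this case can simply assert that both names are still inside the <i> element after translation. This is a rough sketch using only the standard library; InnerText, inner_text and names_kept_inside are hypothetical helper names, and the returned string is the output quoted above rather than a live API call.

```python
from html.parser import HTMLParser


class InnerText(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0   # greater than 0 while inside the tag of interest
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def inner_text(markup, tag):
    parser = InnerText(tag)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def names_kept_inside(result, tag, names):
    """True if every name still appears inside `tag` in the translated markup."""
    inside = inner_text(result, tag)
    return all(name in inside for name in names)


source = "<p>1-<i>London, Paris</i></p>"
returned = "<p>1-London<i>, Paris</i></p>"  # output quoted in this issue

print(inner_text(source, "i"))    # London, Paris
print(inner_text(returned, "i"))  # , Paris
print(names_kept_inside(returned, "i", ["London", "Paris"]))  # False
```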

seekuehe commented Jun 9, 2023

@pbtsrc Are you by any chance using both the tag_handling and preserve_formatting parameters?

pbtsrc commented Jun 9, 2023

No, I did not use preserve_formatting. I tried to add this parameter, but it did not change anything.
