Mismatch in the placement of tags #77

Open
awerks opened this issue Aug 13, 2023 · 1 comment

awerks commented Aug 13, 2023

The problem seems to be a mismatch in the placement of the numbered tags between the English and Ukrainian sentences.

In the English text, the segment "Is she into you?" is followed by the tag <x>5</x>, and "Eye contact is obviously a good sign, but" is followed by the tag <x>6</x>.

In the Ukrainian text, the tag <x>6</x> appears directly after <x>4</x> with no text in between, ahead of "Ви їй подобаєтесь?". The tag <x>5</x> then follows that segment, and the translation of "Eye contact is obviously a good sign, but" runs on to the end of the sentence with no tag after it.

This inconsistency in tag placement means that the two texts may not align properly when being processed.


"I know a lot of guys have trouble picking up on the signs of whether or not a girl is into them, so today I wanted to do a little quiz so<x>1</x>you can see if she actually does like you, or even just how good you are at recognizing signals that you might be getting.<x>2</x>Let's say you're in a bar and you look across the length of the counter and you see a cute girl who<x>3</x>glances at you briefly, does a quick hair flip, and then turns back to her friends.<x>4</x>Is she into you?<x>5</x>Eye contact is obviously a good sign, but<x>6</x>you really can't tell from this example."

"Я знаю, що багатьом хлопцям важко зрозуміти, чи подобається вони дівчині, тому сьогодні я хотів би провести невеликий тест, щоб<x>1</x>ви дізналися, чи справді ви їй подобаєтеся, або навіть наскільки добре ви розпізнаєте сигнали, які можете отримувати.<x>2</x>Уявімо, що ви в барі, дивитеся через всю стійку і бачите симпатичну дівчину, яка<x>3</x>на вас, швидко поправляє зачіску, а потім повертається до своїх подруг.<x>4</x><x>6</x>Ви їй подобаєтесь?<x>5</x>Зоровий контакт - це, звісно, хороший знак, але з цього прикладу ви не можете цього сказати."

I am using tag_handling="xml" and ignore_tags="x".

preserve_formatting and split_sentences have no effect on the problem.
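
The mismatch in the two strings above can be confirmed without reading them by eye. The following is a minimal sketch that uses only Python's standard `re` module; the helper names `tag_order` and `tags_match` are made up for illustration and are not part of the deepl library, and the two sample strings are shortened stand-ins for the full texts.

```python
import re

# Matches the placeholder tags used in this issue, e.g. <x>5</x>.
TAG_PATTERN = re.compile(r"<x>(\d+)</x>")


def tag_order(text: str) -> list[int]:
    """Return the tag numbers in the order they appear in the text."""
    return [int(n) for n in TAG_PATTERN.findall(text)]


def tags_match(source: str, translated: str) -> bool:
    """Return True when both texts contain the same tags in the same order."""
    return tag_order(source) == tag_order(translated)


# Shortened stand-ins for the English source and the Ukrainian output.
source = "friends.<x>4</x>Is she into you?<x>5</x>Eye contact is a good sign, but<x>6</x>you can't tell."
translated = "подруг.<x>4</x><x>6</x>Ви їй подобаєтесь?<x>5</x>Зоровий контакт - це хороший знак."

print(tag_order(source))               # [4, 5, 6]
print(tag_order(translated))           # [4, 6, 5]
print(tags_match(source, translated))  # False
```

A check like this only reports that the order differs; it does not say which position is the correct one.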

Is there another way to keep context within sentences split by conjunctions and commas?

JanEbbing (Member) commented

Hi, sorry for this issue. Could you please share the code you used to get this translation?

Using Python 3.10.12 and deepl 1.15.0 with the following code:

import deepl
import os

t = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])
text = "I know a lot of guys have trouble picking up on the signs of whether or not a girl is into them, so today I wanted to do a little quiz so<x>1</x>you can see if she actually does like you, or even just how good you are at recognizing signals that you might be getting.<x>2</x>Let's say you're in a bar and you look across the length of the counter and you see a cute girl who<x>3</x>glances at you briefly, does a quick hair flip, and then turns back to her friends.<x>4</x>Is she into you?<x>5</x>Eye contact is obviously a good sign, but<x>6</x>you really can't tell from this example."
r = t.translate_text(text, target_lang="UK", tag_handling="xml", ignore_tags="x")
print(r.text)

prints

Я знаю, що багатьом хлопцям важко зрозуміти, чи подобається вони дівчині, тому сьогодні я хочу провести невеличкий тест, щоб<x>1</x>ви дізналися, чи справді ви їй подобаєтеся, або навіть наскільки добре ви вмієте розпізнавати сигнали, які можете отримувати.<x>2</x>Уявімо, що ви перебуваєте в барі, дивитеся вздовж стійки і бачите симпатичну дівчину, яка<x>3</x>на вас короткий погляд, робить швидкий помах волоссям, а потім повертається до своїх друзів.<x>4</x>Ви їй подобаєтеся?<x>5</x>Зоровий контакт - це, очевидно, хороший знак, але<x>6</x>з цього прикладу ви насправді не можете сказати, що це так.

This is slightly different from your output and does not seem to have the issue you mention.
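
To compare the two outputs segment by segment, one option is to split each string at the markers and look for segments that came back empty. This is only a sketch built on the standard `re` module; `split_segments` and `empty_segments` are hypothetical helper names, and the sample strings are abbreviated versions of the reported output and the output printed above.

```python
import re

# Any <x>N</x> marker acts as a segment boundary.
TAG_SPLIT = re.compile(r"<x>\d+</x>")


def split_segments(text: str) -> list[str]:
    """Split the text at every <x>N</x> marker, keeping empty segments."""
    return TAG_SPLIT.split(text)


def empty_segments(text: str) -> list[int]:
    """Return the indices of segments that contain no text."""
    return [i for i, seg in enumerate(split_segments(text)) if not seg.strip()]


# Abbreviated versions of the reported output and the re-run output.
reported = "подруг.<x>4</x><x>6</x>Ви їй подобаєтесь?<x>5</x>Зоровий контакт."
rerun = "друзів.<x>4</x>Ви їй подобаєтеся?<x>5</x>Зоровий контакт, але<x>6</x>з цього прикладу."

print(empty_segments(reported))  # [1]
print(empty_segments(rerun))     # []
```

An empty segment between two adjacent markers is the symptom described in the original report, so the first string is flagged and the second is not.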
