Disambiguate homonyms in glossaries / provide alternative translations? #97

Open
sarkipo opened this issue Feb 23, 2024 · 3 comments

@sarkipo

sarkipo commented Feb 23, 2024

Hello,

Is there a way to specify different translations for homonymous words in a glossary?

E.g. in a Russian > English translation, there is a verb 'пропасть' to which I would like to assign the translation "disappear".
But 'пропасть' can also be a noun meaning "abyss", and I don't want it to be bluntly replaced by "disappear" everywhere.

Is there a way to disambiguate two words like that in the input?
If not, it might help to be able to specify an alternative translation ranked lower than the first one, but I'm not sure how exactly that should work.

@JanEbbing
Member

JanEbbing commented Feb 26, 2024

Hi, thanks for your question! This is a bit tricky.

  1. In principle, your glossary must not have two entries with the exact same string in the source language; see the docs for this part.

  2. However, our glossary feature is not a simple string search and replace, so in principle, if you add a glossary term 'пропасть => disappear', it won't replace all occurrences of 'пропасть' with 'disappear'. Unfortunately, in the example you give, it does not work properly.
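
The uniqueness rule in point 1 can be checked locally before a glossary is created. The following is a minimal sketch, assuming the entries are held in memory as a list of (source, target) pairs; `find_duplicate_sources` is a hypothetical helper written for illustration, not part of the library.

```python
from collections import defaultdict


def find_duplicate_sources(entries):
    """Return {source: [targets...]} for every source term listed more than once.

    `entries` is an assumed in-memory format: a list of (source, target) pairs.
    """
    targets_by_source = defaultdict(list)
    for source, target in entries:
        targets_by_source[source].append(target)
    # Keep only the source strings that map to more than one target.
    return {s: t for s, t in targets_by_source.items() if len(t) > 1}


entries = [
    ("пропасть", "disappear"),
    ("пропасть", "abyss"),  # same source string again: not allowed
    ("дождь", "rain"),
]
duplicates = find_duplicate_sources(entries)
print(duplicates)  # {'пропасть': ['disappear', 'abyss']}
```

A check like this only detects the conflict; it cannot decide which of the two meanings should win.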

Here is an example where the glossary works as intended, using English to German (sorry, I don't speak Russian):

"Rain" in English can be the noun "the rain" (German: Regen) or the verb "to rain" (German: regnen).

I can add a term 'rain => schütten' (only makes sense for the verb form) to my glossary, and it translates:

"It was raining a lot that day. The rain really did not seem to stop."

into

"Es schüttete an diesem Tag sehr viel. Der Regen schien wirklich nicht aufzuhören."

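For contrast, this is roughly what a plain search and replace would do to the same English sentence. The snippet only illustrates the naive approach that the glossary feature is said not to use; it is not a description of how the glossary is implemented.

```python
# Naive substitution: every occurrence of the source string is replaced,
# regardless of part of speech or inflection.
sentence = "It was raining a lot that day. The rain really did not seem to stop."
naive = sentence.replace("rain", "schütten")
print(naive)
# It was schüttening a lot that day. The schütten really did not seem to stop.
```

Here both the verb and the noun are overwritten and the inflection is ignored, whereas in the translation above only the verb is affected by the glossary term.
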
However, adding a glossary term in the same way for your example fails. How well this works depends on the specific language and its grammar. For example, you can add infinitive markers ("to rain") or articles ("the rain") to your glossary definition, if they exist in that language.

Glossary term: 'пропасть => to disappear'

"Загляните в пропасть. Пропасть разверзлась подо мной."

"Look into the disappear. The disappearance opened up beneath me."

I flagged this to the team responsible for glossaries and we might fix this in the future.

@sarkipo
Author

sarkipo commented Feb 26, 2024

Hi Jan @JanEbbing, thanks a lot for your answer and for signaling the problem.
That's all good to know. Adding "to" is quite fine with me, but I just fear there will be cases where this particular trick won't work. Is there a more general way to specify the relevant part of speech, apart from the infinitive particle? (Something like adding a POS tag like "n", "v", "adj" etc.)

Another problem is homonyms of the same part of speech, e.g. plane "airplane" vs. plane as a term in geometry.

@JanEbbing
Member

I don't think we would want to add a POS tag since, as you remarked, it doesn't fully solve the problem (I would even guess that the majority of homonyms are words of the same POS). I will check with our glossary team whether support for distinguishing between different word meanings in glossaries is on the roadmap, but I can't give an estimate.
