How to skip translation of some words? #78

Open
Pablohn26 opened this issue Mar 5, 2023 · 2 comments

Comments

@Pablohn26

Hi,
I would like to use this model to translate XML content, specifically the string resources of Android apps. The problem is that it also translates parts of the XML markup that should stay as they are, and it inserts spaces that break the content. How can I keep the XML markup from being translated?

[Screenshot]

For example, ChatGPT preserves the markup:

[Screenshot]

Another option would be to train this model specifically on Android XML string files. Could you point me to a guide for doing so?
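One possible source of training data for that, sketched here under assumptions: Android projects usually keep the source strings in `res/values/strings.xml` and each translation in a folder such as `res/values-es/strings.xml`, with entries matched by their `name` attribute. The sketch pairs source and target strings by `name` and keeps inline tags as markup, so a fine-tuned model could learn to copy them. The function names are hypothetical, and the format the pairs must be written in depends on the training toolkit, which is not covered here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem: ET.Element) -> str:
    """Return the element's content with inline tags (e.g. <b>) kept as markup."""
    # ET.tostring() serializes each child *including* its tail text.
    return escape(elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


def load_strings(xml_text: str) -> dict[str, str]:
    """Map name -> inner XML for every translatable <string> resource."""
    root = ET.fromstring(xml_text)
    return {
        elem.get("name"): inner_xml(elem)
        for elem in root.iter("string")
        if elem.get("translatable") != "false"
    }


def parallel_pairs(source_xml: str, target_xml: str) -> list[tuple[str, str]]:
    """Pair source/target strings that share the same resource name."""
    src, tgt = load_strings(source_xml), load_strings(target_xml)
    return [(src[name], tgt[name]) for name in src if name in tgt]
```

Strings present in only one of the two files are skipped, so partially translated apps still contribute whatever pairs they have.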

Thanks for sharing this amazing software.

@jorgtied
Member

jorgtied commented Mar 6, 2023

There is no immediate fix for this, as the models are trained on plain-text input and have never seen tagged data. One could add some pre- and post-processing to keep the tags in place, or do some clever fine-tuning as you suggest.
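A minimal sketch of the pre- and post-processing idea, assuming that XML tags and Android format specifiers (`%1$s`, `%d`, `%%`, ...) are the spans to protect: each span is swapped for a numbered placeholder such as `__0__` before translation and swapped back afterwards. The placeholder format and the regular expressions are illustrative choices, not something the models were trained on, so the model may still damage a placeholder; the restore step tolerates inserted spaces and raises an error if a placeholder goes missing.

```python
import re

# Illustrative assumption: XML tags and Android-style format specifiers
# (%1$s, %d, %.2f, %%) are the spans the model must not touch.
PROTECTED = re.compile(r"<[^>]+>|%(?:\d+\$)?[-#+0,(]*\d*(?:\.\d+)?[a-zA-Z%]")
# Tolerate spaces the model may insert inside a placeholder, e.g. "__ 0 __".
PLACEHOLDER = re.compile(r"__\s*(\d+)\s*__")


def mask(text: str) -> tuple[str, list[str]]:
    """Replace protected spans with numbered placeholders __0__, __1__, ..."""
    spans: list[str] = []

    def _sub(match: re.Match) -> str:
        spans.append(match.group(0))
        return f"__{len(spans) - 1}__"

    return PROTECTED.sub(_sub, text), spans


def unmask(text: str, spans: list[str]) -> str:
    """Put the original spans back; fail loudly if the model lost any."""
    found = [int(m.group(1)) for m in PLACEHOLDER.finditer(text)]
    if sorted(found) != list(range(len(spans))):
        raise ValueError(f"placeholders damaged in model output: {text!r}")
    return PLACEHOLDER.sub(lambda m: spans[int(m.group(1))], text)
```

In a pipeline, `mask` would run on each extracted string, the masked text would go to the model, and `unmask` would run on the model's output; strings that raise could be flagged for manual review instead of being written back.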

Do I understand correctly that you basically only want to translate the text between the XML tags (the raw text, not the XML markup)? You could send only that text to the model and insert the translations back into the XML template. Would that work for you?

@bukosabino

Hi @Pablohn26,

You could use a library such as https://lxml.de to extract the text and then send it to the model.
