Hashtag translation Russian to English: # becomes ♪ #592

Open
olets opened this issue Feb 23, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@olets

olets commented Feb 23, 2024

When translating hashtags from Russian to English, the # becomes a ♪.

Russian:

#Docker

English translation:

♪Docker

Example: https://lor.sh/@skobkin/111977814820972935 as viewed from Hachyderm

@github-actions github-actions bot added the enhancement New feature or request label Feb 23, 2024
@PJ-Finlay
Contributor

This is an eccentricity of the Transformer language model we use for translation, not an issue with the LibreTranslate code.

This isn't the first time symbols have been generated incorrectly. I'm guessing the root cause is the OpenSubtitles dataset, which is used for training. OpenSubtitles has a lot of translated subtitles for movies and TV shows, and I'm guessing a lot of ♪ characters too.

I used to filter certain special characters from the dataset before training the models, which might help with this issue, but I removed the filtering a while ago because it was slow.
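
For anyone who wants to experiment with that kind of filtering, here is a minimal sketch in Python. It is not the filter that was originally used for the models: the character set `SUSPECT_CHARS` and both helper names are hypothetical, and it assumes the corpus is available as (source, target) string pairs.

```python
# Hypothetical pre-training filter; NOT the original Argos/LibreTranslate code.
# Assumption: musical-note symbols are the characters worth dropping.
SUSPECT_CHARS = frozenset("♪♫♬♩")


def has_suspect_char(text, suspect=SUSPECT_CHARS):
    """Return True if any character of `text` is in the suspect set."""
    return any(ch in suspect for ch in text)


def filter_pairs(pairs, suspect=SUSPECT_CHARS):
    """Yield only the (source, target) pairs where neither side
    contains a suspect character."""
    for src, tgt in pairs:
        if not has_suspect_char(src, suspect) and not has_suspect_char(tgt, suspect):
            yield src, tgt


# Tiny made-up corpus, for illustration only.
corpus = [
    ("#Docker", "#Docker"),
    ("♪ Ла-ла-ла", "♪ La la la"),
    ("Привет", "Hello"),
]
clean = list(filter_pairs(corpus))
```

This makes a single pass over the data and drops whole sentence pairs instead of stripping characters, so the remaining pairs stay aligned. Whether that is fast enough on a corpus the size of OpenSubtitles would need to be measured.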

@LynxPDA

LynxPDA commented Apr 22, 2024

@olets Can you tell me which version of the model has a bad translation? With the latest version RU-EN v1.9 from the index I could not reproduce it.

The update was from this thread:
https://community.libretranslate.com/t/new-argos-model-en-ru-and-ru-en/872/20
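
To check a particular instance, something like the following rough sketch could help. It assumes the instance exposes the standard LibreTranslate `POST /translate` endpoint without requiring an API key; `base_url` is a placeholder you would supply, and `hashtags_preserved` is a hypothetical helper that only compares the number of `#` characters.

```python
import json
import urllib.request


def translate(text, base_url, source="ru", target="en"):
    """Send `text` to a LibreTranslate instance and return the translation.

    Assumption: the instance accepts JSON on POST /translate without an API key.
    """
    payload = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["translatedText"]


def hashtags_preserved(source_text, translated_text):
    """Hypothetical check: the translation keeps the same number of '#'."""
    return source_text.count("#") == translated_text.count("#")


# Example use against an instance you control (makes a network call):
# result = translate("#Docker", "http://localhost:5000")
# print(result, hashtags_preserved("#Docker", result))
```

Running the same input against instances with different model versions should show whether the symptom is tied to an older RU-EN model.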

@olets
Author

olets commented May 4, 2024

It's whatever version Hachyderm is running. Will see if I can find out
