Different output to_latin for v1.0 and v1.1 #37

matteomedioli · 2023-02-07T15:13:05Z

I get different outputs for the same input depending on the library version:

Version: 1.0

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Я часто пью водку", "ru")
"JA chasto p'ju vodku"

Version: 1.1

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Я часто пью водку", "ru")
"YA chasto p'yu vodku"


georgeslabreche · 2023-02-07T21:12:35Z

Thank you for flagging this! Which one is the correct one? cc @ratijas and @rominf.

ratijas · 2023-02-07T22:55:43Z

Hi

First of all, it took me quite some time to remember what I had done for this project in the past to get tagged here. But that's fine :)

I did a quick refresher on the romanization of languages. According to a comparison table on Wikipedia, there are multiple options to choose from, and each is correct according to one standard or another.

https://en.wikipedia.org/wiki/Romanization_of_Russian#Transliteration_table

To make things completely absurd, there's even a "я" -> "ia" variant for passports!

So I think the question is ill-posed and cannot be answered without also specifying a particular standard 🙃

georgeslabreche · 2023-02-07T23:09:41Z

Thank you @ratijas for your input! Sorry for the unsolicited tagging😬.

rominf · 2023-02-08T18:32:53Z

Hi, @georgeslabreche.

I agree with what @ratijas said: standards matter. My variant is based on a Russian government standard (it's mentioned in the PR), while the first version of the Russian transliteration is apparently not based on any standard (at least, I'm not aware of one and the PR doesn't mention it). As for this particular example, "я" -> "ja" or "ya": the latter is more popular in Russia, take https://en.wikipedia.org/wiki/Yandex for example.

It was me who broke compatibility, but I believe I did the right thing (standards matter more than compatibility). It was clear from the tests that compatibility was broken, but I probably should have emphasized this in the PR/commit messages and recommended that you release version 2.0 to follow semantic versioning. Sorry about that.

If you want to provide full coverage (hard, especially Latin -> Cyrillic), you probably want to add a scheme argument, as is done here: https://github.com/nalgeon/iuliia-py.
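To make the scheme idea concrete, here is a minimal sketch. Everything in it is hypothetical: the `SCHEMES` tables, the scheme names `v1_0` and `v1_1`, and the `scheme` parameter are illustrations only and are not cyrtranslit's actual API. The tables cover just the letters needed for the example in this issue and reproduce the two outputs reported above.

```python
# Hypothetical sketch: neither these tables nor this signature come from cyrtranslit.
# Only the letters needed for the example sentence are included.
_COMMON = {
    "а": "a", "в": "v", "д": "d", "к": "k", "о": "o", "п": "p",
    "с": "s", "т": "t", "у": "u", "ч": "ch", "ь": "'",
}

# The two schemes differ only in how "я" and "ю" are written.
SCHEMES = {
    "v1_0": {**_COMMON, "я": "ja", "ю": "ju"},
    "v1_1": {**_COMMON, "я": "ya", "ю": "yu"},
}


def to_latin(text, scheme="v1_1"):
    """Transliterate Cyrillic to Latin using the table selected by `scheme`."""
    table = SCHEMES[scheme]
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)             # unmapped characters pass through unchanged
        elif ch.isupper():
            out.append(latin.upper())  # "Я" -> "JA" / "YA", as in the outputs above
        else:
            out.append(latin)
    return "".join(out)


print(to_latin("Я часто пью водку", scheme="v1_0"))
print(to_latin("Я часто пью водку", scheme="v1_1"))
```

With an explicit argument like this, callers who depend on the old output could keep it by selecting the old table, while the default follows the newer mapping.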

georgeslabreche · 2023-02-08T18:47:31Z

I agree with both of you, thank you for your inputs and clarification. @rominf: no need to apologize, it's an excellent contribution.

I like the scheme approach. However, I'm not sure I'll implement something similar in the near future, since iuliia already offers that elegant alternative for Russian transliteration.

rominf · 2023-02-08T19:04:41Z

@georgeslabreche Just in case: iuliia offers only Cyrillic -> Latin, not the other way around. That direction was important to me because I was trying to build TTS for mixed Cyrillic and Latin text based on a Russian voice. This is why I searched for other libraries, found yours, wasn't fully satisfied with it, and made my contribution. So your library is not fully comparable to iuliia; they have different use cases.
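To illustrate why the Latin -> Cyrillic direction is the harder one, here is a rough sketch of a reverse lookup that tries the longest Latin sequence first. The `TO_CYRILLIC` table and the `to_cyrillic` function are made up for this example and are not cyrtranslit's implementation; the table only covers the sentence from this issue.

```python
# Hypothetical reverse table, not taken from cyrtranslit.
TO_CYRILLIC = {
    "a": "а", "ch": "ч", "d": "д", "k": "к", "o": "о", "p": "п",
    "s": "с", "t": "т", "u": "у", "v": "в", "ya": "я", "yu": "ю", "'": "ь",
}
MAX_LEN = max(len(key) for key in TO_CYRILLIC)


def to_cyrillic(text):
    """Greedy longest-match transliteration from Latin to Cyrillic."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "ya" wins over "y" + "a".
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            cyr = TO_CYRILLIC.get(chunk.lower())
            if cyr is not None:
                out.append(cyr.upper() if chunk[0].isupper() else cyr)
                i += len(chunk)
                break
        else:
            out.append(text[i])  # unmapped characters pass through unchanged
            i += 1
    return "".join(out)


print(to_cyrillic("YA chasto p'yu vodku"))
```

The greedy choice is only a guess: a multi-letter Latin sequence can in principle stand for several separate Cyrillic letters, and the right reading depends on which scheme produced the text.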

georgeslabreche mentioned this issue Aug 16, 2023

H is not getting transliterated to russian #41

Open

georgeslabreche added the enhancement label Apr 20, 2024
