Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(i18n): improved Russian translation #28572

Merged
merged 8 commits into from
Jun 5, 2024
Merged

Conversation

goldjee
Copy link
Contributor

@goldjee goldjee commented May 18, 2024

SUMMARY

Recreated my previous PR #28371 because some PRs for other languages were approved and I wasn't able to resolve conflicts by myself. This new version only touches russian lang files and common messages.pot.

This is a pretty big change to Russian translation.

  • Extracted new strings and updated language files, as described in the documentation.
  • Normalized usage of dots at the end of sentences. The rule is simple: if given label contains only one sentence, it should not be ended with a dot. On the contrary, labels comprised of several sentences should be ended with a dot.
  • Normalized usage of "е" and "ё" symbols. As there was no "ё" pretty much everywhere and people are used to omit dots above this symbol, I have changed "ё" to "е" everywhere this symbol occurs.
  • Fixed typos and written language style issues.
  • Renamed chart to "Диаграмма" (previously it was called "График"). The reason is that "график" is a word that means "line plot". As most of visuals types are not plots, it is much more correct to call them "диаграмма" (diagram, chart) instead.
  • Normalized naming of various chart types.
  • Used English names for things where it is appropriate. For instance, "Superset" should not be translated to "Суперсет", as it is a proper noun.
  • Added quite a bunch of missing strings, corrected and confirmed some drafts.
  • Shortened some strings for the sake of brevity and to make them fit into tight spaces.

Of course, there is still a lot of work to do, but I have managed to cover the most frequently viewed UI elements and pages. Hope the team and end users will find this version alright.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added i18n Namespace | Anything related to localization i18n:russian Translation related to Russian language labels May 18, 2024
Copy link

codecov bot commented May 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.61%. Comparing base (76d897e) to head (c442024).
Report is 272 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28572       +/-   ##
===========================================
+ Coverage   60.48%   83.61%   +23.12%     
===========================================
  Files        1931      519     -1412     
  Lines       76236    37458    -38778     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31321    -14793     
+ Misses      28017     6137    -21880     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.94% <ø> (-0.23%) ⬇️
javascript ?
mysql 77.15% <ø> (?)
postgres 77.29% <ø> (?)
presto 53.53% <ø> (-0.27%) ⬇️
python 83.61% <ø> (+20.12%) ⬆️
sqlite 76.73% <ø> (?)
unit 59.01% <ø> (+1.38%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@goldjee
Copy link
Contributor Author

goldjee commented May 19, 2024

@villebro @rusackas Could you please check why Python-Integration / test-mysql (pull_request) test has failed? I haven't altered Python part of code, but maybe I have still messed up somehow.

@goldjee
Copy link
Contributor Author

goldjee commented May 20, 2024

@villebro @rusackas Could you please check why Python-Integration / test-mysql (pull_request) test has failed? I haven't altered Python part of code, but maybe I have still messed up somehow.

Nevermind, probably a glitch. Now it passes. Can we move forward with this PR?

@rusackas
Copy link
Member

Maybe @artemonsh can take a look at proofreading these edits?

Copy link
Contributor

@artemonsh artemonsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases I've seen scrolling this PR where the translation is empty.

@goldjee

Some thoughts:
Superset BI is not used widely yet in Russia and most users use tools like Power BI, QlikView and Tableau. Despite this, because of sanctions a lot of analysts will move to Superset this or next year. These tools were the first thing I was looking at when making translations. In Power BI and Tableau we use "График", in QlikView we use "Диаграмма". Considering Power BI as one of the most popular tool in the country I prefer "График".

Moreover, "Chart" and "График" both are short words, while "Диаграмма" is a quite long word, that can break UI in buttons everywhere in Superset. I think it is the best choice to translate while keeping things like this in mind.

@rusackas
There are too many translations in this PR :-( I am sorry, but I am unable to review all the changes

superset/translations/ru/LC_MESSAGES/messages.json Outdated Show resolved Hide resolved
@goldjee
Copy link
Contributor Author

goldjee commented May 21, 2024

@artemonsh, thank you for the feedback!

Right, there are still quite a bunch of empty strings. Translating, I was leaving particular string as is if:

  • I haven't encountered it in the UI and wasn't sure about the context.
  • The string could be a common term that shouldn't be translated.
  • It looked like some low-level stuff that, when translated, could confuse end user. For instance, I have intentionally left lines 81-84 that you have mentioned. I haven't stumbled upon them in the UI and, given that they are uppercase, I thought that I could break something if I provide a translation. That was the case when someone among previous contributors has translated "DELETE" string (or some other one like that) and was asked to leave it as is.
  • I wasn't sure how to properly translate a string because I am not familiar with related functionality and didn't want to mess up.

Sure there is a lot more work to do. At this point my goal was to provide at least something I was confident with to improve user experience, not the full localization. As the next step, I was planning to check specific chart settings, fill the gaps there and straighten up the terminology used. But I think it's better to deliver that in future PRs.

Regarding what word to use as "chart" term. I totally agree that's one of the core terms and it should be picked very carefully. For reference, Ozhegov's dictionary gives this definition to the term:

ГРАФИК - диаграмма, изображающая при помощи кривых количественные показатели движения, состояния чего-н.
ГРАФИК is a diagram that depicts quantitative indicators of movement, state of something with curves.

Diagram is described as follows:

ДИАГРАММА - Графическое изображение соотношения каких-н. величин.
ДИАГРАММА is a graphical representation of the relationship of some quantities.

Come on, "Диаграмма" is just 3 characters longer than "График", but that's much more suitable :)

Yeah, Power BI is very popular, it's my daily workhorse as well. Just checked the naming in v2.126.927.0 (February 2024), and they used "Визуализации" ("Visualizations") for visual elements of dashboards, which is even longer.

PBI

Of course, trying to incorporate new naming into Superset, I have checked that the word fits the UI space. Looked quite nice and I haven't seen any elements that were broken on my 13'' laptop screen. Let me show you some examples.

"Charts" screen
Charts

"Create chart" dialog
Create chart

Chart editor
Chart editor

Picking a chart in Dashboard editor
Dashboard editor

If there are some other spots worth checking, please tell me.

@goldjee
Copy link
Contributor Author

goldjee commented May 21, 2024

Also, before moving forward, please let me adjust related commits as I've just noticed I haven't set my local Git username/email properly :/

@goldjee goldjee marked this pull request as draft May 21, 2024 09:41
@goldjee goldjee marked this pull request as ready for review May 21, 2024 10:30
@goldjee
Copy link
Contributor Author

goldjee commented May 21, 2024

Fixed it. Sorry for the mess. I'm new to collaborative development and still figuring out how everything works.

@rusackas
Copy link
Member

@mistercrunch @villebro I'm a little stumped on how to properly review this. The changes to the .pot file are huge, and when I run the extraction command (pybabel extract -F superset/translations/babel.cfg -o superset/translations/messages.pot -k _ -k __ -k t -k tn -k tct .) locally on this branch, the .pot file is again quite different from what's on this PR. This all makes me nervous 😅

@goldjee
Copy link
Contributor Author

goldjee commented Jun 1, 2024

@rusackas I think that's because that command doesn't order extracted strings. Before doing anything with the .po file in this branch, I have run ./scripts/babel_update.sh as was mentioned in the documentation. So even though a lot of the strings weren't actually modified by me, the lines were moved automatically and formed a huge diff.

I have noticed that every string has a reference to the file that uses it. If the translations could be automatically arranged by these references, even alphabetically, that would result in more deterministic output of the extraction and help to identify the exact diffs. As an additional effect, it would enable simultaneous improvements of the same .po file by several contributors. As of now, any other merged PR on russian translation would end up as an integration hell for me.

@mistercrunch
Copy link
Member

Pointing out two things, one is I move the file under scripts/translations/babel_update.sh.

The other is note the --sort-output in the command:
https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh#L45

The docs had 2-3 ways of calling the pybabel commands, and i think my recent PR removed that in favor of pointing the docs to the scripts.

@goldjee
Copy link
Contributor Author

goldjee commented Jun 4, 2024

@mistercrunch The first commit in this branch was based on previous workflow of updating i18n files, as described here (BTW these docs are still behind your PR #28483 for some reason). Then I've been working with that version and didn't re-run babel_update.sh even after rebasing to later state of upstream. Shall I extract strings again?

@rusackas
Copy link
Member

rusackas commented Jun 4, 2024

Definitely hoping to avoid these conflict nightmares resulting from poorly sorted (and seldom updated) babel extracts.

If you can update this PR to run the script as @mistercrunch suggests (with --sort-output) then at least I should be able to run it when I pull the branch and not get a mess. This seems like it would make it approvable.

Other follow-up items might be:
• Make sure the --sort-output is highly discoverable or called by another command (this sounds like it's the case now, but might warrant a quick review
• Add the extraction script as a CI job, or even the pre-commit hooks, making sure that the .pot remains fresh as people add/remove/update translation calls throughout the application.

@github-actions github-actions bot added i18n:spanish Translation related to Spanish language i18n:italian Translation related to Italian language i18n:french Translation related to French language i18n:chinese Translation related to Chinese language i18n:japanese Translation related to Japanese language i18n:korean Translation related to Korean language i18n:dutch i18n:slovak i18n:ukrainian i18n:portuguese i18n:brazilian i18n:traditional-chinese labels Jun 4, 2024
@goldjee
Copy link
Contributor Author

goldjee commented Jun 4, 2024

I've run

./scripts/translations/babel_update.sh --sort-output
 pybabel update -i superset/translations/messages.pot -d superset/translations --ignore-obsolete

All i18n files were affected. Now the Python-Integration check is failing again, idk why. Could you please take a look?

@rusackas
Copy link
Member

rusackas commented Jun 4, 2024

Weird... there was a test where an endpoint was returning a 200 instead of a 202. This seems like an error we saw in those tests a while back that was resolved, but I don't think this PR is as old as that. I've run the CI job again to see if it was just flakey and magically passes. If not, the PR might need a rebase, which I hope isn't too painful. Let's wait and see...

@mistercrunch
Copy link
Member

Just a note, but from my understanding, running this

./scripts/translations/babel_update.sh --sort-output
 pybabel update -i superset/translations/messages.pot -d superset/translations --ignore-obsolete

as you mentioned @goldjee , is the same as just running

./scripts/translations/babel_update.sh

as the bash script doesn't receive any params.

@mistercrunch
Copy link
Member

Weird... there was a test where an endpoint was returning a 200 instead of a 202

That's a known flaky test, happens on mysql only for some reason maybe one out of 20 runs or so. I'll send Superset swag to whoever fixes it!

@mistercrunch
Copy link
Member

[I think] It should always be safe to run ./scripts/translations/babel_update.sh and merge what comes out. So I'm fine with this PR as is. As we do this, we should expect the .po files to change a lot as if you were to only add a single line at the top some files, it'll change all of the "which line of code" references.

As much as it's useful to have the references, it's a pain from a source control standpoint as the moment anyone adds a line of code, the .po files change a lot, which can lead to merge conflicts and such.

@goldjee , in terms of handling merge conflicts, I think it should always be safe to:

  • before the imminent merge conflict, set aside the .po file(s) you've altered/improved in a temp folder someplace
  • rebase/merge the upstream changes
  • copy your .poback into their place in the repo
  • re-run ./scripts/translations/babel_update.sh

I think we may want to capture this ^^^ into the docs.

Now another thing we may want to do would be to tell people submitting translation PRs to NOT run ./scripts/translations/babel_update.sh as part of their PR, so that they're actually reviewable... In this PR it seems impossible to review the changes you've made because GitHub is overwhelmed by number of changes. Even if it were to expose them, it would still be hard to review as it's a mix of legit changes + .po files bookkeeping.

I wonder if we could set the .po files to NOT hold the references (?)

@mistercrunch mistercrunch merged commit eef7828 into apache:master Jun 5, 2024
34 checks passed
@Atmostone
Copy link

@goldjee Thanks for translations!
But I still struggle with some things I can't translate.
Like this in table chart:
image
And this in pie chart:
image
Do you have any ideas how I can translate it?

@goldjee
Copy link
Contributor Author

goldjee commented Jun 6, 2024

@Atmostone My pleasure.

Do you mean that you weren't able to find corresponding lines in messages.po file? I have stumbled upon several cases like that as well. For example, at "Create chart" page there is "or" that I wasn't able to translate.

Снимок экрана 2024-06-06 в 16 54 32

I don't know how to deal with such cases yet. Probably the code of the corresponding components don't take advantage of i18n system and should be modified in prior.

Also, @mistercrunch @rusackas, will it be appropriate if I create a discussion to gather feedback like this on russian translation? I would like to tinker with it a bit more, as there is a lot of room for improvements.

@Atmostone
Copy link

@goldjee
image
For this case I found some strings that can be related, then added translation, but it didn't work
image
Then I found this file that has only FR and ZH translations, maybe I need to modify this file (but we use superset on docker that has only compiled frontend, so I'm gonna try to install it manually and hope I can make experiments with frontend)

@mistercrunch
Copy link
Member

For things like that table components and other things, we may need to feed them "translatable strings", assuming they do allow controlling those labels externally. You may have to do a bit of digging into the React component tree, and read the external libraries' documentation to figure how whether and how you can pass these translatable labels to them. If it's not supported, it may be possible to contribute such features, or there may be hacks to modify the dom - though this is not the preferred route. i18n is a worthy cause and component lib managers should welcome these changes if they're not already in there.

@goldjee goldjee deleted the i18n-ru branch June 9, 2024 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n:brazilian i18n:chinese Translation related to Chinese language i18n:dutch i18n:french Translation related to French language i18n:italian Translation related to Italian language i18n:japanese Translation related to Japanese language i18n:korean Translation related to Korean language i18n:portuguese i18n:russian Translation related to Russian language i18n:slovak i18n:spanish Translation related to Spanish language i18n:traditional-chinese i18n:ukrainian i18n Namespace | Anything related to localization size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants