Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make translation string extraction work with rust #10345

Merged
merged 2 commits into from Mar 9, 2024

Conversation

faho
Copy link
Member

@faho faho commented Mar 5, 2024

Description

This extracts the strings we pass to wgettext! and friends into the po files.

The usual way to do this is xgettext. Unfortunately, while there is a patch to add rust support, it appears to have been stuck in limbo for almost 2 years.

Since we already have a script to work around xgettext issues, let's just create some issues of our own!

This uses cargo-expand to extract the strings from uses of wgettext.

For obvious reasons, that's a bit brittle, so I'm not about to call this the absolute greatest solution ever.

However, cargo-expand means we get nested macros etc for free - anything that expands to "pass this string literal to gettext" will work. Additionally, we assume any string constant (except for BUILD_VERSION and PACKAGE_NAME (why do we even have PACKAGE_NAME)) could be translated. Worst case it's useless guff in the po file.

I have not included the generated po-files in this because those are about ~250k lines changed.

Here are the statistics on what would change in them in terms of "translated" (has msgid and msgstr) vs untranslated strings:

OLD po/de.po 1458 translated messages, 1 fuzzy translation, 21680 untranslated messages.
NEW po/de.po 1439 translated messages, 23474 untranslated messages.

OLD po/en.po 2335 translated messages, 20804 untranslated messages.
NEW po/en.po 2330 translated messages, 22583 untranslated messages.

OLD po/fr.po 7469 translated messages, 15670 untranslated messages.
NEW po/fr.po 7334 translated messages, 17579 untranslated messages.

OLD po/pl.po 86 translated messages, 23053 untranslated messages.
NEW po/pl.po 103 translated messages, 24810 untranslated messages.

OLD po/pt_BR.po 1146 translated messages, 21993 untranslated messages.
NEW po/pt_BR.po 1152 translated messages, 23761 untranslated messages.

OLD po/sv.po 1590 translated messages, 21549 untranslated messages.
NEW po/sv.po 1586 translated messages, 23327 untranslated messages.

OLD po/zh_CN.po 35 translated messages, 23104 untranslated messages.
NEW po/zh_CN.po 73 translated messages, 24840 untranslated messages.

Note that a lot of these are bound to be unused strings, we have not kept up well with translations.

The increase in translated messages is due to unused translations being matched to msgids - I believe in some cases the old code did not correctly figure out the location. I have compared some of the zh_CN ones and they seem plausible (in terms of format specifiers and google translate).


To be quite honest, I'm pretty sure the future direction would take us off gettext.

The tools aren't good, development isn't quick, and it's tying us to the C-world.

But this seems like a necessary step in order to get the strings out.

As an important note: Only 434 of the 25000 messages are in the core src/. That means french is at about 40%, german is at about 30%. These numbers aren't good, but it is possible to improve them.

It is much too easy to feel like this work is useless because of the staggering number of messages we have. But I do believe that one dedicated translator for a language might be capable of translating 100% of src/ - but they would have to keep at it, because we keep changing and adding and removing these messages.

faho added 2 commits March 5, 2024 18:02
This is absolutely disgusting code, but it works out okay-ish.

The problem is xgettext has no rust support (it's stuck in review
limbo). So we use cargo-expand to extract all invocations of
gettext, and massage all that to generate a
messages.pot ourselves.

We also assume any string constant could be translated.
@faho faho added the rust label Mar 5, 2024
@faho faho added this to the fish next-3.x milestone Mar 5, 2024
# all the strings we want translated. So we extract and normalize all such strings into a format
# that `xgettext` can handle.
# Create temporary directory for these operations. OS X `mktemp` is somewhat restricted, so this block
# works around that - based on share/functions/funced.fish.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, sounds like a good direction.

We also assume any string constant could be translated.

If this has a significant number of false positives, we should fix it.
I'm surprised this is necessary, doesn't cargo expand give us exactly the
set of strings that are passed to wgettext!. (Well there's one or two which
it can't show because they come from a local variable (that could be one of
two constant strings) but we can fix that manually).

Copy link
Member Author

@faho faho Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the sort of output you get:

&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "%ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = WILDCARD_ERR_MSG;
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection target: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Requested redirection to '%ls', which is not a valid file descriptor";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ERROR_TIME_BACKGROUND;
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ILLEGAL_FD_ERR_MSG;

Those ones like "ILLEGAL_FD_ERR_MSG" or "ERROR_TIME_BACKGROUND" are constants, defined elsewhere (parse_constants, for all I've found so far)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that "%ls" comes from report_error!, so it's a macro inside a macro that ends up passing a message that I think is already translated)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok well that's almost always just one indirection, we can probably expand it somehow, perhaps even grep.
Of course that's only worth it if someone uses the translations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to expand it if we just assume that all string constants are translatable - which turns out is basically true, and where it isn't at worst we have a message that isn't actually used.

So I would just leave it as what it is now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future we should not show to translators strings that are not actually translated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be like three strings. It's fine for now.

I'm not going to spend more time on this solution, it'll get us up to "you can translate this again now".

@zanchey
Copy link
Member

zanchey commented Mar 9, 2024

This approach looks sensible to me. The one advantage of gettext is how ubiquitous it is, but if there's alternatives that will make working with translations easier I'm all in favour.

@faho faho merged commit d91ad29 into fish-shell:master Mar 9, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants