New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make translation string extraction work with rust #10345
Conversation
This is absolutely disgusting code, but it works out okay-ish. The problem is xgettext has no rust support (it's stuck in review limbo). So we use cargo-expand to extract all invocations of gettext, and massage all that to generate a messages.pot ourselves. We also assume any string constant could be translated.
# all the strings we want translated. So we extract and normalize all such strings into a format | ||
# that `xgettext` can handle. | ||
# Create temporary directory for these operations. OS X `mktemp` is somewhat restricted, so this block | ||
# works around that - based on share/functions/funced.fish. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, sounds like a good direction.
We also assume any string constant could be translated.
If this has a significant number of false positives, we should fix it.
I'm surprised this is necessary, doesn't cargo expand
give us exactly the
set of strings that are passed to wgettext!
. (Well there's one or two which
it can't show because they come from a local variable (that could be one of
two constant strings) but we can fix that manually).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the sort of output you get:
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "%ls";
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = WILDCARD_ERR_MSG;
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection target: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Requested redirection to '%ls', which is not a valid file descriptor";
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ERROR_TIME_BACKGROUND;
--
&crate::wutil::gettext::wgettext_static_str({
const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ILLEGAL_FD_ERR_MSG;
Those ones like "ILLEGAL_FD_ERR_MSG" or "ERROR_TIME_BACKGROUND" are constants, defined elsewhere (parse_constants, for all I've found so far)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think that "%ls"
comes from report_error!, so it's a macro inside a macro that ends up passing a message that I think is already translated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok well that's almost always just one indirection, we can probably expand it somehow, perhaps even grep.
Of course that's only worth it if someone uses the translations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to expand it if we just assume that all string constants are translatable - which turns out is basically true, and where it isn't at worst we have a message that isn't actually used.
So I would just leave it as what it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in future we should not show to translators strings that are not actually translated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be like three strings. It's fine for now.
I'm not going to spend more time on this solution, it'll get us up to "you can translate this again now".
This approach looks sensible to me. The one advantage of gettext is how ubiquitous it is, but if there's alternatives that will make working with translations easier I'm all in favour. |
Description
This extracts the strings we pass to wgettext! and friends into the po files.
The usual way to do this is xgettext. Unfortunately, while there is a patch to add rust support, it appears to have been stuck in limbo for almost 2 years.
Since we already have a script to work around xgettext issues, let's just create some issues of our own!
This uses cargo-expand to extract the strings from uses of wgettext.
For obvious reasons, that's a bit brittle, so I'm not about to call this the absolute greatest solution ever.
However, cargo-expand means we get nested macros etc for free - anything that expands to "pass this string literal to gettext" will work. Additionally, we assume any string constant (except for BUILD_VERSION and PACKAGE_NAME (why do we even have PACKAGE_NAME)) could be translated. Worst case it's useless guff in the po file.
I have not included the generated po-files in this because those are about ~250k lines changed.
Here are the statistics on what would change in them in terms of "translated" (has msgid and msgstr) vs untranslated strings:
Note that a lot of these are bound to be unused strings, we have not kept up well with translations.
The increase in translated messages is due to unused translations being matched to msgids - I believe in some cases the old code did not correctly figure out the location. I have compared some of the zh_CN ones and they seem plausible (in terms of format specifiers and google translate).
To be quite honest, I'm pretty sure the future direction would take us off gettext.
The tools aren't good, development isn't quick, and it's tying us to the C-world.
But this seems like a necessary step in order to get the strings out.
As an important note: Only 434 of the 25000 messages are in the core src/. That means french is at about 40%, german is at about 30%. These numbers aren't good, but it is possible to improve them.
It is much too easy to feel like this work is useless because of the staggering number of messages we have. But I do believe that one dedicated translator for a language might be capable of translating 100% of src/ - but they would have to keep at it, because we keep changing and adding and removing these messages.