Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make translation string extraction work with rust #10345

Merged
merged 2 commits into from Mar 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
69 changes: 37 additions & 32 deletions build_tools/fish_xgettext.fish
@@ -1,18 +1,40 @@
#!/usr/bin/env fish
#
# Tool to generate messages.pot
# Extended to replace the old Makefile rule which did not port easily to CMake

# This script was originally motivated to work around a quirk (or bug depending on your viewpoint)
# of the xgettext command. See https://lists.gnu.org/archive/html/bug-gettext/2014-11/msg00006.html.
# However, it turns out that even if that quirk did not exist we would still need something like
# this script to properly extract descriptions. That's because we need to normalize the strings to
# a format that xgettext will handle correctly. Also, `xgettext -LShell` doesn't correctly extract
# all the strings we want translated. So we extract and normalize all such strings into a format
# that `xgettext` can handle.
# Create temporary directory for these operations. OS X `mktemp` is somewhat restricted, so this block
# works around that - based on share/functions/funced.fish.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, sounds like a good direction.

We also assume any string constant could be translated.

If this has a significant number of false positives, we should fix it.
I'm surprised this is necessary, doesn't cargo expand give us exactly the
set of strings that are passed to wgettext!. (Well there's one or two which
it can't show because they come from a local variable (that could be one of
two constant strings) but we can fix that manually).

Copy link
Member Author

@faho faho Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the sort of output you get:

&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "%ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = WILDCARD_ERR_MSG;
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Invalid redirection target: %ls";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = "Requested redirection to '%ls', which is not a valid file descriptor";
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ERROR_TIME_BACKGROUND;
--
&crate::wutil::gettext::wgettext_static_str({
    const _WIDESTRING_U32_MACRO_UTF8: &::widestring::internals::core::primitive::str = ILLEGAL_FD_ERR_MSG;

Those ones like "ILLEGAL_FD_ERR_MSG" or "ERROR_TIME_BACKGROUND" are constants, defined elsewhere (parse_constants, for all I've found so far)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that "%ls" comes from report_error!, so it's a macro inside a macro that ends up passing a message that I think is already translated)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok well that's almost always just one indirection, we can probably expand it somehow, perhaps even grep.
Of course that's only worth it if someone uses the translations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to expand it if we just assume that all string constants are translatable - which turns out is basically true, and where it isn't at worst we have a message that isn't actually used.

So I would just leave it as what it is now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future we should not show to translators strings that are not actually translated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be like three strings. It's fine for now.

I'm not going to spend more time on this solution, it'll get us up to "you can translate this again now".

set -q TMPDIR
or set -l TMPDIR /tmp
set -l tmpdir (mktemp -d $TMPDIR/fish.XXXXXX)
or exit 1

# This is a gigantic crime.
# xgettext still does not support rust *at all*, so we use cargo-expand to get all our wgettext invocations.
set -l expanded (cargo expand --lib; for f in fish{,_indent,_key_reader}; cargo expand --bin $f; end)

# Extract any gettext call
set -l strs (printf '%s\n' $expanded | grep -A1 wgettext_static_str |
grep 'widestring::internals::core::primitive::str =' |
string match -rg '"(.*)"' | string match -rv '^%ls$|^$' |
# escaping difference between gettext and cargo-expand: single-quotes
string replace -a "\'" "'" | sort -u)

# Start with the C++ source
xgettext -k -k_ -kN_ -LC++ --no-wrap -o messages.pot src/*.cpp src/*.h
# Extract any constants
set -a strs (string match -rv 'BUILD_VERSION:|PACKAGE_NAME' -- $expanded |
string match -rg 'const [A-Z_]*: &str = "(.*)"' | string replace -a "\'" "'")

# We construct messages.pot ourselves instead of forcing this into msgmerge or whatever.
# The escaping so far works out okay.
for str in $strs
# grep -P needed for string escape to be compatible (PCRE-style),
# -H gives the filename, -n the line number.
# If you want to run this on non-GNU grep: Don't.
echo "#:" (grep -PHn -r -- (string escape --style=regex -- $str) src/ |
head -n1 | string replace -r ':\s.*' '')
echo "msgid \"$str\""
echo 'msgstr ""'
end >messages.pot

# This regex handles descriptions for `complete` and `function` statements. These messages are not
# particularly important to translate. Hence the "implicit" label.
Expand All @@ -22,39 +44,22 @@ set -l implicit_regex '(?:^| +)(?:complete|function).*? (?:-d|--description) (([
# than messages which should be implicitly translated.
set -l explicit_regex '.*\( *_ (([\'"]).+?(?<!\\\\)\\2) *\).*'

# Create temporary directory for these operations. OS X `mktemp` is somewhat restricted, so this block
# works around that - based on share/functions/funced.fish.
set -q TMPDIR
or set -l TMPDIR /tmp
set -l tmpdir (mktemp -d $TMPDIR/fish.XXXXXX)
or exit 1

mkdir -p $tmpdir/implicit/share/completions $tmpdir/implicit/share/functions
mkdir -p $tmpdir/explicit/share/completions $tmpdir/explicit/share/functions

for f in share/config.fish share/completions/*.fish share/functions/*.fish
# Extract explicit attempts to translate a message. That is, those that are of the form
# `(_ "message")`.
string replace --filter --regex $explicit_regex 'echo $1' <$f | fish >$tmpdir/explicit/$f.tmp 2>/dev/null
while read description
echo 'N_ "'(string replace --all '"' '\\"' -- $description)'"'
end <$tmpdir/explicit/$f.tmp >$tmpdir/explicit/$f
rm $tmpdir/explicit/$f.tmp
string replace --filter --regex $explicit_regex '$1' <$f | string unescape \
| string replace --all '"' '\\"' | string replace -r '(.*)' 'N_ "$1"' >$tmpdir/explicit/$f

# Handle `complete` / `function` description messages. The `| fish` is subtle. It basically
# avoids the need to use `source` with a command substitution that could affect the current
# shell.
string replace --filter --regex $implicit_regex 'echo $1' <$f | fish >$tmpdir/implicit/$f.tmp 2>/dev/null
while read description
# We don't use `string escape` as shown in the next comment because it produces output that
# is not parsed correctly by xgettext. Instead just escape double-quotes and quote the
# resulting string.
echo 'N_ "'(string replace --all '"' '\\"' -- $description)'"'
end <$tmpdir/implicit/$f.tmp >$tmpdir/implicit/$f
rm $tmpdir/implicit/$f.tmp
string replace --filter --regex $implicit_regex '$1' <$f | string unescape \
| string replace --all '"' '\\"' | string replace -r '(.*)' 'N_ "$1"' >$tmpdir/implicit/$f
end

xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap -o messages.pot $tmpdir/explicit/share/*/*.fish
xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap -o messages.pot $tmpdir/implicit/share/*/*.fish
xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap -o messages.pot $tmpdir/{ex,im}plicit/share/*/*.fish

rm -r $tmpdir
2 changes: 1 addition & 1 deletion src/builtins/abbr.rs
Expand Up @@ -324,7 +324,7 @@ fn abbr_add(opts: &Options, streams: &mut IoStreams) -> Option<c_int> {
.append(wgettext_fmt!("%ls: %ls\n", CMD, regex_pattern.as_utfstr()));
streams
.err
.append(wgettext_fmt!("%ls: %*ls\n", CMD, offset, "^"));
.append(sprintf!("%ls: %*ls\n", CMD, offset, "^"));
}
return STATUS_INVALID_ARGS;
}
Expand Down
2 changes: 1 addition & 1 deletion src/parser.rs
Expand Up @@ -962,7 +962,7 @@ impl Parser {
&wgettext_fmt!(
"Could not write profiling information to file '%s': %s",
&String::from_utf8_lossy(path),
format!("{}", err)
err.to_string()
)
);
return;
Expand Down