Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vim clone left "dirty" after make. src/po/ru.cp1251.po is modified. #14490

Open
errael opened this issue Apr 10, 2024 · 16 comments
Open

Vim clone left "dirty" after make. src/po/ru.cp1251.po is modified. #14490

errael opened this issue Apr 10, 2024 · 16 comments

Comments

@errael
Copy link
Contributor

errael commented Apr 10, 2024

Steps to reproduce

I occasionally see this, and I think I've seen mention of this on the dev list; it might always be the same file; this morning I happened to catch it just after it happened. I also built yesterday morning. It doesn't happen too often (the revert file I see is from January), I haven't looked at makefiles, I don't know what dependency is triggering.

If anyone knows something specific to check...

After pull, make reconfig, the console shows the make ending as:

...
OLD_PO_FILE_INPUT=yes msgfmt -v -o vi.mo vi.po
1410 translated messages.
OLD_PO_FILE_INPUT=yes msgfmt -v -o zh_CN.UTF-8.mo zh_CN.UTF-8.po
2800 translated messages.
OLD_PO_FILE_INPUT=yes msgfmt -v -o zh_TW.UTF-8.mo zh_TW.UTF-8.po
1410 translated messages.
make[2]: Leaving directory '/ref/vim/src/po'
make[2]: Entering directory '/ref/vim/src/po'
OLD_PO_FILE_INPUT=yes msgfmt -v -o cs.cp1250.mo cs.cp1250.po
1272 translated messages.
...
OLD_PO_FILE_INPUT=yes msgfmt -v -o pl.cp1250.mo pl.cp1250.po
1858 translated messages.
rm -f ru.cp1251.po
iconv -f UTF-8 -t CP1251 ru.po | \
	LANG=C sed -e 's/charset=[uU][tT][fF]-8/charset=CP1251/' \
		-e 's/# Original translations/# Generated from ru.po, DO NOT EDIT/' \
		> ru.cp1251.po
OLD_PO_FILE_INPUT=yes msgfmt -v -o ru.cp1251.mo ru.cp1251.po
2979 translated messages.
...
make[2]: Leaving directory '/ref/vim/src/po'
make[1]: Leaving directory '/ref/vim/src'
err@harmony:/ref/vim/src
$ hg st
M src/po/ru.cp1251.po

The diff seems to be a lot of formatting. Here's an excerpt, as seen in vim.

-"W17: Äëÿ ðàáîòû ñ àðàáñêîé ïèñüìåííîñòüþ ââåäèòå êîìàíäó `:set "
-"encoding=utf-8`"
+"W17: Äëÿ ðàáîòû ñ àðàáñêîé ïèñüìåííîñòüþ ââåäèòå êîìàíäó `:set encoding=utf-8`"

It interesting that the wrap, as shown in vim above, is happening right around column 80. With all these formatting changes it's hard to see if there are any real changes.

Using gvim -d to look at the checked in file and the it looks like the only changes are around the linebreak and that the translation itself has no changes.

vdiff-translation

Expected behaviour

No modified checked-in file.

Version of Vim

9.1.0299

Environment

ubuntu/gtk

Logs and stack traces

No response

@errael errael added the bug label Apr 10, 2024
@RestorerZ
Copy link
Contributor

RestorerZ commented Apr 10, 2024

Hmm, interesting! Do you have this happen sometimes, and always on this file?
I'm going to try building in MS Windows now - it's fine.
I'll check what's wrong with this file. If there is something wrong with it.

Can you please tell me what your gettext is? Version and so on...

@errael
Copy link
Contributor Author

errael commented Apr 10, 2024

Hmm, interesting! Do you have this happen sometimes,

I see it every month or two

and always on this file?

Not sure, maybe.

For reference, take a look at https://groups.google.com/g/vim_dev/c/Gyqie6FQM4s/m/W2yZLzsuAwAJ from December, other people see this once and a while.

I'm going to try building in MS Windows now - it's fine. I'll check what's wrong with this file. If there is something wrong with it.

Can you please tell me what your gettext is? Version and so on...

I built with current source this morning.

@RestorerZ
Copy link
Contributor

Okay, thank you!
I have a rough idea of what it could be related to.
From the dates you gave, it seems to happen when translations are updated periodically.
Okay, let's try to figure it out.

@RestorerZ
Copy link
Contributor

Based on this line OLD_PO_FILE_INPUT=yes msgfmt -v -o vi.mo vi.po, you are using the "gettext" version before 0.22.

@julio-b
Copy link
Contributor

julio-b commented Apr 10, 2024

I have also been bothered by this and i ended up just compiling with --disable-nls.
But here is what is happening and how to trigger this bug reliably.

I will take as an example commit 8fcc129 because it is the more recent one, but it could be any commit that updates the translation files.

To reproduce:

  1. run this one-liner until you see that src/po/ru.po is newer than src/po/ru.cp1251.po
    $ git checkout 8fcc12977~ && ls -la --time-style=full-iso src/po/ru*po && git checkout master && ls -la --time-style=full-iso src/po/ru*po
    Example output on my fourth run:
HEAD is now at c9ec20d94 runtime(doc): Update documentation
-rw-r--r-- 1 ju ju 410K 2024-04-10 22:43:04.013415053 +0300 src/po/ru.cp1251.po
-rw-r--r-- 1 ju ju 539K 2024-04-10 22:43:04.016748378 +0300 src/po/ru.po
Previous HEAD position was c9ec20d94 runtime(doc): Update documentation
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
-rw-r--r-- 1 ju ju 413K 2024-04-10 22:43:04.050081632 +0300 src/po/ru.cp1251.po
-rw-r--r-- 1 ju ju 543K 2024-04-10 22:43:04.053414958 +0300 src/po/ru.po

notice how ru.po timestamp is newer by a few MS

  1. run $ make

Explanation:

  • You pull a commit that updates some translation files e.g src/po/ru.po and src/po/ru.cp1251.po
  • when this commit is applied, the timestamp of ru.po could end up by a few milliseconds newer than ru.cp1251.po. This can happen randomly and it's unpredictable. It's more noticeable when the cpu is under heavy load.
  • now make detects that ru.cp1251.po file is older and triggers the iconv rule here
  • iconv is not really consistent between systems, so the results you get may be different. I think it's behavior depends on LC_ environmental variables and there are different implementations for windows.

This is possible for any translation file that uses iconv, listed in here.

@errael
Copy link
Contributor Author

errael commented Apr 10, 2024

Based on this line OLD_PO_FILE_INPUT=yes msgfmt -v -o vi.mo vi.po, you are using the "gettext" version before 0.22.

I didn't know what you meant by gettext, now I see. And you are correct; I guess a lot of people are still using that version.

$ gettext --version
gettext (GNU gettext-runtime) 0.21

@RestorerZ
Copy link
Contributor

  • iconv is not really consistent between systems, so the results you get may be different.

Yeah, that's what I was thinking. But not "iconv", but on "msgmerge" and text formatting.

  • the timestamp of ru.po could end up by a few milliseconds

In my opinion, this is too small a difference to handle the rule.

Anyway, according to the documentation, for "nmake.exe " the accuracy is a 2‐second interval.

Thank you for your explanations.

@errael
Copy link
Contributor Author

errael commented Apr 10, 2024

BTW, I use popOS, I update regularly. So I guess their distribution hasn't incorporated the 0.22 version.

@RestorerZ
Copy link
Contributor

I switched to version 0.22 and use it when preparing translations. Anyway, I'll try to figure out what's going on.

@benknoble
Copy link
Contributor

  1. run this one-liner until you see that src/po/ru.po is newer than src/po/ru.cp1251.po
    $ git checkout 8fcc12977~ && ls -la --time-style=full-iso src/po/ru*po && git checkout master && ls -la --time-style=full-iso src/po/ru*po

Probably it would have been simpler to

git checkout <REF>
touch src/po/ru.po

to force the “newer” timestamp (for reproduction; your method is nice to explain how Git might leave the timestamps in an unexpected state).

@julio-b
Copy link
Contributor

julio-b commented Apr 11, 2024

Probably it would have been simpler to

git checkout <REF>
touch src/po/ru.po

to force the “newer” timestamp (for reproduction; your method is nice to explain how Git might leave the timestamps in an unexpected state).

Right, I wanted to show how this could be triggered just by using a simple git pull, without any special modification on the .po file itself. But for a fast reproduction a simple touch ru.po is all you need!

I debugged this a bit more and now i think know for sure what is happening:

  • At some point, one of those faulty .po files was generated with msgconv (instead of iconv)
  • msgconv, by default, will hard wrap lines at column 80
  • So now every time that the iconv rule above is triggered (because of the different timestamp that i explained) the output is not wrapped at column 80

You can verify that by using $ msgconv -t CP1251 -o /tmp/ru.c1251.po ru.po and compare it with the current version of this file $ diff /tmp/ru.c1251.po src/po/ru.c1251.po. Almost identical, minus a single line that is changed by a simple sed script.

Right now msgconv is used only on windows, eg. by src/po/Make_mvc.mak.

Thus there are three fixes required:

  1. Make the output of iconv and msconv equal. Probably by using msgconv --no-wrap
  2. Fix the files that were generated with msgconv
  3. And finally find a way for make to not trigger the iconv rule randomly. For gnu version of make this is done with .LOW_RESOLUTION_TIME, but i don't know if it is portable for other versions.

(3) is technically not required if (1,2) are fixed because the output will always be the same, but still worth fixing it so we don't waste time regenerating those files.

I hope this helps!

@k-takata
Copy link
Member

If make ru is executed in src/po/ and cleanup.vim is applied, ru.po will be wrapped in 80 columns.
Then, make ru.cp1251.po is executed, ru.cp1251.po will be also wrapped in 80 columns.

All of (or most of) the other po files are wrapped in 80 columns. So, I think ru*.po should be also wrapped in 80 columns.

@chrisbra
Copy link
Member

Hm, so we should add a check to check.vim.

Something like this:

" Check that all lines are no longer than 80 chars
let overlong = search('\%>80v', 'n')
if overlong > 0
  echomsg "Lines should be wrapped at 80 columns"
  if error == 0
    let error = overlong
  endif
endif

Interestingly, I checked this against de.po and got a match. So there is something I need to fix for German as well.

chrisbra added a commit that referenced this issue Apr 11, 2024
Problem:  Overlong translation files may cause repo to become "dirty"
Solution: Add a test into check.vim to warn for lines being longer than
          80 virt columns

related: #14490

Signed-off-by: Christian Brabandt <cb@256bit.org>
zeertzjq added a commit to zeertzjq/neovim that referenced this issue Apr 11, 2024
Overlong translation files may cause repo to become "dirty"

Problem:  Overlong translation files may cause repo to become "dirty"
Solution: Add a test into check.vim to warn for lines being longer than
          80 virt columns

related: vim/vim#14490

vim/vim@6f585da

Co-authored-by: Christian Brabandt <cb@256bit.org>
@RestorerZ
Copy link
Contributor

RestorerZ commented Apr 18, 2024

I have also been bothered by this and i ended up just compiling with --disable-nls.

By default, when Vim is built, PO files in the "src/po" directory are processed.

https://github.com/vim/vim/blob/133ed2a592e4d7998a8f6afa3de9f057feb23f0a/src/configure.ac#L4497C4-L4500C23

https://github.com/vim/vim/blob/1433ac93eb3acf28c55f0c4d775716ebee543160/src/Makefile#L2068C1-L2078C4

If you don't need this, you can use this command:

[./configure [switch] &&] make "MAKEMO="

This will cut through the PO file processing and does not require you to change
the "./configure" preset.

now make detects that ru.cp1251.po file is older and triggers the iconv rule here
So now every time that the iconv rule above is triggered (because of the different timestamp that i explained)

When checking on the host and stand systems, no big differences in time stamps
exceeding 2-second intervals were noticed. However, when the time difference was
close to 1-second in tenths of seconds, some files were converted on the stand
system. The situation was similar on the host system when the "/B" switch for
"nmake.exe" was used. Unfortunately, the exact relationship could not be
established. We can conclude that "make" uses a 1-second time interval to check
timestamps and when transferring from one system (host) to another (stand),
there is a rounding error.
To minimize these differences, the $(PS) $(PSFLAGS) Set-ItemProperty -Path '.\$@' -Name LastWriteTime -Value (Get-Date)
command was added to the "Make_mvc.mak" file, which is executed after the
PO-files conversion is completed.

If make ru is executed in src/po/ and cleanup.vim is applied, ru.po will be wrapped in 80 columns.

Unfortunately, that doesn't always happen.

The illustration below shows the file created by msgmerge --no-wrap before
cleanup.vim (bottom) and after (top).

cleanup

The diff seems to be a lot of formatting.

The difference in formatting between "ru.po" and "ru.cp1251.po" is probably
because I used "Make_mvc.mak" to prepare "ru.po" and my own makefile for
"ru.cp1251.po", where "msgconv" and "msgmerge" are set to text width.

All of (or most of) the other po files are wrapped in 80 columns.

Also, the check revealed that almost all PO files have different text
formatting. This also applies to the files "ru.po" and "ru.cp1251.po".

The illustrations below show the difference in formatting between the original
file (bottom) and the file (top) produced by "msgmerge" with --width=79,
--width=80 and without text width.

msgmerge --width=79
width79
msgmerge --witdth=80
width80
msgmerge
noswitch

The following was done to utilize uniform text formatting:

  • All existing PO files were reformatted to text width 79.
  • The modeline vim:tw=79:ts=8:ft=po: was added to the PO files
  • In Makefile and Make_mvc.mak, the switch --width=79 is set for "msgmerge"
    and "xgettex". And the same switch is set for "msgconv" in Make_mvc.mak.

iconv is not really consistent between systems, so the results you get may be different. I think it's behavior depends on LC_ environmental variables and there are different implementations for windows.

No influence on text formatting from the "iconv" program and its analog from
"powershell" commands was detected. The output text is in the same format as the
input.

Hopefully, these steps will help eliminate unplanned processing of PO files.

Right now msgconv is used only on windows

It might be worth using the "msgconv" tool in the Makefile too.

Pros:

  • "Native" tool for the gettext package;
  • Automatic setting of the correct encoding name in the "charset" field and
    avoiding additional processing with the "sed" tool;
  • Additional check of the correctness of the processed PO file;
  • Unified tool for all systems.

Cons:

  • Everyone is used to "iconv";
  • ???.

(The ability to use "iconv" if necessary will of course be retained)

What do you think about it, @k-takata , @chrisbra ?

@julio-b
Copy link
Contributor

julio-b commented Apr 19, 2024

iconv is not really consistent between systems, so the results you get may be different. I think it's behavior depends on LC_ environmental variables and there are different implementations for windows.

No influence on text formatting from the "iconv" program and its analog from "powershell" commands was detected. The output text is in the same format as the input.

Yeah sorry, i was wrong here. This was simply a wild guess I made because had no idea that windows makefiles may use msgconv.

We can conclude that "make" uses a 1-second time interval to check timestamps

This 1-second timestamp interval must be specific to your system and not how make implementations work generally. Maybe either nmake.exe or windows is doing something weird? I don't know. Usually make uses high precession timestamps to determine if a target is up to date and every nanosecond matters.

This excellent explanation by the maintainer of GNU make may clear up some confusion: https://lists.gnu.org/archive/html/help-make/2001-04/msg00047.html
Also this manual from autoconf might be useful: https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Timestamps-and-Make.html

@RestorerZ
Copy link
Contributor

@julio-b

This was simply a wild guess I made because had no idea that windows makefiles may use msgconv.

It needed to be verified one way or another and it's good that you brought it to
your attention.

This excellent explanation by the maintainer of GNU make may clear up some confusion: https://lists.gnu.org/archive/html/help-make/2001-04/msg00047.html
Also this manual from autoconf might be useful: https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Timestamps-and-Make.html

Thanks for the links to useful descriptions of how the make program works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants