Windows code page detection #475

Open
liegepr opened this issue Apr 24, 2024 · 2 comments

Comments

@liegepr

liegepr commented Apr 24, 2024

Thanks for developing radian.

I am running R 4.3.0 on Windows 11. When using Rterm as the interactive terminal in VS Code, I get:

Sys.getlocale()
[1] "LC_COLLATE=French_France.utf8;LC_CTYPE=French_France.utf8;LC_MONETARY=French_France.utf8;LC_NUMERIC=C;LC_TIME=French_France.utf8"
l10n_info()$system.codepage
[1] 65001
l10n_info()$codepage
[1] 65001

Now when using radian:

sessionInfo()$locale
"LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"
l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 1252

After tweaking my .Rprofile, I can force R to use UTF-8 with radian:

sessionInfo()$locale
[1] "LC_COLLATE=fr_FR.UTF-8;LC_CTYPE=fr_FR.UTF-8;LC_MONETARY=fr-FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr-FR.UTF-8"

However, the R code page now conflicts with the Windows code page:

l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 65001

Starting from Windows 10 version 1803 and R v4.2, l10n_info()$system.codepage should report 65001.

The R-help page for ?Sys.setlocale says:
"From R 4.2, UCRT locale names should be used. The character set should match the system/ANSI codepage (l10n_info()$codepage be the same as l10n_info()$system.codepage). Setting it to any other value results in a warning and may cause encoding problems. As from R 4.2 on recent Windows the system codepage is 65001 and one should always use locale names ending with ".UTF-8" (except for "C" and ""), otherwise Windows may add a different character set."
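The rule quoted above (every locale name except "C" and "" should end in a UTF-8 character set) can be checked mechanically against the locale strings shown earlier. Here is a minimal sketch in Python, chosen only because radian itself runs on Python. The function name `non_utf8_categories` is hypothetical, and treating `utf8` and `UTF-8` as equivalent spellings is an assumption on my part.

```python
def non_utf8_categories(locale_string):
    """Return {category: locale_name} for entries whose charset is not UTF-8.

    Entries set to "C" or "" are skipped, since the help page exempts them.
    """
    offenders = {}
    for entry in locale_string.split(";"):
        if "=" not in entry:
            continue
        category, _, name = entry.partition("=")
        if name in ("C", ""):
            continue
        # The charset is whatever follows the last dot, e.g. "1252" or "utf8".
        charset = name.rpartition(".")[2] if "." in name else ""
        # Assumption: "utf8", "UTF-8" and "utf-8" all denote the same charset.
        if charset.lower().replace("-", "") != "utf8":
            offenders[category] = name
    return offenders


# Locale strings copied from the sessions above.
rterm_locale = (
    "LC_COLLATE=French_France.utf8;LC_CTYPE=French_France.utf8;"
    "LC_MONETARY=French_France.utf8;LC_NUMERIC=C;LC_TIME=French_France.utf8"
)
radian_locale = (
    "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;"
    "LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"
)

print(non_utf8_categories(rterm_locale))   # nothing flagged under Rterm
print(non_utf8_categories(radian_locale))  # four categories flagged under radian
```

This only inspects the locale names; it says nothing about the system code page, which is the part that still reports 1252 under radian.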

@randy3k
Owner

randy3k commented Apr 25, 2024

Unfortunately, this is due to the lack of native UTF-8 support in Python (radian requires Python, in case you didn't know).
It seems there is a way to change the activeCodePage in Python's manifest to UTF-8 via mt.exe:
python/cpython#86873 (comment)
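To see which code page the Python process itself is running under, something like the sketch below may help. It assumes that the value R reports as `l10n_info()$system.codepage` reflects the ANSI code page of the host process (python.exe in radian's case), which I have not verified in the R sources. `GetACP` is the Win32 call in kernel32.dll; the helper names `process_ansi_code_page` and `describe` are made up for illustration.

```python
import ctypes
import sys


def process_ansi_code_page():
    """Return the ANSI code page of the current process, or None off Windows."""
    if sys.platform != "win32":
        return None
    # GetACP takes no arguments and returns the active ANSI code page (a UINT).
    return ctypes.windll.kernel32.GetACP()


def describe(code_page):
    """Turn a code page number into a short human-readable label."""
    if code_page is None:
        return "not running on Windows"
    if code_page == 65001:
        return "UTF-8 (65001)"
    return f"legacy code page {code_page}"


print(describe(process_ansi_code_page()))
```

Run with the same python.exe that launches radian, this would presumably print a legacy code page such as 1252 on the setup above, and UTF-8 (65001) if the manifest change takes effect.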

@liegepr
Author

liegepr commented Apr 30, 2024

Thanks for your answer.
