scanf doesn't support UTF-8 on Windows #354

Open
vtereshkov opened this issue Feb 24, 2024 · 5 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@vtereshkov
Owner

vtereshkov commented Feb 24, 2024

The C runtime on Windows supports UTF-8 in printf(), but not in scanf():

scanf doesn't currently support input from a UNICODE stream.

(https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/scanf-scanf-l-wscanf-wscanf-l?view=msvc-170)

printf() example:

fn main() {
   s := "Привет, мир!"
   printf("%s\n", s)
}

C:\Users\vtere\Desktop\umka-lang\umka_windows_mingw>umka.exe ..\test.um
╨Я╤А╨╕╨▓╨╡╤В, ╨╝╨╕╤А!

Split from #108.

@vtereshkov vtereshkov added bug Something isn't working enhancement New feature or request labels Feb 24, 2024
@vtereshkov vtereshkov changed the title printf/scanf on Windows don't support UTF-8 printf/scanf don't support UTF-8 on Windows Feb 24, 2024
@luauser32167
Contributor

C:\Users\vtere\Desktop\umka-lang\umka_windows_mingw>umka.exe ..\test.um
╨Я╤А╨╕╨▓╨╡╤В, ╨╝╨╕╤А!

I think you need to change the code page to UTF-8 (65001):

> chcp 65001
> C:\Users\vtere\Desktop\umka-lang\umka_windows_mingw>umka.exe ..\test.um

You can also do this with the Windows function SetConsoleOutputCP(65001).
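
A minimal C sketch of the programmatic route, assuming the program is allowed to change the console's output code page. The helper names console_output_utf8_begin and console_output_utf8_end are hypothetical and not part of Umka; GetConsoleOutputCP() and SetConsoleOutputCP() are the only Windows calls used. The previous code page is remembered so that it can be put back later.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: switch the console output code page to UTF-8.
   Returns the previous code page, or 0 if nothing was changed
   (no console attached, the call failed, or not running on Windows). */
unsigned int console_output_utf8_begin(void)
{
#ifdef _WIN32
    UINT previous = GetConsoleOutputCP();   /* 0 means no console or failure */
    if (previous == 0 || !SetConsoleOutputCP(CP_UTF8))
        return 0;
    return previous;
#else
    return 0;   /* assumption: other platforms need no switch */
#endif
}

/* Hypothetical helper: restore the code page returned by the call above. */
void console_output_utf8_end(unsigned int previous)
{
#ifdef _WIN32
    if (previous != 0)
        SetConsoleOutputCP(previous);
#else
    (void)previous;
#endif
}
```

A caller would wrap its output between the two calls, for example `unsigned int prev = console_output_utf8_begin(); printf("%s\n", s); console_output_utf8_end(prev);`.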

Here is some info on Windows UTF-8 from null program (Chris Wellons): 1, 2

@vtereshkov
Owner Author

Fixed printf() with SetConsoleOutputCP(CP_UTF8). It still fails on Chinese characters, but at least prints Cyrillic correctly.

@vtereshkov vtereshkov changed the title printf/scanf don't support UTF-8 on Windows scanf doesn't support UTF-8 on Windows Feb 25, 2024
@vtereshkov
Owner Author

vtereshkov commented Feb 25, 2024

I fixed Chinese by changing the console font to SimSun-ExtB. Cyrillic characters look weird after that, but it's a decades-old font rendering problem:

[image]

@skejeton
Contributor

skejeton commented Mar 13, 2024

@vtereshkov We need to be careful with Windows state. Umka may very well be embedded into an application that uses a different encoding, and SetConsoleOutputCP(CP_UTF8) changes the code page for the whole console, not just for Umka.

I think the best solution is to just bite the bullet: use the wide-character functions and convert their strings to and from UTF-8.
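
A rough C sketch of what that could look like, not Umka's actual implementation. The helper names read_console_utf8 and write_console_utf8 and the 1024-unit buffer are assumptions made for illustration; the Windows calls used are ReadConsoleW(), WriteConsoleW(), WideCharToMultiByte() and MultiByteToWideChar(). The console code page is left untouched. On other platforms the helpers simply fall back to fgets() and fputs().

```c
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

enum { WIDE_BUF_LEN = 1024 };   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: read console input as UTF-16 and return it as UTF-8.
   Returns the number of bytes stored in out, or -1 on failure.
   Redirected (non-console) input is not handled on Windows. */
int read_console_utf8(char *out, int outSize)
{
    if (out == NULL || outSize < 2)
        return -1;
#ifdef _WIN32
    wchar_t wide[WIDE_BUF_LEN];
    DWORD wideLen = 0;
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);

    if (!ReadConsoleW(in, wide, WIDE_BUF_LEN, &wideLen, NULL))
        return -1;
    if (wideLen == 0) {
        out[0] = '\0';
        return 0;
    }

    /* Leave one byte free for the terminating null */
    int len = WideCharToMultiByte(CP_UTF8, 0, wide, (int)wideLen,
                                  out, outSize - 1, NULL, NULL);
    if (len == 0)
        return -1;
    out[len] = '\0';
    return len;
#else
    if (fgets(out, outSize, stdin) == NULL)
        return -1;
    return (int)strlen(out);
#endif
}

/* Hypothetical helper: write a null-terminated UTF-8 string to the console.
   Returns 0 on success, -1 on failure. */
int write_console_utf8(const char *text)
{
    if (text == NULL)
        return -1;
#ifdef _WIN32
    wchar_t wide[WIDE_BUF_LEN];
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    int wideLen = MultiByteToWideChar(CP_UTF8, 0, text, -1, wide, WIDE_BUF_LEN);

    /* wideLen counts the terminating null, which is not written */
    if (wideLen > 0 &&
        WriteConsoleW(out, wide, (DWORD)(wideLen - 1), &written, NULL))
        return 0;
#endif
    /* Non-Windows, redirected output, or failed conversion: write the bytes as they are */
    return fputs(text, stdout) == EOF ? -1 : 0;
}
```

The UTF-8 buffer filled by read_console_utf8() could then be parsed with sscanf() in place of a direct scanf() call.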

@vtereshkov
Owner Author

@skejeton Perhaps you're right, but the benefit-to-cost ratio is low, so I'm reluctant to do anything with it now.

@vtereshkov vtereshkov added bug Something isn't working and removed bug Something isn't working labels Mar 17, 2024