Opt-in to native UTF-8 support for OS interaction on Windows #156

Open
4 tasks
mosra opened this issue Nov 11, 2022 · 0 comments

From a discussion with @williamjcm and @sthalik on Gitter. To avoid a UTF-16 conversion in every Utility::Path API that deals with the filesystem on Windows (and in other areas, like environment access), we could pass UTF-8 strings directly to the *A APIs. It's an opt-in feature, and there are three ways to achieve this:

  • Changing the global code-page in Windows settings. Requires user interaction, so not a viable option.
  • Linking to UCRT instead of MSVCRT and calling setlocale(LC_ALL, ".utf-8"). Available since Windows 10 SDK 1803.
  • (Likely also linking to UCRT) and adding an entry to the app manifest. Requires Windows 10 SDK 1903+.

The second option could be done inside CorradeMain, the third could be documented alongside HiDPI support, for example. Then, all Path APIs would check the prerequisites (Windows version, UCRT vs MSVCRT, and whether the codepage is set to UTF-8) and pick the faster path if they're met.
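A minimal sketch of what the second option and the codepage check could look like, assuming the locale name returned by setlocale() ends in a `.utf8` or `.utf-8` codeset suffix when UTF-8 is active (I haven't verified the exact string UCRT returns). The names `localeNameIsUtf8()`, `tryEnableUtf8Locale()` and `currentLocaleIsUtf8()` are hypothetical, not existing Corrade APIs:

```cpp
#include <cctype>
#include <clocale>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if a locale name as returned by
// setlocale() ends with a UTF-8 codeset suffix (".utf8" or ".utf-8", in any
// letter case). Assumes the codeset follows the last dot in the name.
bool localeNameIsUtf8(const char* name) {
    if(!name) return false;
    const char* dot = std::strrchr(name, '.');
    if(!dot) return false;

    // Lowercase the codeset and drop dashes so "UTF-8" and "utf8" compare
    // equal; stop at an optional "@modifier"
    char codeset[8] = {};
    std::size_t size = 0;
    for(const char* c = dot + 1; *c && *c != '@'; ++c) {
        if(*c == '-') continue;
        if(size == sizeof(codeset) - 1) return false;
        codeset[size++] = char(std::tolower(static_cast<unsigned char>(*c)));
    }
    return std::strcmp(codeset, "utf8") == 0;
}

// Hypothetical opt-in, e.g. called once at startup: asks the C runtime to
// switch to a UTF-8 locale. setlocale() returns a null pointer on failure,
// which the helper above treats as "not UTF-8".
bool tryEnableUtf8Locale() {
    return localeNameIsUtf8(std::setlocale(LC_ALL, ".utf-8"));
}

// Hypothetical check: passing a null pointer only queries the current
// locale, it doesn't change anything
bool currentLocaleIsUtf8() {
    return localeNameIsUtf8(std::setlocale(LC_CTYPE, nullptr));
}
```

Querying LC_CTYPE instead of LC_ALL is a deliberate choice in this sketch: if the categories differ, the LC_ALL query may return a composite string that the suffix check wouldn't handle.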

TODOs left:

  • Figure out how to robustly check that we can use UTF-8. Is UTF-8 codepage presence enough (checked with setlocale(LC_ALL, nullptr))? Or do we also need to check for Windows version and/or UCRT presence?

  • Figure out how to check just once and store the result in some global variable instead of doing the check again in every Path API, without running into thread safety, thread-local variables, duplicated globals and other nasty issues in yet another place.

    • Though some rogue 3rd party code could call setlocale() on its own and break it, so there's probably no way around checking every time :/
  • The *A APIs still have the MAX_PATH limitation, and it's apparently impossible to work around that:

    In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend \\?\ to the path. For more information, see Naming Files, Paths, and Namespaces.

    Which makes this whole effort rather useless. But maybe there are other ways to circumvent this?

    • Maybe it could fall back to the *W APIs if the input UTF-8 path is longer than MAX_PATH? That could make it work for 90% of use cases, OTOH it means we have to explicitly test each and every Path API to handle this well. Though since we have to have that fallback for when the locale changes again (as noted above) anyway, it shouldn't mean that much extra code.
  • Setting the code page to UTF-8 may be considered "not nice" to 3rd party libraries that still rely on *A APIs. Consider if a compile-time opt-out for this feature is enough or if it should be opt-in (for example to be enabled by the users if they know it won't break 3rd party stuff).

    • Or, possibly, don't set anything but use UTF-8 if the codepage is discovered to be UTF-8? Seems like the least intrusive option, but still without falling back to UTF-16 conversion.
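The fallback idea from the TODOs above could be sketched roughly as follows. The names `PathApi`, `choosePathApi()` and `processCodePageIsUtf8()` are hypothetical. The sketch also swaps in GetACP() for the setlocale(LC_ALL, nullptr) query, because to my understanding the *A functions follow the process code page rather than the CRT locale. That part is an assumption worth verifying. The length check counts UTF-8 bytes, which is conservative because a UTF-8 string is never shorter in bytes than its UTF-16 form is in code units.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Which family of Win32 functions a path operation would call. Hypothetical
// names, used only for illustration.
enum class PathApi { NarrowUtf8, WideUtf16 };

// Value of MAX_PATH from the Windows headers, repeated here so the sketch
// compiles on other platforms as well
constexpr std::size_t MaxPath = 260;

// Cheap enough to call on every Path operation, so nothing has to be cached
// in a global. Always false outside of Windows.
bool processCodePageIsUtf8() {
    #ifdef _WIN32
    return GetACP() == CP_UTF8;
    #else
    return false;
    #endif
}

PathApi choosePathApi(const std::string& utf8Path, bool codePageIsUtf8) {
    // Without a UTF-8 code page the *A functions would misinterpret the
    // bytes, so convert to UTF-16 as before
    if(!codePageIsUtf8) return PathApi::WideUtf16;

    // MAX_PATH includes the null terminator, so the path itself has to be
    // shorter. Longer paths need the *W functions.
    if(utf8Path.size() >= MaxPath) return PathApi::WideUtf16;

    return PathApi::NarrowUtf8;
}
```

A Path API would then call something like `choosePathApi(path, processCodePageIsUtf8())` and branch on the result, which keeps the existing UTF-16 code path as the fallback in both cases.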
@mosra mosra added this to TODO in Utility via automation Nov 11, 2022