Support paths with UTF characters on Windows #681

Open
carlescufi opened this issue Jun 2, 2023 · 9 comments
Assignees
Labels
enhancement platform: Windows Issues related to Zephyr SDK on Windows hosts

Comments

@carlescufi
Member

carlescufi commented Jun 2, 2023

C:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr>C:/Users/carles/bin/zephyr-sdk-0.16.1/aarch64-zephyr-elf/bin/aarch64-zephyr-elf-gcc.exe c:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr\kernel\banner.c
cc1.exe: fatal error: c:\Users\carles\src\tmp\????\zephyr\zephyr\kernel\banner.c: Invalid argument
compilation terminated.

Note that GNU Arm Embedded doesn't work either:

C:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr>"c:\Users\carles\bin\gnuarmemb\10 2021.10\bin\arm-none-eabi-gcc.exe" c:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr\kernel\banner.c
arm-none-eabi-gcc.exe: error: c:\Users\carles\src\tmp\????\zephyr\zephyr\kernel\banner.c: Invalid argument
arm-none-eabi-gcc.exe: fatal error: no input files
compilation terminated.
@carlescufi carlescufi changed the title utf-8 characters not working on Windows Paths with utf-8 characters not working on Windows Jun 2, 2023
@carlescufi carlescufi changed the title Paths with utf-8 characters not working on Windows Paths with UTF characters not working on Windows Jun 2, 2023
@stephanosio stephanosio self-assigned this Jun 13, 2023
@stephanosio stephanosio added platform: Windows Issues related to Zephyr SDK on Windows hosts enhancement labels Jun 13, 2023
@stephanosio stephanosio changed the title Paths with UTF characters not working on Windows Support paths with UTF characters on Windows Jun 13, 2023
@stephanosio
Member

Marking this as an enhancement since this is more of a general problem with MinGW/MSVCRT.

From https://www.msys2.org/docs/environments/:

MSVCRT (Microsoft Visual C++ Runtime) is available by default on all Microsoft Windows versions, but due to backwards compatibility issues is stuck in the past, not C99 compatible and is missing some features.
...

  • It doesn't support the UTF-8 locale

Also from https://blog.r-project.org/2022/11/07/issues-while-switching-r-to-utf-8-and-ucrt-on-windows/#why-utf-8-via-ucrt:

MSVCRT does not allow UTF-8 to be the encoding of the C runtime (as reported by setlocale() function and used by standard C functions). Applications linked to MSVCRT, in order to support Unicode, hence have either to use Windows-specific UTF-16LE API for anything that involves strings, or some third-party library, such as ICU.

The easiest way to fix this (i.e. without modifying Binutils and GCC themselves to use the Windows UTF-16LE API) would be to build the Windows Zephyr SDK binaries against the UCRT instead of the MSVCRT, but this requires more investigation and discussion of the potential side effects.
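
The `????` in the error messages of the original report is consistent with the path being squeezed through a narrow (non-UTF-8) code page somewhere between the command line and the file-open call. A minimal Python sketch of that kind of lossy conversion, assuming Windows-1252 (`cp1252`) as a stand-in for the active code page (the reporter's actual code page is not stated in this thread):

```python
# Illustration only: mimic a lossy Unicode -> "ANSI" code page conversion.
# Assumption: cp1252 stands in for whatever code page was active.
path = "c:\\Users\\carles\\src\\tmp\\仄費羅斯\\zephyr"

# Characters with no equivalent in the target code page become '?'.
narrow = path.encode("cp1252", errors="replace")

print(narrow.decode("cp1252"))
```

Once the characters have been replaced, the original file name can no longer be recovered, which would explain why the compiler cannot open the file.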

@piernov

piernov commented Jan 31, 2024

Could an alternative build linked with UCRT be provided? Currently, building Zephyr from a user directory name containing Unicode characters on Windows is broken due to this issue (among others), since absolute paths are used almost everywhere in Zephyr's build system.

As for the GNU Arm Embedded toolchain, it seems like the new ARM GNU Toolchain might have fixed this problem.

As a side note, in my case the issue is that the GCC preprocessor produces files with include paths in an "ANSI" character set instead of UTF-8. When the path contains characters that can be converted from Unicode to the compatibility "ANSI" character set (e.g., é in ISO-8859), GCC can actually read the input files properly, but it does not seem to re-encode the paths to UTF-8. As a result, the zephyr.dts.pre file contains paths in an "ANSI" character set with characters outside the 7-bit ASCII range, which cannot be decoded as UTF-8. However, Zephyr's python-devicetree library tries to read the file as UTF-8 and fails. A workaround is to call the preprocessor with the -P option to omit the paths from the output file.
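
The decoding failure described above can be reproduced in isolation. The following is a minimal sketch, not the actual python-devicetree code: the include path is made up, and `cp1252` is assumed as the "ANSI" code page.

```python
# Hypothetical include path containing a non-ASCII character.
include_path = "C:/Users/André/zephyr/include/foo.h"

# A preprocessor linemarker as it might appear in zephyr.dts.pre.
line_marker = f'# 1 "{include_path}"\n'

# Assumption: the file is written in cp1252 rather than UTF-8.
ansi_bytes = line_marker.encode("cp1252")

# A reader that insists on UTF-8 cannot decode those bytes.
try:
    ansi_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)
```

With -P the preprocessor omits linemarkers, so paths like the one above never reach the output file, which is why the workaround avoids the decoding error.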

@dpkristensen

dpkristensen commented Feb 5, 2024

I would like to point out that not supporting UNICODE in the Windows API also leads to issues with path length restrictions. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

The functions affected by the registry value that overrides the length limitation are ONLY the wide versions. The narrow-character versions still have the maximum length restriction of 260, and there is no way to change it.

This has led to issues with path restrictions on Windows PCs that are not present on other systems because GCC can't open a file with a long path name.

So I would say this is more akin to a bug than an Enhancement.
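
To make the limit concrete, here is a rough sketch of a check for paths that would not fit the legacy limit. `exceeds_max_path` is a hypothetical helper written for this comment, and it deliberately simplifies: it counts Python characters rather than encoded units and ignores relative-path resolution.

```python
MAX_PATH = 260  # legacy Windows limit, including the terminating NUL


def exceeds_max_path(path: str) -> bool:
    """Return True if `path` would not fit in a MAX_PATH-sized buffer."""
    # One extra slot is needed for the terminating NUL character.
    return len(path) + 1 > MAX_PATH


# Hypothetical deeply nested build output path.
deep = "C:\\work\\" + "\\".join(["some_long_directory_name"] * 12) + "\\main.c.obj"
print(len(deep), exceeds_max_path(deep))
```

A check like this could be run over generated file names to find the ones a narrow-API tool would likely fail to open.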

@stephanosio
Member

Note that the path length limitation will not be fully solved even if we make the SDK use the Unicode functions, because other components in the build system (notably Ninja) do not support long paths due to the very same underlying problem.

@dpkristensen

Right, but as long as the SDK is in a path accessible by Ninja, Ninja will have no problem launching GCC. The source is passed as a string, so it will still work if only the source is in a long path. Some of the files generated by the build system have very long paths because they are added relative to the build directory.

If GCC is able to accept such a path, then it would work in a lot of places. If there's an issue with CMake or Ninja, then maybe they should be built with UNICODE as well; but it shouldn't stop this from being supported here.

@piernov

piernov commented Apr 11, 2024

It turns out the issue wasn't just about UCRT, but also about the way the command-line arguments are passed to GCC's main function, as explained here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865
This has been fixed by the following commits now available in GCC 13.1:

Additionally, in order for the build with UCRT to be successful, this commit is also required (also available in GCC 13.1):

UCRT might not even be needed at all, not sure.

Zephyr's GCC hasn't been ported to GCC 13.1 yet sadly, so I merged these commits on top of Zephyr's GCC fork for SDK v0.16.5-1: https://github.com/piernov/gcc/commits/fix/ucrt-utf8/

I then rebuilt a MinGW-W64 toolchain with UCRT and with win32 threads (instead of posix/winpthreads) on ArchLinux:

  • mingw-w64-headers: configure […] --with-default-msvcrt=ucrt
  • mingw-w64-crt: configure […] --with-default-msvcrt=ucrt
  • mingw-w64-gcc: configure […] --enable-threads=win32 --disable-libgomp --disable-libssp

This required rebuilding and reinstalling the packages roughly in this order: mingw-w64-headers mingw-w64-crt mingw-w64-gcc mingw-w64-crt mingw-w64-gcc (yes, they need to be rebuilt twice, maybe more).

The next release of MinGW should default to UCRT: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/82b8edc101d7f8fefd44e84d2e24a6edd01901f9/ . However, sdk-ng uses Ubuntu 20.04 packages, so it may take a very long time until we see a UCRT build used by default. I hope there is a way the process can be sped up.
Win32 threads are used instead of POSIX threads to avoid the dependency on the external winpthreads library, matching how the official SDK toolchains are built.

Finally I rebuilt Zephyr's SDK toolchain for ARM, and I obtained a GCC compiler that can take UTF-8 paths on the command line, and generates preprocessed files with UTF-8 paths as well.

I uploaded my build here: https://github.com/piernov/sdk-ng/releases/download/v0.16.5-1-ucrt-utf8/toolchain_windows-x86_64_arm-zephyr-eabi.7z

Lastly, in order to support running the toolchain from a path with Unicode characters in the Zephyr build system, I had to add ENCODING UTF-8 to the execute_process() call that sets LIBGCC_FILE_NAME in https://github.com/zephyrproject-rtos/zephyr/blob/f0212367dc033d152b1d3f08d0efc130400034dd/cmake/compiler/gcc/target.cmake#L100 .
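
The ENCODING option tells CMake how to decode the child process output on Windows. As a loose analogy only (this is Python, not the CMake change itself), the sketch below shows why the choice of decoder matters when a tool prints a UTF-8 path. The path is made up, the child process merely stands in for the compiler, and `cp1252` is an assumed "ANSI" code page.

```python
import subprocess
import sys

# Child process that writes a UTF-8 encoded path to stdout. The source is
# ASCII-only; the \u escapes are expanded by the child interpreter.
CHILD = r"import sys; sys.stdout.buffer.write('C:/sdk/\u4ec4\u8cbb/libgcc.a'.encode('utf-8'))"

raw = subprocess.run(
    [sys.executable, "-c", CHILD], capture_output=True, check=True
).stdout

# Decoding with the encoding the tool actually used round-trips the path.
as_utf8 = raw.decode("utf-8")

# Decoding with an assumed "ANSI" code page yields a different, garbled string.
as_ansi = raw.decode("cp1252", errors="replace")

print(as_utf8 == "C:/sdk/\u4ec4\u8cbb/libgcc.a")
print(as_ansi == as_utf8)
```

A garbled string like `as_ansi` no longer names an existing file, which is the kind of failure the explicit encoding avoids.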

@dpkristensen

dpkristensen commented Apr 11, 2024

For this particular issue, supporting UTF-8 encoded paths may fix the problem mentioned; but it still does not solve the issue of Windows API path length being different based on the setting of UNICODE, which affects the Windows API call (e.g., CreateFileW vs CreateFileA).

If the arguments are passed in a separate file to GCC, then that would allow bypassing the path length restriction in a greater number of cases.
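
GCC accepts an `@file` argument that reads command-line options from a file. Below is a rough sketch of how a build script might generate such a file. The helper names, the `args.rsp` file name, the example paths, and the choice of UTF-8 for the file are all assumptions, and whether this actually sidesteps the path length restriction has not been verified here.

```python
import tempfile
from pathlib import Path


def quote_for_response_file(arg: str) -> str:
    """Quote one argument: escape backslashes and double quotes, then wrap."""
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def write_response_file(args, directory) -> Path:
    """Write one quoted argument per line and return the file's path."""
    rsp = Path(directory) / "args.rsp"  # hypothetical file name
    lines = [quote_for_response_file(a) for a in args]
    rsp.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return rsp


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical arguments with Windows-style paths.
    args = ["-c", "C:\\very\\long\\path\\banner.c", "-o", "C:\\out dir\\banner.o"]
    rsp = write_response_file(args, tmp)
    content = rsp.read_text(encoding="utf-8")
    # The compiler name is only an example; nothing is executed here.
    command = ["arm-zephyr-eabi-gcc", f"@{rsp}"]
    print(content)
```

Only the short response file path then appears on the command line; the long paths are read from the file contents.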

@piernov

piernov commented Apr 11, 2024

Does UCRT solve that or not? Can you confirm your bug still exists in my build?
Since the issue in the original bug report (Unicode characters in paths) is already fixed in upstream GCC, it just needs to trickle down to Zephyr's branch. If your issue isn't solved the same way, I'd suggest opening another bug report.

@dpkristensen

Yes, it is technically a separate issue. I have solved it locally by building from a shorter path on my local filesystem, so it's not a blocker. Maybe a "nice-to-have".
