Developer Windows Unicode and Messages

Michael edited this page Jan 24, 2019 · 6 revisions

winconsole - aka openscad.com

The source says "TODO: Fix printing of unicode on console."

No Unicode support. There is a specific comment NOT to use fprintf. Output is written to stderr:

https://github.com/openscad/openscad/blob/c6a485651fa29f1a878ea454558192492f7467ec/winconsole/winconsole.c#L56

HANDLE hError = GetStdHandle(STD_ERROR_HANDLE);
...
WriteFile(hError, msg, strlen(msg), NULL, NULL);

winconsole starts openscad.exe with:

CreateProcessW(NULL, cmd, NULL, NULL, TRUE, CREATE_UNICODE_ENVIRONMENT, NULL, NULL, &info.startupInfo, &info.processInfo)

CREATE_UNICODE_ENVIRONMENT: If this flag is set, the environment block pointed to by lpEnvironment uses Unicode characters.

Question: Does this make openscad.exe output Unicode on stdout and stderr?
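A hedged note on the question: going by the flag description quoted above, CREATE_UNICODE_ENVIRONMENT only describes how the block passed in lpEnvironment is encoded. In the call above lpEnvironment is NULL, so the child inherits the parent's environment, and nothing here changes what bytes openscad.exe writes to stdout or stderr. The sketch below is portable C++ (no Windows headers) that builds a block in the wide-character layout the flag refers to: NAME=VALUE strings, each null-terminated, followed by one extra null. The helper name make_unicode_env_block is hypothetical and is not part of winconsole.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not taken from winconsole.c.
// Builds a wide-character environment block:
//   NAME1=VALUE1\0NAME2=VALUE2\0\0
// Assumption: the caller passes at least one variable. Sorting the
// entries by name, which a real block is expected to have, is omitted.
std::wstring make_unicode_env_block(
    const std::vector<std::pair<std::wstring, std::wstring>>& vars)
{
    std::wstring block;
    for (const auto& var : vars) {
        block += var.first;
        block += L'=';
        block += var.second;
        block += L'\0';   // terminates this NAME=VALUE string
    }
    block += L'\0';       // terminates the whole block
    return block;
}
```

On Windows, block.data() would be what gets passed as lpEnvironment together with the flag. Console output encoding is a separate matter.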

openscad.exe - Command Line

TODO: section not yet written.

openscad.exe - GUI

TODO: section not yet written.

openscad in general

Uses std::wstring with Windows API calls; otherwise it only uses fs::path p(path);

https://github.com/openscad/openscad/blob/ed93d3fec5d11bedc06f5200a3a740a72c7b6c56/src/PlatformUtils-win.cc#L27

// convert from windows api w_char strings (usually utf16) to utf8 std::string
std::string winapi_wstr_to_utf8( std::wstring wstr )

It does not appear to be called anywhere (?).
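To make concrete what a UTF-16 to UTF-8 conversion involves, here is a minimal portable sketch. It is not OpenSCAD's winapi_wstr_to_utf8; the name utf16_to_utf8 is hypothetical, and it takes std::u16string rather than std::wstring so that it behaves the same on platforms where wchar_t is wider than 16 bits. Replacing unpaired surrogates with U+FFFD is an assumption of this sketch. On Windows the same job is commonly done with WideCharToMultiByte and CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration; not the OpenSCAD function.
// Converts UTF-16 code units to a UTF-8 encoded std::string.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the low surrogate that must follow.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // unpaired low surrogate
        }

        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"\u20AC") yields the three bytes E2 82 AC.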

Boost

Note for Cygwin users. Cygwin version 1.7 or later is required because only versions of GCC with wide character strings are supported. The library's implementation code treats Cygwin as a Windows platform, and thus uses the Windows API and uses Windows path syntax as the native path syntax.

Filesystem {right version?}

Class path is critical.

Need to understand the generic pathname format vs. the native pathname format. Note that ISO/IEC 9945 = POSIX, which generally means *nix:

[Note: No conversion occurs on ISO/IEC 9945 and Windows since they have native formats that conform to the generic format. --end note] Native pathname format: An implementation defined format. [Note: For ISO/IEC 9945 compliant operating systems, the native format is the same as the generic format. For one widely used non-ISO/IEC 9945 compliant operating system, the native format is similar to the generic format, but the directory-separator characters can be either slashes or backslashes. --end note]

("Widely used non-ISO/IEC 9945 compliant operating system" means Windows.) These formats describe the 'style' of a directory structure (e.g. POSIX vs. z/VM), NOT the character encoding (Unicode/ASCII). So ignore 'path Conversions'.
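A small sketch of the generic vs. native distinction. It uses std::filesystem as a stand-in, on the assumption that Boost.Filesystem behaves the same way for these members; the function names and the example file name are hypothetical. The point is that only the separator style differs between the two formats, not the encoding.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // stand-in for boost::filesystem (assumption)

// Generic format: always spelled with forward slashes.
std::string generic_spelling(const fs::path& p)
{
    return p.generic_string();
}

// Native format: separators converted to the platform's preferred one
// (backslash on Windows, unchanged on POSIX).
std::string native_spelling(fs::path p)
{
    return p.make_preferred().string();
}
```

On a POSIX system both functions return "models/parts/gear.scad" for that input; on Windows the native spelling would use backslashes instead.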

An object of class path represents a path, and contains a pathname.

Path: A sequence of elements that identify the location of a file within a filesystem. The elements are the root-name (optional), root-directory (optional), and an optional sequence of filenames. [Note: A pathname is the concrete representation of a path. --end note]

Pathname: A character string that represents a path. Pathnames are formatted according to the generic pathname format or an implementation defined native pathname format.
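The definitions above can be illustrated by splitting a path into its elements. This is a sketch using std::filesystem as a stand-in for Boost.Filesystem (an assumption); the PathParts struct and decompose function are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // stand-in for boost::filesystem (assumption)

// Hypothetical holder for the three kinds of path element.
struct PathParts {
    std::string root_name;               // e.g. a drive such as "C:" on Windows; often empty
    std::string root_directory;          // "/" when the path is rooted, else empty
    std::vector<std::string> filenames;  // the remaining elements, in order
};

PathParts decompose(const fs::path& p)
{
    PathParts parts;
    parts.root_name = p.root_name().generic_string();
    parts.root_directory = p.root_directory().generic_string();
    // relative_path() is everything after root-name and root-directory.
    for (const auto& element : p.relative_path())
        parts.filenames.push_back(element.generic_string());
    return parts;
}
```

For "/home/user/model.scad" this gives an empty root-name, "/" as root-directory, and the filenames home, user, model.scad.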

Types

typedef value_type; // char for ISO/IEC 9945, wchar_t for Windows

value_type is an implementation-defined typedef for the character type used by the operating system to represent pathnames. Member functions described as returning const string, const wstring, const u16string, or const u32string are permitted to return const string&, const wstring&, const u16string&, or const u32string&, respectively.

Remarks: If the value type ... is not value_type, conversion is performed by cvt.

So on Windows, Boost's path is wchar_t-based, i.e. wide (UTF-16).
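The value_type claim can be checked at compile time. The sketch below uses std::filesystem, whose path follows the same convention, as a stand-in for Boost.Filesystem (an assumption); to_utf8 is a hypothetical helper showing one way to get a UTF-8 std::string out of a path regardless of its value_type.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;  // stand-in for boost::filesystem (assumption)

#ifdef _WIN32
static_assert(std::is_same_v<fs::path::value_type, wchar_t>,
              "paths are stored as wide characters on Windows");
#else
static_assert(std::is_same_v<fs::path::value_type, char>,
              "paths are stored as narrow characters on POSIX");
#endif

// Hypothetical helper: returns the pathname as UTF-8 bytes.
// u8string() returns std::string in C++17 and std::u8string in C++20,
// so the bytes are copied through iterators to work with both.
std::string to_utf8(const fs::path& p)
{
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

When the stored value_type is wchar_t, u8string() is where the conversion from the wide representation happens.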

Qt

Windows

Windows is UTF-16. Regarding typedefs and functions, READ THIS! It explains wchar_t, TCHAR, WCHAR, TEXT(), L'...', L"...", and function calls ending in W, e.g. GetFullPathNameW.
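One consequence of UTF-16 worth keeping in mind: the length of a wide string counts 16-bit code units, not characters, because code points above U+FFFF take two units (a surrogate pair). The sketch below uses char16_t and u"..." literals so it behaves the same everywhere; on Windows wchar_t is 16 bits wide, so the same holds for L"..." strings there, while on other platforms wchar_t is typically wider. The function name count_code_points is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string.
// A high surrogate followed by a low surrogate counts as one code point;
// any other code unit (including an unpaired surrogate) counts as one.
std::size_t count_code_points(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool low_follows =
            i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && low_follows)
            ++i;  // skip the second half of the pair
        ++n;
    }
    return n;
}
```

For u"a\U0001F600" the string's size() is 3 code units, while count_code_points reports 2.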

On newer file systems, such as NTFS, exFAT, UDFS, and FAT32, Windows stores the long file names on disk in Unicode, which means that the original long file name is always preserved. This is true even if a long file name contains extended characters, regardless of the code page that is active during a disk read or write operation. (https://docs.microsoft.com/en-au/windows/desktop/FileIO/naming-a-file)