ISO-8859-1 is currently (as of alpha16) the default character encoding, and it is also what is used when the explicit encoding specifiers Default and OEM are passed - see here.
This choice is problematic, because the printable characters of ISO-8859-1 are only a subset of those in the commonly used Windows-1252 encoding.
(The two encodings are often conflated, but they are not the same.)
Specifically, using ISO-8859-1 makes the following characters unavailable - the printable characters that Windows-1252 encodes in the byte range 0x80 - 0x9F:
€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ
Note that the € character is part of that list.
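The list above can be reproduced by decoding each byte in that range as Windows-1252 and keeping the characters whose Unicode code point lies above 0xFF. ISO-8859-1 maps its 256 byte values one-to-one onto the first 256 Unicode code points, so anything above that cannot be encoded. This is a minimal sketch, assuming PowerShell Core with `CodePagesEncodingProvider` available (as in the Windows-1252 example further down). The variable names are illustrative only.

```powershell
# Windows-1252 comes from the code-pages provider used elsewhere in this issue.
$enc1252 = [System.Text.CodePagesEncodingProvider]::Instance.GetEncoding(1252)

# Decode every byte in 0x80..0x9F as Windows-1252 and keep the characters
# whose code point is above 0xFF, i.e. outside the range ISO-8859-1 can encode.
$lost = foreach ($b in 0x80..0x9F) {
  $ch = $enc1252.GetString([byte[]] $b)
  if ([int] $ch[0] -gt 0xFF) { $ch }
}

# Print the affected characters on one line.
$lost -join ' '
```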
You can verify the problematic behavior as follows:
> '€' | Set-Content tmp.txt; Get-Content tmp.txt
?
Because € cannot be represented in ISO-8859-1, it was quietly converted to a literal ?.
Contrast this with use of Windows-1252:
> $enc1252 = [System.Text.CodePagesEncodingProvider]::Instance.GetEncoding(1252); [IO.File]::WriteAllText('tmp.txt', '€', $enc1252); [IO.File]::ReadAllText('tmp.txt', $enc1252)
€
The € character - byte value 0x80 in Windows-1252, with no counterpart in ISO-8859-1 - was correctly preserved.
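To see where the information is lost, it can help to look at the bytes each encoding produces for € rather than at the round-tripped text. The following is a sketch under the same assumption as above (PowerShell Core with `CodePagesEncodingProvider` available); 28591 is the code page number of ISO-8859-1.

```powershell
$enc1252   = [System.Text.CodePagesEncodingProvider]::Instance.GetEncoding(1252)
$encLatin1 = [System.Text.Encoding]::GetEncoding(28591)  # ISO-8859-1

# Encode the euro sign with each encoding and print the resulting bytes in hex.
foreach ($enc in $encLatin1, $enc1252) {
  $hex = ($enc.GetBytes('€') | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
  '{0,-14} {1}' -f $enc.WebName, $hex
}
```

Windows-1252 yields the single byte 0x80. ISO-8859-1 has to substitute some other byte, so the original character cannot be recovered when the file is read back.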
Also, please note that in order to fully emulate Windows PowerShell behavior, using a fixed encoding in Core is not sufficient.
Instead, the encoding would have to be locale-dependent, as on Windows:
Unix locales would have to be mapped to the Windows legacy codepages - see here.
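As a rough illustration of what such a mapping could look like, here is a sketch that derives a Windows "ANSI" code page from a Unix locale name. Everything in it is hypothetical: the function name `Get-LegacyCodePage`, the deliberately tiny lookup table, and the fallback to 1252 are assumptions made for the example, not part of PowerShell or of this proposal.

```powershell
# Hypothetical, deliberately incomplete lookup table:
# Unix language_TERRITORY -> Windows legacy "ANSI" code page.
$ansiCodePageByLocale = @{
  'en_US' = 1252
  'de_DE' = 1252
  'ru_RU' = 1251
  'ja_JP' = 932
}

# Hypothetical helper; not an existing PowerShell command.
function Get-LegacyCodePage {
  param([string] $Locale = $env:LANG)

  # Strip encoding and modifier suffixes, e.g. 'de_DE.UTF-8@euro' -> 'de_DE'.
  $key = ($Locale -split '[.@]')[0]

  if ($key -and $ansiCodePageByLocale.ContainsKey($key)) {
    $ansiCodePageByLocale[$key]
  } else {
    1252  # assumed fallback for unknown or missing locales
  }
}

Get-LegacyCodePage 'ru_RU.UTF-8'
```

A real implementation would need a complete table and a corresponding mapping for the OEM code pages.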