Default and OEM character encodings in the Core edition should be Windows-1252, not ISO-8859-1 #3258

@mklement0

As of alpha16, ISO-8859-1 is the default character encoding, and it is also what the explicit encoding specifiers Default and OEM select - see here.

This choice is problematic, because ISO-8859-1 covers only a subset of the printable characters in the commonly used Windows-1252 encoding.
(The two encodings are often conflated, but they are not the same.)

Specifically, using ISO-8859-1 makes the following characters unavailable - they are the printable characters that Windows-1252 assigns to the code point range 0x80 through 0x9F:

€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ

Note that the € character is part of that list.
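
The list above can be cross-checked with a short sketch. It assumes PowerShell Core, where legacy code pages have to be registered through CodePagesEncodingProvider before use. The variable names ($enc1252, $printable) are purely illustrative.

```powershell
# Make legacy code pages such as 1252 available on .NET Core
# (registering the provider more than once is harmless).
[System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)

$enc1252 = [System.Text.Encoding]::GetEncoding(1252)

# Decode each byte in 0x80..0x9F as Windows-1252 and keep only the results
# that are not control characters.
$printable = foreach ($b in 0x80..0x9F) {
    $ch = $enc1252.GetString([byte[]] @($b))
    if (-not [char]::IsControl($ch, 0)) { $ch }
}

# Print the surviving characters on one line, for comparison with the list above.
$printable -join ' '
```

Every byte that ISO-8859-1 reserves for C1 control characters but Windows-1252 maps to a printable character should appear in the output.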

You can verify the problematic behavior as follows:

> '€' | Set-Content tmp.txt; Get-Content tmp.txt
?

Because € cannot be represented in ISO-8859-1, it was quietly converted to a literal ?.

Contrast this with use of Windows-1252:

> $enc1252 = [System.Text.CodePagesEncodingProvider]::Instance.GetEncoding(1252); [IO.File]::WriteAllText('tmp.txt', '€', $enc1252); [IO.File]::ReadAllText('tmp.txt', $enc1252)
€

The € character - code point 0x80 in Windows-1252, but not defined in ISO-8859-1 - was correctly preserved.
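
To see the difference at the byte level without going through a file, something like the following sketch may help. It assumes PowerShell Core and uses code page 28591, the .NET identifier for ISO-8859-1. The variable names are illustrative only.

```powershell
# Make legacy code pages such as 1252 available on .NET Core.
[System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)

$latin1  = [System.Text.Encoding]::GetEncoding(28591)  # ISO-8859-1
$enc1252 = [System.Text.Encoding]::GetEncoding(1252)   # Windows-1252

# Build the euro sign from its Unicode code point so that the script file's
# own encoding does not affect the result.
$euro = [string][char]0x20AC

# Encode the same one-character string with both encodings.
$latin1Bytes = $latin1.GetBytes($euro)
$cp1252Bytes = $enc1252.GetBytes($euro)

# Show the first byte each encoding produced, in hex.
'ISO-8859-1:   0x{0:X2}' -f $latin1Bytes[0]
'Windows-1252: 0x{0:X2}' -f $cp1252Bytes[0]
```

A value of 0x3F on the ISO-8859-1 line would be the literal ? seen in the Set-Content example, whereas Windows-1252 defines € at 0x80.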


Also, please note that in order to fully emulate Windows PowerShell behavior, using a fixed encoding in Core is not sufficient.

Instead, the encoding would have to be locale-dependent, as on Windows:
Unix locales would have to be mapped to the Windows legacy codepages - see here.
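
One conceivable way to derive locale-dependent code pages is sketched below. This is only an assumption about how the mapping could be approached, not how Windows PowerShell or Core does it. It relies on the .NET TextInfo.ANSICodePage and TextInfo.OEMCodePage properties, and whether these report useful values for a given Unix locale would need to be verified.

```powershell
# Make legacy code pages available on .NET Core.
[System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)

# Ask the current culture which legacy code pages Windows associates with it.
# Assumption: the runtime reports meaningful values for the active Unix locale.
$textInfo = [cultureinfo]::CurrentCulture.TextInfo

# Resolve the reported code page numbers to encoding objects.
$ansiEncoding = [System.Text.Encoding]::GetEncoding($textInfo.ANSICodePage)
$oemEncoding  = [System.Text.Encoding]::GetEncoding($textInfo.OEMCodePage)

# Show the culture together with the two resolved encodings.
'{0}: ANSI = {1}, OEM = {2}' -f [cultureinfo]::CurrentCulture.Name, $ansiEncoding.WebName, $oemEncoding.WebName
```

With a Western European culture this would be expected to resolve the ANSI code page to Windows-1252, while other cultures could yield different code pages, which is the locale dependence described above.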

Labels

Resolution-Duplicate: The issue is a duplicate.
WG-Engine: core PowerShell engine, interpreter, and runtime
