ISO 2022 JIS Japanese encoding fails #17

Open
n1474335 opened this issue Nov 1, 2019 · 2 comments

n1474335 commented Nov 1, 2019

Hi, thanks very much for your work on this repository, it's incredibly useful. We use it as the main character encoding library for CyberChef.

We've recently noticed an issue when trying to encode into ISO 2022 JIS Japanese where only null bytes are returned.

The affected CP numbers are 50220, 50221 and 50222.

Example code

import cptable from "codepage";

cptable.utils.encode(50220, "こんにちは");

Expected output

Uint8Array(10) [164, 179, 164, 243, 164, 203, 164, 193, 164, 207]

Actual output

Uint8Array(5) [0, 0, 0, 0, 0]

Can you shed any light on this behaviour?

n1474335 commented Nov 1, 2019

Another example that also fails:

Code

import cptable from "codepage";

cptable.utils.encode(50220, "ーム");

Expected output

Uint8Array(10) [27, 36, 66, 33, 60, 37, 96, 27, 40, 66]

Actual output

Uint8Array(2) [0, 0]

SheetJSDev (Contributor) commented

Thanks for sharing! The ISO 2022 codepages 5022{0,1,2,5,7} are definitely incorrect: hiragana require a control sequence, and control sequences are not currently supported. Based on ECMA-35, the first kana "こ" should be encoded as 1B 24 42 24 33 (1B 24 42 to switch to the JIS double-byte encoding, 24 for the Hiragana subset and 33 for the actual character). Fixing this will require a direct implementation of control sequences and a new set of LUTs for the various character subsets.

PS: All of the generated codepages whose source is listed as "Windows 7" are assumed to be either single-byte or double-byte. Clearly that wasn't the case here.
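
For illustration only, here is a minimal sketch of what the control-sequence handling could look like for kana. It is not the codepage library's implementation or API: the names `encodeIso2022JpSketch` and `jisX0208Kana` are hypothetical, and the sketch assumes the JIS X 0208 layout in which hiragana sit in row 0x24 and katakana in row 0x25, in the same order as their Unicode blocks. It handles only ASCII, hiragana, katakana and the long vowel mark "ー"; a real encoder would need full lookup tables (kanji, punctuation) and the other designations used by 50221 and 50222.

```javascript
// Hypothetical sketch, NOT the codepage library's API.
// Escape sequences that switch the active character set (ECMA-35 style).
const ESC_JIS_X_0208 = [0x1b, 0x24, 0x42]; // ESC $ B : switch to JIS X 0208
const ESC_ASCII = [0x1b, 0x28, 0x42];      // ESC ( B : switch back to ASCII

// Map a kana code point to its two-byte JIS X 0208 code, or null if unsupported.
function jisX0208Kana(cp) {
  // Hiragana U+3041..U+3093 -> 0x2421..0x2473
  if (cp >= 0x3041 && cp <= 0x3093) return [0x24, 0x21 + (cp - 0x3041)];
  // Katakana U+30A1..U+30F6 -> 0x2521..0x2576
  if (cp >= 0x30a1 && cp <= 0x30f6) return [0x25, 0x21 + (cp - 0x30a1)];
  // Long vowel mark U+30FC -> 0x213C
  if (cp === 0x30fc) return [0x21, 0x3c];
  return null;
}

function encodeIso2022JpSketch(str) {
  const out = [];
  let inJis = false; // tracks which character set is currently designated
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      // ASCII: make sure we are back in ASCII mode first
      if (inJis) { out.push(...ESC_ASCII); inJis = false; }
      out.push(cp);
      continue;
    }
    const pair = jisX0208Kana(cp);
    if (pair === null) {
      throw new RangeError("Character not covered by this sketch: " + ch);
    }
    // Double-byte character: emit the switch sequence once per run
    if (!inJis) { out.push(...ESC_JIS_X_0208); inJis = true; }
    out.push(...pair);
  }
  // Return to ASCII at the end of the output
  if (inJis) out.push(...ESC_ASCII);
  return Uint8Array.from(out);
}

console.log(encodeIso2022JpSketch("ーム"));
// bytes: 27, 36, 66, 33, 60, 37, 96, 27, 40, 66
console.log(encodeIso2022JpSketch("こんにちは"));
// bytes: 27, 36, 66, 36, 51, 36, 115, 36, 75, 36, 65, 36, 79, 27, 40, 66
```

The "ーム" result matches the expected output in the second report. For "こんにちは" the sketch yields 16 bytes rather than the 10 listed in the first report; those 10 bytes are the same JIS codes with the high bit set, which is the EUC-JP form rather than ISO 2022.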
