Improper encoding / decoding of some special 7-bit values in cp437, macintosh #257

Open
rossj opened this issue Aug 6, 2020 · 3 comments


@rossj

rossj commented Aug 6, 2020

Hi there.

I've noticed that cp437 does not properly encode / decode the special symbols that are assigned to bytes 0x01-0x1F and 0x7F. When decoding, these bytes are incorrectly passed through as-is, as control characters. Similarly, when encoding, the special characters in this range are replaced with question marks.

I've noticed a similar issue with the macintosh encoding, which has special symbols defined at 0x11-0x14.

As an example, the two tests below are currently failing:

import { decode, encode } from 'iconv-lite';

describe('encodings', () => {
    it('should encode special cp437 symbols that map to bytes 0x01-0x1F', () => {
        const input = '\u263A'; // A smiley face
        const result = encode(input, 'cp437');
        expect(result[0]).toEqual(1);
    });

    it('should decode cp437 bytes in range 0x01-0x1F', () => {
        const input = Buffer.from([1]);
        const result = decode(input, 'cp437');
        expect(result).toEqual('\u263A');
    });
});
@ashtuchkin
Owner

hmm yeah I think you're right. Thank you for filing this issue and the tests, really helpful!
My current encoding generation code uses the iconv project as the source, so it seems that it's wrong there too. Strange to see this in a relatively widely known encoding.
I'll fix this soon.

@NuSkooler

Came here to log exactly this. Any ETA? This would help a lot with enigma-bbs as well as a text mode RPG I'm working on!

@yosion-p
Contributor

yosion-p commented Aug 23, 2021

I double-checked, and the issue does exist. I looked at the source code and found that the cp437 table is generated from a remote resource, which I guess is missing part of the data. How about we add special handling for these characters?
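For anyone who needs this before a fix lands, here is a rough sketch of what such special handling could look like as a wrapper around the existing behaviour. It assumes what the issue describes: decoding leaves bytes 0x01-0x1F and 0x7F as control characters, and encoding passes control characters through as the same bytes. The helper names `controlsToCp437Symbols` and `cp437SymbolsToControls` are made up for illustration and are not part of iconv-lite; the symbol table is the standard cp437 graphic set for those bytes.

```typescript
// Symbols that code page 437 displays for bytes 0x01-0x1F, in byte order
// (index 0 corresponds to byte 0x01). All of them are in the BMP.
const CP437_LOW_SYMBOLS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼";
// Symbol that code page 437 displays for byte 0x7F (house, U+2302).
const CP437_DEL_SYMBOL = "\u2302";

// Hypothetical helper: apply to a string that was already decoded as cp437.
// Replaces the pass-through control characters with their cp437 symbols.
function controlsToCp437Symbols(decoded: string): string {
  return decoded.replace(/[\u0001-\u001F\u007F]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code === 0x7f ? CP437_DEL_SYMBOL : CP437_LOW_SYMBOLS.charAt(code - 1);
  });
}

// Hypothetical helper: apply to a string before encoding it as cp437.
// Replaces the cp437 symbols with the control characters that share their byte value.
function cp437SymbolsToControls(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const idx = CP437_LOW_SYMBOLS.indexOf(ch);
    if (idx >= 0) {
      out += String.fromCharCode(idx + 1);
    } else if (ch === CP437_DEL_SYMBOL) {
      out += "\u007F";
    } else {
      out += ch;
    }
  }
  return out;
}
```

With iconv-lite this would be used as `controlsToCp437Symbols(decode(buf, 'cp437'))` and `encode(cp437SymbolsToControls(str), 'cp437')`. One caveat: the mapping is applied unconditionally, so real tabs, line feeds and carriage returns (0x09, 0x0A, 0x0D) also turn into symbols on decode. Whether that is wanted depends on the data, so a real fix would probably need an option for it.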

