Calling toString() on potentially incomplete buffers #45

Open
matthiasg opened this issue Mar 31, 2017 · 3 comments

@matthiasg

It seems byline simply calls `toString()` on each buffer without respecting UTF-8 decoding rules. Since the chunks passed to the transform can start and end at any position in the original byte stream, a chunk boundary can fall in the middle of a character.

This could corrupt any character encoded with more than one byte.

```js
if (encoding == 'buffer') {
  chunk = chunk.toString(); // utf8
  encoding = 'utf8';
}
```

Is this handled somewhere else?
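A minimal reproduction of the failure mode described above, using only Node's built-in `Buffer` (byline itself is not involved, so this is an illustration of the mechanism rather than a test of the library): a two-character string is cut after its second byte, as an arbitrary chunk boundary might cut it, and each half is decoded on its own the way a per-chunk `toString()` does.

```js
// Two CJK characters, each encoded as 3 bytes in UTF-8 (6 bytes total).
const text = '中文';
const bytes = Buffer.from(text, 'utf8');

// Simulate a chunk boundary that falls after the 2nd byte, i.e. inside '中'.
const chunkA = bytes.subarray(0, 2);
const chunkB = bytes.subarray(2);

// Decoding each chunk on its own, as calling toString() per chunk does:
// the incomplete byte sequences on either side of the cut cannot be
// decoded and come out as U+FFFD replacement characters.
const naive = chunkA.toString('utf8') + chunkB.toString('utf8');

// Decoding the whole byte sequence at once gives back the original text.
const whole = Buffer.concat([chunkA, chunkB]).toString('utf8');

console.log(naive === text);           // false
console.log(naive.includes('\uFFFD')); // true
console.log(whole === text);           // true
```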

@LeonFedotov

I just hit an error like this with some Chinese characters: I didn't get the complete JSON I expected. @matthiasg, do you know how I can fix this?

@matthiasg
Author

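@LeonFedotov You could use Node's built-in `StringDecoder` (from the `string_decoder` module) directly. It can be fed new bytes continuously and only emits correctly decoded Unicode strings; the bytes of an incomplete trailing character are held back until the next write. Each resulting string can then be fed into byline (or you can split the lines yourself).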
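A sketch of the "split the lines yourself" route, assuming UTF-8 input with `\n` or `\r\n` line endings (a lone `\r` is not treated as a line break here). `Utf8LineSplitter` is a hypothetical helper, not part of byline or Node; it relies only on `StringDecoder.write()` and `StringDecoder.end()`.

```js
const { StringDecoder } = require('string_decoder');

// Hypothetical helper (not part of byline or Node): turns a sequence of
// UTF-8 byte chunks into complete lines, whatever the chunk boundaries are.
class Utf8LineSplitter {
  constructor() {
    this.decoder = new StringDecoder('utf8');
    this.pending = ''; // decoded text after the last '\n' seen so far
  }

  // Feed one chunk of bytes; returns the lines completed by this chunk.
  write(chunk) {
    // StringDecoder holds back the bytes of an incomplete trailing character
    // and prepends them to the next chunk, so no character is split here.
    this.pending += this.decoder.write(chunk);
    const parts = this.pending.split('\n');
    this.pending = parts.pop(); // text after the last '\n' may be incomplete
    // Strip a trailing '\r' so '\r\n' endings work even across chunks.
    return parts.map((line) => line.replace(/\r$/, ''));
  }

  // Call once at end of input; returns the final unterminated line, if any.
  end() {
    const rest = (this.pending + this.decoder.end()).replace(/\r$/, '');
    this.pending = '';
    return rest.length > 0 ? [rest] : [];
  }
}

// Usage: feed the bytes one at a time, the worst case for chunk boundaries.
const bytes = Buffer.from('第一行\n第二行\n', 'utf8');
const splitter = new Utf8LineSplitter();
const lines = [];
for (let i = 0; i < bytes.length; i++) {
  lines.push(...splitter.write(bytes.subarray(i, i + 1)));
}
lines.push(...splitter.end());
console.log(lines); // [ '第一行', '第二行' ]
```

Because `write()` and `end()` both return arrays of lines, the same class could back a `stream.Transform`, calling `write()` from its `transform()` callback and `end()` from its `flush()` callback.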

@pludov

pludov commented Aug 7, 2018

As a workaround for file streams, you can set the encoding on the source stream, so that Node decodes the bytes into complete characters before byline sees them:

```js
const stream = byline(fs.createReadStream(filename, {encoding: 'utf-8'}))
```
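To illustrate why this workaround helps, here is a sketch using only `fs` (byline is left out so it runs on its own; the temp file name is arbitrary). A tiny `highWaterMark` forces reads to split 3-byte characters. Without `encoding`, the raw buffers decoded one at a time come out garbled; with `encoding: 'utf-8'`, the stream decodes through an internal `StringDecoder` and only emits whole characters.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Arbitrary temp file containing 3-byte UTF-8 characters.
const file = path.join(os.tmpdir(), `utf8-chunks-${process.pid}.txt`);
const text = '中文第一行\n中文第二行\n';
fs.writeFileSync(file, text, 'utf8');

// Collect every 'data' chunk. highWaterMark: 2 means each read returns at
// most 2 bytes, so chunk boundaries fall inside characters.
function readChunks(encoding) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(file, { encoding, highWaterMark: 2 })
      .on('data', (chunk) => chunks.push(chunk))
      .on('error', reject)
      .on('end', () => resolve(chunks));
  });
}

const result = Promise.all([readChunks(null), readChunks('utf-8')]).then(
  ([buffers, strings]) => {
    fs.unlinkSync(file);
    return {
      // Without encoding: Buffer chunks, decoded one at a time (the bug).
      naive: buffers.map((b) => b.toString()).join(''),
      // With encoding: already-decoded string chunks; just concatenate.
      decoded: strings.join('')
    };
  }
);

result.then(({ naive, decoded }) => {
  console.log(naive === text);   // false
  console.log(decoded === text); // true
});
```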
