Calling toString() on potentially incomplete buffers #45

matthiasg · 2017-03-31T12:24:26Z

It seems byline calls toString() on buffers without respecting Unicode character boundaries. Because a buffer passed to the transform can start or end at an arbitrary position in the original byte stream, it may begin or end in the middle of a character.

This could affect any character encoded with more than one byte.

if (encoding == 'buffer') {
  chunk = chunk.toString(); // utf8
  encoding = 'utf8';
}

Is this handled somewhere else?
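
To illustrate the concern, here is a minimal sketch that is not taken from byline itself. It assumes a chunk boundary happens to fall inside a three-byte UTF-8 character and decodes each piece on its own, the way a bare chunk.toString() would. The variable names are made up for the example.

```javascript
// '中' is encoded in UTF-8 as three bytes: e4 b8 ad.
const encoded = Buffer.from('中', 'utf8');

// Assumption for illustration: the stream delivers the character split
// across two chunks, two bytes first and the last byte afterwards.
const firstChunk = encoded.subarray(0, 2);
const secondChunk = encoded.subarray(2);

// Decode each chunk independently, with no state carried between calls.
const naive = firstChunk.toString('utf8') + secondChunk.toString('utf8');

console.log(naive === '中');            // false
console.log(naive.includes('\ufffd'));  // true: replacement characters appear
```

Each incomplete byte sequence is replaced with U+FFFD, so the original character cannot be recovered afterwards.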

LeonFedotov · 2017-04-23T15:14:40Z

I just got an error like that with some Chinese characters: I didn't get the whole JSON that I expected. @matthiasg, do you know how I can correct this issue?

matthiasg · 2017-04-23T19:04:58Z

@LeonFedotov You could use the StringDecoder built into Node directly. It can be fed new bytes continuously and only emits correctly decoded Unicode strings. Each string can then be fed into byline (or you can split the lines yourself).
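
A rough sketch of the "do it yourself" variant, assuming the chunks arrive as Buffers and lines are separated by a plain '\n'. The helper name decodeLines is hypothetical and not part of byline; only StringDecoder from Node's string_decoder module is a real API here.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: decodes Buffer chunks and returns the complete lines.
// Simplification: only '\n' is treated as a line separator.
function decodeLines(chunks) {
  const decoder = new StringDecoder('utf8');
  const lines = [];
  let pending = '';
  for (const chunk of chunks) {
    // write() holds back trailing bytes of an incomplete character
    // and returns them with the next call once the character is complete.
    pending += decoder.write(chunk);
    const parts = pending.split('\n');
    pending = parts.pop(); // last piece may be an unfinished line
    lines.push(...parts);
  }
  pending += decoder.end(); // flush anything the decoder still holds
  if (pending.length > 0) lines.push(pending);
  return lines;
}

// Example input: chunk boundaries fall inside '中' and inside '行'.
const bytes = Buffer.from('中文\n行\n', 'utf8');
const chunks = [bytes.subarray(0, 2), bytes.subarray(2, 8), bytes.subarray(8)];

console.log(decodeLines(chunks)); // [ '中文', '行' ]
```

The decoder keeps state between write() calls, which is what a per-chunk toString() lacks.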

pludov · 2018-08-07T14:39:49Z

As a workaround, for file streams, you can set the source stream encoding:
const stream = byline(fs.createReadStream(filename, {encoding: 'utf-8'}))

jahewson added the Bug label May 7, 2017

lfdoherty mentioned this issue Oct 15, 2017

fixed bug with unicode multibyte characters via StringDecoder #50

Open
