Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue when writing EOL in stream mode #47

Open
loretoparisi opened this issue May 4, 2017 · 5 comments
Open

Issue when writing EOL in stream mode #47

loretoparisi opened this issue May 4, 2017 · 5 comments

Comments

@loretoparisi
Copy link

I have an issue when writing a file in stream mode. The EOLseems not to be respected when writing a new line.

This is the sync version, that works as expected.

var i = fs.openSync(self._options.trainFile, 'r');
        var o = fs.openSync(tmpFilePath, 'w');

        var buf = new Buffer(1024 * 1024), len, prev = '';

        while(len = fs.readSync(i, buf, 0, buf.length)) {

            var a = (prev + buf.toString('utf-8', 0, len)).split('\n');
            prev = len === buf.length ? '\n' + a.splice(a.length - 1)[0] : '';
            var out = '';
            a.forEach(function(text) {
                if(!text) return;
                text=text.toLowerCase()
                .replace(/^/gm, '__label__')
                .replace(/'/g, " ' ")
                .replace(/"/g, '')
                .replace(/\./g, ' \. ')
                .replace(/,/g, ' \, ')
                .replace(/\(/g, ' ( ')
                .replace(/\)/g, ' ) ')
                .replace(/!/g, ' ! ')
                .replace(/\?/g, ' ! ')
                .replace(/;/g, ' ')
                .replace(/:/g, ' ')
                out += text + '\n';
            });
            var bout = new Buffer(out, 'utf-8');
            fs.writeSync(o, bout, 0, bout.length);
        }

        fs.closeSync(o);
        fs.closeSync(i);

while this is the stream mode with byline

var os= require("os");
        var Transform = require('stream').Transform
        var writeStream = fs.createWriteStream(tmpFilePath, {flags: 'w', encoding: 'utf-8'});
        var stream = byline(fs.createReadStream(self._options.trainFile, { flags: 'r', encoding: 'utf8'}));
        //stream.pipe(writeStream);
        stream.on('end', function() {
            return resolve({
                trainFile: tmpFilePath
            });
        });
        stream.on('data', function(text) { /// read line by line
            text=text.toLowerCase()
            .replace(/^/gm, '__label__')
            .replace(/'/g, " ' ")
            .replace(/"/g, '')
            .replace(/\./g, ' \. ')
            .replace(/,/g, ' \, ')
            .replace(/\(/g, ' ( ')
            .replace(/\)/g, ' ) ')
            .replace(/!/g, ' ! ')
            .replace(/\?/g, ' ! ')
            .replace(/;/g, ' ')
            .replace(/:/g, ' ');
            writeStream.write(text + os.EOL);
        });
@jahewson
Copy link
Owner

jahewson commented May 6, 2017

I don't see what this has to do with byline. You're saying that:

var writeStream = fs.createWriteStream(tmpFilePath, {flags: 'w', encoding: 'utf-8'});

followed by:

writeStream.write(text + os.EOL);

Results in a file without EOL characters. Byline plays no role in that.

@jahewson
Copy link
Owner

jahewson commented May 6, 2017

Also you have an inconsistency in your code for the encodings 'utf8' vs 'utf-8'.

@loretoparisi
Copy link
Author

@jahewson sorry, does byline read the input line in

    var stream = byline(fs.createReadStream(self._options.trainFile, { flags: 'r', encoding: 'utf8'}));

So I would expect to write down that line so

stream.on('data', function(text) { /// read line by line
            text=text.toLowerCase()
            .replace(/^/gm, '__label__')
            .replace(/'/g, " ' ")
            .replace(/"/g, '')
            .replace(/\./g, ' \. ')
            .replace(/,/g, ' \, ')
            .replace(/\(/g, ' ( ')
            .replace(/\)/g, ' ) ')
            .replace(/!/g, ' ! ')
            .replace(/\?/g, ' ! ')
            .replace(/;/g, ' ')
            .replace(/:/g, ' ');
            writeStream.write(text  + '\n');
        });

but this is out writing down a new line...

@jahewson
Copy link
Owner

jahewson commented May 8, 2017

Ok, so does this:

// ...

stream.on('data', function(text) { /// read line by line
  console.log(text)
});

Print only one line to the console?

@loretoparisi
Copy link
Author

@jahewson lines are printed out in the console, but when I write to writeStream.write(text + '\n'); I will get in the output stream one single line, without any \n appended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants