double quote in data makes parsing exit early without error #236

Open
1mike12 opened this issue Feb 22, 2024 · 0 comments

  • Operating System: macOS
  • Node Version: 20
  • NPM Version: 9.6.6
  • csv-parser Version: ^3.0.0

Expected Behavior

Parse the whole file even if the data contains a double quote, or at least emit an error.

Actual Behavior

Parsing silently stops partway through the file, and no error is emitted.

How Do We Reproduce?

I kept getting fewer rows streamed than I expected from the file located at https://download.geonames.org/export/dump/admin2Codes.txt
This is a tab-delimited file of 45,784 rows.

I realized that this was because one of the entries contains a double quote:
RU.45.517838 Novotor”yal’skiy Rayon Novotor"yal'skiy Rayon 517838
If I delete the quote, parsing works properly:
RU.45.517838 Novotor[DELETED]yal’skiy Rayon Novotor"yal'skiy Rayon 517838
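
For anyone trying to find rows like this in their own data, here is a minimal sketch of a pre-scan that flags lines containing an odd number of straight double quotes (U+0022). The helper name `findUnbalancedQuoteLines` is hypothetical and not part of csv-parser, the first sample row is a made-up placeholder, and the "odd count" rule is only a heuristic for quotes that a parser might treat as an unterminated quoted field.

```typescript
// Hypothetical helper, not part of csv-parser: return the 1-based numbers of
// lines that contain an odd number of straight double quotes (U+0022).
function findUnbalancedQuoteLines(text: string): number[] {
  const flagged: number[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    // Splitting on the quote yields one more piece than there are quotes.
    const quoteCount = line.split('"').length - 1;
    if (quoteCount % 2 === 1) {
      flagged.push(index + 1);
    }
  });
  return flagged;
}

// Sample input: a placeholder row (invented for illustration), followed by
// the row from this report with its tab separators restored.
const sample = [
  "XX.00.1\tPlaceholder\tPlaceholder\t1",
  "RU.45.517838\tNovotor”yal’skiy Rayon\tNovotor\"yal'skiy Rayon\t517838",
].join("\n");

// Only the second line is flagged: it has one straight quote, and the curly
// quote (U+201D) is a different character that is not counted.
console.log(findUnbalancedQuoteLines(sample));
```

Running this over the downloaded file contents should point at the offending line numbers without involving the parser at all.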

// Repro: stream the GeoNames file through csv-parser and count the rows it emits.
import {Transform, Writable} from "node:stream";
import https from "node:https";
import csvParser from "csv-parser";

const repro = async () => {

  let lineCount = 0
  return new Promise<void>((resolve, reject) => {

    https.get("https://download.geonames.org/export/dump/admin2Codes.txt", (response) => {
      response
        .pipe(csvParser({separator: "\t", headers: ["id", "name", "nameAscii", "geonameId"]}))
        .pipe(new Transform({
          objectMode: true,
          transform(chunk, encoding, callback) {
            lineCount++
            this.push(chunk);
            callback();
          },
        }))
        .pipe(new Writable({
          objectMode: true,
          write(chunk, encoding, callback) {
            callback();
          }
        }))
        .on('finish', () => {
          console.log("total lines should be ~45k", lineCount)
          resolve()
        })
        .on('error', reject)
    }).on('error', reject)
  })
}

(async () => {
  await repro()
})()
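
A possible stopgap, sketched under the assumption that the stray straight double quote really is what stops the parser: replace every `"` byte with `'` before the data reaches csv-parser. The names `replaceDoubleQuotes` and `createQuoteNeutralizer` are hypothetical, this changes the data (the quote is lost), and I have not verified it against the full file.

```typescript
import { Transform } from "node:stream";

const DOUBLE_QUOTE = 0x22; // "
const APOSTROPHE = 0x27;   // '

// Hypothetical helper: return a copy of the chunk with every `"` byte replaced
// by `'`. In UTF-8 the byte 0x22 only ever encodes the ASCII double quote, so
// this is safe even when a multi-byte character is split across chunks.
function replaceDoubleQuotes(chunk: Buffer): Buffer {
  const out = Buffer.from(chunk); // copy, so the caller's buffer is untouched
  for (let i = 0; i < out.length; i++) {
    if (out[i] === DOUBLE_QUOTE) {
      out[i] = APOSTROPHE;
    }
  }
  return out;
}

// Hypothetical byte-stream Transform meant to sit in front of the parser.
function createQuoteNeutralizer(): Transform {
  return new Transform({
    transform(chunk, encoding, callback) {
      const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding);
      callback(null, replaceDoubleQuotes(buf));
    },
  });
}
```

In the repro above it would go between `response` and the parser, i.e. `response.pipe(createQuoteNeutralizer()).pipe(csvParser({...}))`. This only hides the symptom; the parser should still either handle the quote or report an error.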