ENH: Support for multiple comment characters with readers #8727

ebran · 2014-11-04T16:19:30Z

I would be very pleased if pandas supported multiple comment characters when reading data from files. For example:

import pandas as pd
df = pd.read_table("data.dat", comment=("#","@"), delim_whitespace=True)

I don't know whether this would require a minor or a major implementation effort.

Best,
Erik


jreback · 2014-11-04T21:34:52Z

This would take a bit of effort. The reader basically works byte by byte (with some backreference capability), so it would have to check each byte against a buffer of comment characters, in a performant manner. Right now it checks against a single character, and only if that character is not NULL. It could be done.
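To illustrate the idea of checking each character against a set of comment characters, here is a minimal pure-Python sketch. It is not how the pandas C tokenizer is written; the function name `strip_comments` and its parameters are hypothetical, and the sketch ignores quoting and escape handling entirely.

```python
def strip_comments(text, comment_chars=("#", "@")):
    """Drop everything from the first comment character to the end of each line.

    Hypothetical helper for illustration only; quoting is not handled.
    """
    comment_set = frozenset(comment_chars)  # O(1) membership test per character
    out = []
    in_comment = False
    for ch in text:
        if ch == "\n":
            # A newline always ends a comment and is kept in the output.
            in_comment = False
            out.append(ch)
        elif in_comment:
            # Still inside a comment: discard the character.
            continue
        elif ch in comment_set:
            # Any of the comment characters starts a comment.
            in_comment = True
        else:
            out.append(ch)
    return "".join(out)


cleaned = strip_comments("a b # c\n@ skip\n1 2\n")
```

Fully commented lines come out as empty lines here, so a consumer of `cleaned` would still have to skip blank lines.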

dansondergaard · 2016-11-22T07:29:32Z

It would be great if a comment marker could also be two characters, e.g. "##". For example, in VCF files, some metadata is specified at the beginning of the file with "##" before the actual table starts:

http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

Often one just wants to ignore these, but:

df = pd.read_csv("data.vcf", comment="##")

doesn't work. Note that for VCF it won't work to just use comment="#" since the header line actually starts with a single "#".
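Until something like this is supported, one possible workaround is to filter the lines before they reach the parser. The sketch below uses only the standard library; the helper name `drop_prefixed_lines` and the sample text are made up for illustration.

```python
import io


def drop_prefixed_lines(lines, prefix="##"):
    """Yield only the lines that do not start with the given prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


# Hypothetical VCF-like sample: two "##" metadata lines, a "#" header, one record.
sample = (
    "##fileformat=VCFv4.0\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\n"
    "20\t14370\trs6054257\n"
)

# io.StringIO stands in for an open file; iterating it yields lines.
cleaned = "".join(drop_prefixed_lines(io.StringIO(sample)))
```

The `cleaned` text could then be wrapped in `io.StringIO` and passed to `pd.read_csv` with `sep="\t"`. The header name would still carry its leading "#", which would need to be renamed afterwards.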

wesm · 2018-07-06T21:53:45Z

This would be difficult. I'm closing this for now

loodvn · 2022-09-16T21:57:11Z

I ran into this feature request when reading a VCF file as the others did.

## Metadata
## ...
# Header
col1, col2, etc

Is the cparser the difficult part of this change?

loodvn · 2022-09-18T05:55:23Z

I'm starting a draft PR at #48615. I imagine getting the look-ahead into the C state machine might be a bit tricky, but I'll give it a go if necessary. I will also run the benchmarks to check for regressions.
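As a rough picture of what one character of look-ahead means in a character-by-character state machine, here is a pure-Python sketch. It is an assumption-laden illustration, not the code from the PR or the pandas C parser: the function name and state names are hypothetical, only two-character markers are supported, and quoting is ignored.

```python
def strip_two_char_comments(text, marker="##"):
    """Remove comments introduced by a two-character marker, one char at a time."""
    if len(marker) != 2:
        raise ValueError("this sketch only handles two-character markers")
    first, second = marker
    out = []
    state = "normal"  # one of: "normal", "pending", "comment"
    for ch in text:
        if state == "comment":
            # Discard until the newline, which is kept and ends the comment.
            if ch == "\n":
                out.append(ch)
                state = "normal"
            continue
        if state == "pending":
            if ch == second:
                # The held-back character really was the start of a marker.
                state = "comment"
                continue
            # False alarm: emit the held-back character, then handle ch normally.
            out.append(first)
            state = "normal"
        if ch == first:
            # Hold this character back until the next one is seen.
            state = "pending"
        else:
            out.append(ch)
    if state == "pending":
        # Input ended while a character was held back.
        out.append(first)
    return "".join(out)


result = strip_two_char_comments("## Metadata\n# Header\ncol1,col2\n")
```

The single "#" header line survives because the character after it does not complete the marker, which is exactly the VCF case discussed in this thread.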

jreback added the IO CSV (read_csv, to_csv) and Enhancement labels Nov 4, 2014

jreback added this to the Someday milestone Nov 4, 2014

Socob mentioned this issue Mar 16, 2018

Using commented header line (flat file readers) #20378

Closed

jreback added the Difficulty Advanced label Jul 4, 2018

jreback mentioned this issue Jul 4, 2018

Support multiple characters for comment in read_csv #21725

Closed

wesm added the Won't Fix label Jul 6, 2018

wesm closed this as completed Jul 6, 2018

loodvn mentioned this issue Sep 18, 2022

WIP: Multichar comment #48615

Closed

Wainberg mentioned this issue Aug 18, 2023

Support multiple and/or multi-character and/or regex comment_char in read_csv() pola-rs/polars#10583

Closed
