ENH: Support for multiple comment characters with readers #8727
Comments
This would be a bit of an effort. The reader works essentially byte by byte (with some backreference capability), so it would have to check each byte against a buffer of comment chars in a performant manner. Right now it just checks against the single char, and only if it's not NULL. Could be done.
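To make the "check against a buffer of the comment chars" idea concrete, here is a minimal sketch in Python of a 256-entry lookup table, which keeps the per-byte test at a single index operation. This is not the pandas C tokenizer; the names make_comment_table and find_comment_start are hypothetical, and quoting and escaping are deliberately ignored.

```python
def make_comment_table(comment_chars: bytes) -> bytearray:
    """Build a 256-entry table with a 1 at every comment byte (hypothetical helper)."""
    table = bytearray(256)
    for b in comment_chars:
        table[b] = 1
    return table


def find_comment_start(line: bytes, table: bytearray) -> int:
    """Return the index of the first comment byte in line, or -1 if there is none.

    Assumption: quoted fields and escape characters are ignored here,
    which a real parser could not do.
    """
    for i, b in enumerate(line):
        if table[b]:
            return i
    return -1


# Treat both '#' and '%' as comment characters.
table = make_comment_table(b"#%")
line = b"1,2,3 % note"
idx = find_comment_start(line, table)
data_part = line if idx == -1 else line[:idx]
```

With a table like this, the cost per byte stays the same no matter how many comment characters are configured.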
Related: It would be great if a comment marker could also be two characters, e.g. "##". For example, in VCF files, some metadata is specified at the beginning of the file with "##" before the actual table starts: http://www.internationalgenome.org/wiki/Analysis/vcf4.0/ Often one just wants to ignore these lines, but df = pd.read_csv("data.vcf", comment="##") doesn't work. Note that for VCF it won't work to just use
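Until something like this is supported, one possible workaround is to drop the "##" lines before the parser sees them. The sketch below uses only the standard library; strip_prefixed_lines is a hypothetical helper, and the sample text is made up in the shape of a VCF file, with the column header line starting with a single "#".

```python
import csv
import io


def strip_prefixed_lines(lines, prefixes=("##",)):
    """Yield only the lines that do not start with any of the given prefixes."""
    prefixes = tuple(prefixes)
    for line in lines:
        if not line.startswith(prefixes):
            yield line


# Made-up sample data, not taken from a real file.
sample = (
    "##fileformat=VCFv4.0\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\n"
    "20\t14370\trs6054257\n"
)

# Filter the lines and wrap the result in a file-like object.
filtered = io.StringIO("".join(strip_prefixed_lines(io.StringIO(sample))))
rows = list(csv.reader(filtered, delimiter="\t"))
```

Because pd.read_csv accepts file-like objects, a buffer built this way could be passed to it in place of a path, e.g. with sep="\t". The whole file is held in memory here, so this is only a sketch for modest file sizes.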
This would be difficult. I'm closing this for now.
I ran into this feature request when reading a VCF file as the others did.
Is the C parser the difficult part of this change?
Starting a draft PR at #48615. I imagine getting the look-ahead into the C state machine might be a bit tricky, but I'll give it a go if necessary. I'll also check the benchmarks for any regressions.
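For illustration only, the look-ahead can be sketched in Python: when the scanner sees the first byte of the marker, it peeks at the following bytes before deciding whether a comment starts there. This is not the code from the PR; strip_comment is a hypothetical name, and quoting is ignored.

```python
def strip_comment(line: bytes, marker: bytes = b"##") -> bytes:
    """Cut line at the first occurrence of a multi-byte comment marker.

    A byte equal to the marker's first byte only starts a comment if the
    bytes after it complete the marker (the look-ahead step).
    """
    if not marker:
        return line
    first = marker[0]
    n = len(marker)
    for i, b in enumerate(line):
        # Cheap single-byte test first, then look ahead n bytes.
        if b == first and line[i:i + n] == marker:
            return line[:i]
    return line


kept = strip_comment(b"a#b,c## note")
header = strip_comment(b"#CHROM,POS")
```

A lone "#" is kept as data in both calls, since the byte after it does not complete the "##" marker.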
I would be very pleased if Pandas supported multiple comment characters when reading data from files. According to:
I don't know whether this requires a minor or major implementation effort.
Best,
Erik