Support multiple and/or multi-character and/or regex `comment_char` in `read_csv()` #10583
Comments
For your specific use case, I would recommend setting Supporting multiple chars / non-ASCII chars would be nice (for
@stinodego I have refused to add this on many occasions and I really don't think we should add this. It would have a very large performance impact, which I don't think is worth it. I want the CSV parser to be performant and as close to a formal CSV format as possible. Multi-character and, worse, regex delimiters will have a very negative performance impact.
I would only accept multi-char comments, as this can be implemented cheaply.
Honestly, I'd be inclined to agree - regex doesn't seem that much more useful and could have a large performance impact unless it's implemented as an entirely separate code path. @ritchie46 would you be inclined to accept #12519 for the multi-char comments?
Closing as completed via #12519, thanks everyone!
Problem description
Multiple `comment_char` meaning e.g. `('#', '%')` both start comments. Multi-character meaning e.g. `//` starts a comment, like in C++.

In particular, it would be very nice to support `comment_char='##'` for VCF files, one of the most common file formats in computational biology. In VCF files, the first few lines are metadata starting with a `##` (and should be excluded), but the header line starts with a single `#`, so `comment_char='#'` would erroneously exclude the header.

Multi-character comments were requested in pandas, but the feature request (which was originally about multiple rather than multi-character comments) was closed for being "difficult" to implement. I'm sure it would be no problem for the polars team though :)
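
To make the VCF case concrete, here is a minimal sketch in plain Python (standard library only, not polars and not the implementation from #12519). The helper name `read_vcf_table`, its `comment_prefix` argument, and the sample data are all made up for illustration. It shows why a two-character prefix keeps the header line while a single `#` would drop it.

```python
import csv


def read_vcf_table(text, comment_prefix="##"):
    """Hypothetical helper: skip lines that start with comment_prefix,
    then parse the remaining lines as tab-separated values."""
    kept = [
        line for line in text.splitlines()
        if not line.startswith(comment_prefix)
    ]
    reader = csv.reader(kept, delimiter="\t")
    header = next(reader)   # first surviving line is treated as the header
    rows = list(reader)     # everything after it is data
    return header, rows


# Made-up sample in the shape described above: "##" metadata lines,
# then a header line that starts with a single "#".
sample = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t12345\t.\tA\tG\n"
    "chr1\t67890\t.\tC\tT\n"
)

header, rows = read_vcf_table(sample)
print(header)
print(rows)

# With a single-character prefix the real header is skipped as a comment,
# so the first data row ends up in the header position.
bad_header, bad_rows = read_vcf_table(sample, comment_prefix="#")
print(bad_header)
```

The prefix test is a plain `startswith` check per line, which is roughly why a multi-character prefix is cheap compared with a regex match.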