Currently, it is not possible to both ignore comments and use a commented header when reading a CSV file. From the documentation for the header argument of read_table etc.:
Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
It would be great if one could specify that header should not skip commented lines so that a header can be used even if it happens to contain the comment character.
Sure, just take the example from the Stack Overflow link:
import pandas as pd
from io import StringIO

s = '#one two three\n1 2 3'
pd.read_csv(StringIO(s), delim_whitespace=True, comment='#')
Output:
Empty DataFrame
Columns: [1 2 3]
Index: []
Desired: Instead of the second line (1 2 3), the first line with the comment (#one two three) should be used as the header. The second line should be interpreted as data.
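f = StringIO(s)
# strip the leading comment character and split on whitespace (use csv to make more robust)
header = f.readline().strip().lstrip("#").split()
# f is now positioned at the first data line; the data is whitespace-delimited
df = pd.read_csv(f, names=header, sep=r"\s+")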
To be pretty clear.
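The workaround above leaves a note to use csv for robustness. A minimal sketch of that idea follows, assuming a single-character comment marker and a space-delimited file. The helper name `read_with_commented_header` and its parameters are hypothetical, not part of pandas.

```python
import csv
from io import StringIO

import pandas as pd


def read_with_commented_header(buf, comment="#", delimiter=" "):
    """Hypothetical helper: use the first line as the header even if it is commented."""
    first = buf.readline().rstrip("\r\n")
    # Drop the leading comment character(s) and surrounding whitespace.
    first = first.lstrip(comment).strip()
    # Let csv handle quoting and repeated delimiters in the header line.
    names = next(csv.reader([first], delimiter=delimiter, skipinitialspace=True))
    # The buffer is now positioned after the header line, so read_csv only
    # sees the remaining lines; later commented lines are still skipped.
    return pd.read_csv(
        buf, names=names, sep=delimiter, comment=comment, skipinitialspace=True
    )


df = read_with_commented_header(StringIO("#one two three\n1 2 3"))
print(df)
```

Because the header is consumed before `read_csv` is called, the existing `comment` handling is left untouched for the rest of the file.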
How would your proposal interact with the other keywords that deal with position, like header, skiprows, etc?
Would this require a new keyword to preserve backwards compatibility? As you've written it, it's backwards incompatible, and we're hesitant to add new keywords to the already long read_csv signature, especially when the workaround is relatively straightforward.
Of course, I don’t intend to break backwards compatibility. To cover all cases, a new keyword would probably be necessary, yes. The workaround is not so straightforward if the header is not the very first line, but I suppose any cases where that’s necessary are pretty obscure.
If the reluctance to add new keywords outweighs the benefit of simplifying this use case, I’d be willing to close this.
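For the less common case mentioned above, where the commented header is not the very first line, one possible approach is to pull the header out by its raw line index before parsing. This is only a sketch under stated assumptions: `extract_commented_header` and its `header_line` parameter are hypothetical names, and the sample text is made up for illustration.

```python
import csv


def extract_commented_header(text, header_line=0, comment="#", delimiter=" "):
    """Hypothetical helper: return (names, remaining_text).

    header_line is a 0-based index into the raw lines of the input,
    counting commented lines as well.
    """
    lines = text.splitlines()
    # Strip the comment marker from the chosen line and parse it as a header.
    raw = lines[header_line].lstrip(comment).strip()
    names = next(csv.reader([raw], delimiter=delimiter, skipinitialspace=True))
    # Keep every other line; remaining commented lines can still be skipped
    # later by the parser's own comment handling.
    rest = lines[:header_line] + lines[header_line + 1:]
    return names, "\n".join(rest)


sample = "# generated by tool\n#one two three\n1 2 3\n4 5 6"
names, rest = extract_commented_header(sample, header_line=1)
print(names)
print(rest)
```

The returned `names` and `rest` could then be passed on, for example as `pd.read_csv(StringIO(rest), names=names, comment="#", sep=r"\s+")`, so that the remaining comment line is dropped by the reader itself.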