Using commented header line (flat file readers) #20378

Socob · 2018-03-16T12:44:43Z

Currently, it is not possible to both ignore comments and use a commented header when reading a CSV file. From the documentation for the header argument of read_table etc.:

Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
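To make the quoted documentation concrete, here is a small sketch of the current behavior. The input text is made up for illustration. With comment='#', a fully commented line and a blank line are dropped before header positions are counted, so header=0 refers to the first non-comment line:

```python
from io import StringIO

import pandas as pd

# A fully commented line and a blank line precede the real header.
data = "# exported by some tool\n\na,b\n1,2\n"

# header=0 is counted after the comment and blank lines are dropped,
# so the first non-comment line "a,b" becomes the header.
df = pd.read_csv(StringIO(data), comment="#", header=0)
print(df.columns.tolist())  # ['a', 'b']
```

This is why a header line that starts with the comment character can never be selected through header alone.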

It would be great if one could specify that header should not skip commented lines so that a header can be used even if it happens to contain the comment character.

Other people requesting this:


TomAugspurger · 2018-03-16T14:02:53Z

Can you show an actual example? The two you linked to sound different from "Using a commented header line."

Socob · 2018-03-16T14:19:18Z

Sure, just take the example from the Stack Overflow link:

import pandas as pd
from io import StringIO
s = '#one two three\n1 2 3'
pd.read_csv(StringIO(s), delim_whitespace=True, comment='#')

Output:

Empty DataFrame
Columns: [1 2 3]
Index: []

Desired: Instead of the second line (1 2 3), the first line with the comment (#one two three) should be used as the header. The second line should be interpreted as data.

TomAugspurger · 2018-03-16T14:40:03Z

Thanks.

FWIW, I think that the following workaround is pretty clear:

f = StringIO(s)
header = f.readline().rstrip().lstrip("#").split()  # use csv to make more robust
df = pd.read_csv(f, sep=r"\s+", names=header)  # data is whitespace-delimited, as in the example
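The inline comment in the workaround suggests using the csv module to make the header parsing more robust. A minimal sketch of that idea, assuming the header is the first line, starts with a single '#', and uses space-separated names (optionally quoted, e.g. "two words"):

```python
import csv
from io import StringIO

import pandas as pd

s = '#one two three\n1 2 3'
f = StringIO(s)

# Read only the first (commented) line and drop the leading comment character.
first_line = f.readline().rstrip()
if first_line.startswith("#"):
    first_line = first_line[1:]

# csv.reader handles quoted names; skipinitialspace collapses repeated spaces.
header = next(csv.reader([first_line], delimiter=" ", skipinitialspace=True))

# The buffer is now positioned at the first data line.
df = pd.read_csv(f, sep=r"\s+", names=header)
print(df)
```

Only the header goes through csv here; the data lines are still parsed by read_csv, so all of its other options remain available.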

How would your proposal interact with the other keywords that deal with position, like header, skiprows, etc.?

Would this require a new keyword to preserve backwards compatibility? As you've written it, it's backwards incompatible, and we're hesitant to add new keywords to the already long read_csv signature, especially when the workaround is relatively straightforward.

Socob · 2018-03-16T16:45:32Z

Of course, I don’t intend to break backwards compatibility. To cover all cases, a new keyword would probably be necessary, yes. The workaround is not so straightforward if the header is not the very first line, but I suppose any cases where that’s necessary are pretty obscure.
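For the less common case where the commented header is not the very first line, the workaround can be generalized. The sketch below uses a hypothetical helper, read_csv_commented_header (not a pandas API), and assumes whitespace-delimited data and a 0-based physical line number for the header. Lines before the header are discarded, and comment lines after it are still skipped by read_csv:

```python
from io import StringIO

import pandas as pd


def read_csv_commented_header(buf, header_line=0, comment="#"):
    """Hypothetical helper: use physical line `header_line` (0-based) as the
    header even if it starts with `comment`, drop the lines before it, and
    parse the remaining lines as whitespace-delimited data."""
    lines = buf.read().splitlines()
    header_text = lines[header_line].strip()
    if header_text.startswith(comment):
        header_text = header_text[len(comment):]
    names = header_text.split()  # assumes whitespace-separated column names
    data = "\n".join(lines[header_line + 1:])
    # Comment lines among the data rows are still skipped here.
    return pd.read_csv(StringIO(data), sep=r"\s+", names=names, comment=comment)


text = "# generated by some tool\n#one two three\n1 2 3\n# trailing note\n4 5 6\n"
df = read_csv_commented_header(StringIO(text), header_line=1)
print(df)
```

Reading the whole buffer into memory keeps the sketch short; for large files one would instead read only the first header_line + 1 lines and hand the rest of the stream to read_csv, as in the single-line workaround.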

If the desire against new keywords outweighs the benefits of simplifying this use case, I’d be willing to close this.

TomAugspurger added the labels "IO CSV" (read_csv, to_csv) and "Needs Info" (Clarification about behavior needed to assess issue) on Mar 16, 2018

Socob closed this as completed Mar 20, 2018
