read_csv "CParserError: Error tokenizing data" with variable number of fields

I am having trouble with `read_csv` (pandas 0.17.0) when trying to read a 380+ MB CSV file. The file starts with 54 fields, but some lines have 53 fields instead of 54. Running the code below gives me the following error:

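```python
from datetime import datetime

import pandas as pd

# Parses the space-joined YR ... HUND values into a single datetime.
parser = lambda x: datetime.strptime(x, '%y %m %d %H %M %S %f')

# filename: path to the 380+ MB CSV file
df = pd.read_csv(filename,
                 names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
                        'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
                        'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
                        'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
                        'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
                        'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
                 usecols=range(0, 42),  # keep only the first 42 fields
                 # combine columns 0-6 (YR ... HUND) into one TIMESTAMP column
                 parse_dates={"TIMESTAMP": [0, 1, 2, 3, 4, 5, 6]},
                 date_parser=parser,
                 header=None)
```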

Error:

```
CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
```
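One way to check where the field count actually changes is to tally fields per line with the standard-library `csv` module before calling `read_csv`. This is a minimal sketch: `field_count_summary` is a hypothetical helper, and it assumes a comma delimiter and no quoted fields containing newlines (otherwise row numbers and physical line numbers drift apart).

```python
import csv
import io
from collections import Counter


def field_count_summary(lines, max_examples=5):
    """Count rows by number of fields and record the first few
    1-based row numbers seen for each field count."""
    counts = Counter()
    examples = {}
    for lineno, row in enumerate(csv.reader(lines), start=1):
        n = len(row)
        counts[n] += 1
        examples.setdefault(n, [])
        if len(examples[n]) < max_examples:
            examples[n].append(lineno)
    return counts, examples


# Tiny in-memory stand-in; for the real file pass
# open(filename, newline="") instead of io.StringIO(...).
sample = io.StringIO("1,2,3\n4,5,6\n7,8\n9,10,11\n")
counts, examples = field_count_summary(sample)
print(counts)
print(examples)
```

On the real file this should show whether the 53- and 54-field lines are interleaved or grouped together, and give line numbers to compare against the ones in the error messages.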

If I pass `error_bad_lines=False`, the problematic lines are skipped and reported with warnings like this one:

```
Skipping line 1683401: expected 53 fields, saw 54
```

However, I then get the following error instead (and the DataFrame is not loaded):

```
CParserError: Too many columns specified: expected 54 and found 53
```
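The skipping behaviour can be reproduced on a tiny in-memory sample. This is a minimal sketch assuming a recent pandas (1.3 or later), where `on_bad_lines="skip"` takes the place of the `error_bad_lines=False` keyword used above (that keyword was removed in pandas 2.0). It suggests that the C engine only treats lines with *more* fields than it expects as bad; shorter lines are padded with NaN, which would be consistent with every skipped line in the warnings having 54 fields.

```python
import io

import pandas as pd

# Tiny stand-in for the real file. The first line has 3 fields, so the
# C parser expects 3 fields on every following line.
data = (
    "1,2,3\n"
    "4,5,6,7\n"   # 4 fields: too many, treated as a bad line and skipped
    "8,9\n"       # 2 fields: too few, padded with NaN (not a bad line)
    "10,11,12\n"
)

df = pd.read_csv(io.StringIO(data), header=None, on_bad_lines="skip")
print(df)
```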

If I pass `engine='python'`, I do not get any errors, but parsing the data takes a very long time. Note that 53 and 54 are swapped in the error messages depending on whether `error_bad_lines=False` is used.
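A common workaround that keeps the fast C engine, sketched here on a small hypothetical sample rather than the real file: give `names` one entry per field of the *widest* row, so the parser never sees more fields than names, then select the wanted columns afterwards. The timestamp is built from the integer columns with `pd.to_datetime` instead of a per-row `strptime` `date_parser` (which is slow, and `date_parser` is deprecated in pandas 2.0). `EXTRA` is a placeholder column name, and mapping two-digit years to 2000 to 2099 is an assumption. For the real file, `all_names` would presumably need 54 entries. Working from the integer `HUND` column also avoids `%f` reading an unpadded value such as `5` as 0.5 s, in case the file does not zero-pad it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: rows have either 10 or 9
# fields, mirroring the 54/53 mix on a smaller scale.
data = (
    "15,10,31,23,59,59,99,0,1,7\n"
    "15,11,1,0,0,1,5,0,1\n"
)

wanted = ["YR", "MO", "DAY", "HR", "MIN", "SEC", "HUND", "ERROR", "RECTYPE"]
# One name per field of the WIDEST row ("EXTRA" is a placeholder), so the
# C parser never sees more fields than names; short rows get NaN at the end.
all_names = wanted + ["EXTRA"]

df = pd.read_csv(io.StringIO(data), header=None, names=all_names)
df = df[wanted].copy()  # drop unneeded columns; copy before adding a column

# Assemble the timestamp from integer components in one vectorised call.
# Assumption: two-digit years fall in 2000-2099.
df["TIMESTAMP"] = pd.to_datetime(
    pd.DataFrame(
        {
            "year": 2000 + df["YR"],
            "month": df["MO"],
            "day": df["DAY"],
            "hour": df["HR"],
            "minute": df["MIN"],
            "second": df["SEC"],
            "ms": df["HUND"] * 10,  # hundredths of a second -> milliseconds
        }
    )
)
print(df)
```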

Issue #11493