
read_csv "CParserError: Error tokenizing data" with variable number of fields #11493

@ekinsenturk


I am having trouble with read_csv (pandas 0.17.0) when reading a 380+ MB CSV file. The file starts with 54 fields per line, but some lines have 53 fields instead of 54. Running the code below gives me the following error:

from datetime import datetime

import pandas as pd

# filename is the path to the 380+ MB CSV file
parser = lambda x: datetime.strptime(x, '%y %m %d %H %M %S %f')
df = pd.read_csv(filename,
                 names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
                        'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
                        'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
                        'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
                        'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
                        'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
                 usecols=range(0, 42),
                 # the seven time columns are joined with spaces and passed to parser
                 parse_dates={"TIMESTAMP": [0, 1, 2, 3, 4, 5, 6]},
                 date_parser=parser,
                 header=None)

Error:

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
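To see how the field counts are distributed before changing any parser options, a quick pass over the file with the standard-library csv module can help. This is a minimal sketch, assuming a comma-separated file (the read_csv default, since no sep is passed above); field_count_report is a hypothetical helper, and its row numbers match file line numbers only if no quoted field spans multiple lines.

```python
import csv
import io
from collections import Counter


def field_count_report(lines, limit=5):
    """Count how many rows have each number of fields, and record up to
    `limit` rows (1-based) whose field count differs from the first row's."""
    counts = Counter()
    deviants = []
    expected = None
    for row_number, row in enumerate(csv.reader(lines), start=1):
        n = len(row)
        counts[n] += 1
        if expected is None:
            expected = n  # the first row sets the baseline
        elif n != expected and len(deviants) < limit:
            deviants.append((row_number, n))
    return counts, deviants


# Tiny in-memory sample: row 3 is one field short, row 4 one field long.
sample = io.StringIO("1,2,3\n4,5,6\n7,8\n9,10,11,12\n")
counts, deviants = field_count_report(sample)
print(dict(counts))  # {3: 2, 2: 1, 4: 1}
print(deviants)      # [(3, 2), (4, 4)]
```

On the real file, the same helper would be called on an open file object, e.g. with open(filename, newline='') as f: counts, deviants = field_count_report(f).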

If I pass the error_bad_lines=False keyword, the problematic lines are reported with messages like the one below:

Skipping line 1683401: expected 53 fields, saw 54

However, this time I get the following error (and the DataFrame is not loaded):

CParserError: Too many columns specified: expected 54 and found 53
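In newer pandas releases (1.3 onward), error_bad_lines and warn_bad_lines were deprecated in favour of a single on_bad_lines parameter ('error', 'warn' or 'skip'), and the old keywords were removed in 2.0. The sketch below runs on a small in-memory sample rather than the issue's file. It assumes the current behaviour where only rows with too many fields count as bad lines, while rows with too few fields are padded with NaN, so 'skip' drops the wider rows rather than repairing them.

```python
import io

import pandas as pd

# Header has 3 fields; the next-to-last line has 4 (too many), the last has 2 (too few).
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"

# on_bad_lines="skip" silently drops the too-wide row; the short row is padded with NaN.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```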

If I pass the engine='python' keyword, I do not get any errors, but parsing the data takes a very long time. Note that 53 and 54 are swapped in the error messages depending on whether error_bad_lines=False is used.
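One way to stay on the faster C engine, sketched here under the assumption that no row is wider than 54 fields, is to pass one name per field of the widest row: with header=None, the C parser pads shorter rows with NaN, and unneeded trailing columns can be dropped afterwards. The column names c0 to c3 are placeholders, and whether this also avoids the 0.17.0 error when combined with usecols and parse_dates is not verified here.

```python
import io

import pandas as pd

# Tiny ragged sample: the second row has one field fewer than the others.
raw = "1,2,3,4\n5,6,7\n8,9,10,11\n"

# One name per field of the widest row (4 here; the issue's file would need 54).
names = [f"c{i}" for i in range(4)]

# With header=None and enough names, the C engine pads short rows with NaN.
df = pd.read_csv(io.StringIO(raw), header=None, names=names, engine="c")

# Keep only the columns that are actually needed.
wanted = df[["c0", "c1", "c2"]]
print(wanted)
```

The trade-off is that every row is read at the full width, so the extra column costs a little memory before it is dropped; in exchange the tokenizer never sees a mismatch between the expected and actual field counts for short rows.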
