Python Pandas Error tokenizing data Python Pandas Error tokenizing data python python

Python Pandas Error tokenizing data

you could also try;

data = pd.read_csv('file1.csv', error_bad_lines=False)

Do note that this will cause the offending lines to be skipped.

It might be an issue with

  • the delimiters in your data
  • the first row, as @TomAugspurger noted

To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,

df = pandas.read_csv(filepath, sep='delimiter', header=None)

In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.

According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.

The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.

Try it with data = pd.read_csv(path, skiprows=2)