How to make separator in pandas read_csv more flexible wrt whitespace, for irregular separators?


From the documentation, you can use either a regex or delim_whitespace (note that delim_whitespace is deprecated as of pandas 2.2 in favor of sep=r"\s+"):

>>> import pandas as pd
>>> for line in open("whitespace.csv"):
...     print(repr(line))
...
'a\t  b\tc 1 2\n'
'd\t  e\tf 3 4\n'
>>> pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
>>> pd.read_csv("whitespace.csv", header=None, delim_whitespace=True)
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4


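>>> pd.read_csv("whitespace.csv", header=None, sep=r"\s+|\t+|\s+\t+|\t+\s+", engine="python")

would use any combination of any number of spaces and tabs as the separator. Because \s already matches tabs, this has the same effect as sep=r"\s+". Write the pattern as a raw string so the backslashes reach the regex engine unchanged. Any regex separator other than r"\s+" makes pandas fall back to the slower Python parsing engine, so it is cleaner to request engine="python" explicitly.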
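
To check how the long alternation compares with the plain r"\s+" pattern, here is a minimal sketch. It assumes an in-memory string standing in for whitespace.csv; the RAW constant and the parse helper are made up for illustration and are not part of pandas.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the two lines of whitespace.csv shown above.
RAW = "a\t  b\tc 1 2\nd\t  e\tf 3 4\n"


def parse(sep):
    """Parse RAW with the given regex separator (illustrative helper)."""
    # engine="python" is requested explicitly: regex separators other
    # than r"\s+" are not supported by the default C engine.
    return pd.read_csv(io.StringIO(RAW), header=None, sep=sep, engine="python")


long_form = parse(r"\s+|\t+|\s+\t+|\t+\s+")
short_form = parse(r"\s+")

# True when both separators produce the same frame.
same = long_form.equals(short_form)
print(same)
```

On this sample both patterns split the lines identically, because the first alternative, \s+, already consumes every run of mixed spaces and tabs.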


Pandas has two csv readers, but only one of them is flexible regarding redundant leading whitespace:

pd.read_csv("whitespace.csv", skipinitialspace=True)

while the other is not (DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0 in favor of read_csv):

pd.DataFrame.from_csv("whitespace.csv")

Neither handles trailing whitespace out of the box; for that, see the answers that use regular expressions. Avoid delim_whitespace, because it also treats plain spaces (without , or \t) as separators.
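
The difference between skipinitialspace and a regex separator can be sketched as follows. The sample data and variable names are assumptions for illustration, and the pattern r"\s*,\s*" is one possible choice rather than a pandas default.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample with stray spaces around the commas.
RAW = "a , b ,c\n1 ,  2, 3\n"

# skipinitialspace only drops spaces that follow a delimiter,
# so spaces before a comma stay attached to the field.
loose = pd.read_csv(io.StringIO(RAW), skipinitialspace=True)

# A regex separator absorbs whitespace on both sides of each comma.
# Regex separators need the Python engine, which is slower and
# may ignore quoting around fields that contain the delimiter.
strict = pd.read_csv(io.StringIO(RAW), sep=r"\s*,\s*", engine="python")

loose_columns = list(loose.columns)
strict_columns = list(strict.columns)
print(loose_columns)
print(strict_columns)
```

If the file contains quoted fields with embedded commas, it is probably safer to read it with the default separator and strip the resulting string columns afterwards.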