How can I remove extra whitespace from strings when parsing a csv file in Pandas? How can I remove extra whitespace from strings when parsing a csv file in Pandas? pandas pandas

How can I remove extra whitespace from strings when parsing a csv file in Pandas?


You could use converters:

import pandas as pddef strip(text):    try:        return text.strip()    except AttributeError:        return textdef make_int(text):    return int(text.strip('" '))table = pd.read_table("data.csv", sep=r',',                      names=["Year", "Make", "Model", "Description"],                      converters = {'Description' : strip,                                    'Model' : strip,                                    'Make' : strip,                                    'Year' : make_int})print(table)

yields

   Year     Make   Model              Description0  1997     Ford    E350                     None1  1997     Ford    E350                     None2  1997     Ford    E350   Super, luxurious truck3  1997     Ford    E350  Super "luxurious" truck4  1997     Ford    E350    Super luxurious truck5  1997     Ford    E350                     None6  1997     Ford    E350                     None7  2000  Mercury  Cougar                     None


Adding parameter skipinitialspace=True to read_table worked for me.

So try:

pd.read_table("data.csv",               sep=r',',               names=["Year", "Make", "Model", "Description"],               skipinitialspace=True)

Same thing works in pd.read_csv().


Well, the whitespace is in your data, so you can't read in the data without reading in the whitespace. However, after you've read it in, you could strip out the whitespace by doing, e.g., df["Make"] = df["Make"].map(str.strip) (where df is your dataframe).