What's the difference between dtype and converters in pandas.read_csv?

python pandas types converter type-inference

The semantic difference is that dtype allows you to specify how to treat the values, for example, either as numeric or string type.

Converters allows you to parse your input data to convert it to a desired dtype using a conversion function, e.g, parsing a string value to datetime or to some other desired dtype.

Here we see that pandas tries to sniff the types:

In [2]:df = pd.read_csv(io.StringIO(t))t="""int,float,date,str001,3.31,2015/01/01,005"""df = pd.read_csv(io.StringIO(t))df.info()<class 'pandas.core.frame.DataFrame'>Int64Index: 1 entries, 0 to 0Data columns (total 4 columns):int      1 non-null int64float    1 non-null float64date     1 non-null objectstr      1 non-null int64dtypes: float64(1), int64(2), object(1)memory usage: 40.0+ bytes

You can see from the above that 001 and 005 are treated as int64 but the date string stays as str.

If we say everything is object then essentially everything is str:

In [3]:    df = pd.read_csv(io.StringIO(t), dtype=object).info()<class 'pandas.core.frame.DataFrame'>Int64Index: 1 entries, 0 to 0Data columns (total 4 columns):int      1 non-null objectfloat    1 non-null objectdate     1 non-null objectstr      1 non-null objectdtypes: object(4)memory usage: 40.0+ bytes

Here we force the int column to str and tell parse_dates to use the date_parser to parse the date column:

In [6]:pd.read_csv(io.StringIO(t), dtype={'int':'object'}, parse_dates=['date']).info()<class 'pandas.core.frame.DataFrame'>Int64Index: 1 entries, 0 to 0Data columns (total 4 columns):int      1 non-null objectfloat    1 non-null float64date     1 non-null datetime64[ns]str      1 non-null int64dtypes: datetime64[ns](1), float64(1), int64(1), object(1)memory usage: 40.0+ bytes

Similarly we could've pass the to_datetime function to convert the dates:

In [5]:pd.read_csv(io.StringIO(t), converters={'date':pd.to_datetime}).info()<class 'pandas.core.frame.DataFrame'>Int64Index: 1 entries, 0 to 0Data columns (total 4 columns):int      1 non-null int64float    1 non-null float64date     1 non-null datetime64[ns]str      1 non-null int64dtypes: datetime64[ns](1), float64(1), int64(2)memory usage: 40.0 bytes

python pandas types converter type-inference

I would say that the main purpose for converters is to manipulate the values of the column, not the datatype. The answer shared by @EdChum focuses on the idea of the dtypes. It uses the pd.to_datetime function.

Within this article https://medium.com/analytics-vidhya/make-the-most-out-of-your-pandas-read-csv-1531c71893b5 in the area about converters, you will see an example of changing a csv column, with values such as "185 lbs.", into something that removes the "lbs" from the text column. This is more of the idea behind the read_csv converters parameter.

What the .csv looks like (If the image doesn't show up, please go to the article.)

#creating functions to clean the columnsw = lambda x: (x.replace('lbs.',''))r = lambda x: (x.replace('"',''))#using converters to apply the functions to the columnsfighter = pd.read_csv('raw_fighter_details.csv' ,                       converters={'Weight':w , 'Reach':r },                       header=0,                       usecols = [0,1,2,3])fighter.head(15)

The DataFrame after using converters on the Weight column.

CodeHunter

What's the difference between dtype and converters in pandas.read_csv?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last