How to set dtypes by column in pandas DataFrame



I just ran into this, and the pandas issue is still open, so I'm posting my workaround. Assuming df is my DataFrame and dtype is a dict mapping column names to types:

for k, v in dtype.items():
    df[k] = df[k].astype(v)

(Note: use dtype.iteritems() in Python 2.)
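In recent pandas versions the loop is no longer necessary: DataFrame.astype accepts a dict mapping column names to dtypes, so the whole conversion is one call. A minimal sketch (column names and dtypes are illustrative):

```python
import pandas as pd

# DataFrame.astype with a dict converts several columns at once;
# columns not named in the dict keep their existing dtype.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
df = df.astype({'a': 'float64', 'b': 'int64'})
print(df.dtypes)
# a    float64
# b      int64
# dtype: object
```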

For reference:


You may want to try passing a dictionary of Series objects to the DataFrame constructor - it gives you much more specific control over the creation, and should make it clearer what's going on. A template version (data1 can be an array etc.):

df = pd.DataFrame({'column1': pd.Series(data1, dtype='type1'),
                   'column2': pd.Series(data2, dtype='type2')})

An example with data:

df = pd.DataFrame({'A': pd.Series([1,2,3], dtype='int'),
                   'B': pd.Series([7,8,9], dtype='float')})

print (df)
   A    B
0  1  7.0
1  2  8.0
2  3  9.0

print (df.dtypes)
A      int32
B    float64
dtype: object


As of pandas version 0.24.2 (the current stable release), it is not possible to pass an explicit list of datatypes to the DataFrame constructor, as the docs state:

dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer

However, the DataFrame class does have a classmethod that lets you convert a NumPy structured array to a DataFrame, so you can do:

>>> myarray = np.random.randint(0, 5, size=(2, 2))
>>> record = np.array(list(map(tuple, myarray)), dtype=[('a', 'float'), ('b', 'int')])
>>> mydf = pd.DataFrame.from_records(record)
>>> mydf.dtypes
a    float64
b      int64
dtype: object
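With fixed data, the same structured-array route can be sketched end to end (the field names 'x' and 'y' and the values here are arbitrary, just to show each field becoming a typed column):

```python
import numpy as np
import pandas as pd

# Build a structured array with an explicit per-field dtype, then hand it
# to DataFrame.from_records; each field becomes a column with that dtype.
record = np.array([(1.5, 2), (3.0, 4)], dtype=[('x', 'float64'), ('y', 'int64')])
df = pd.DataFrame.from_records(record)
print(df.dtypes)
# x    float64
# y      int64
# dtype: object
```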