
Pandas cast all object columns to category


Use apply and pd.Series.astype with dtype='category'.
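This works because DataFrame.apply calls the function once per column and forwards extra keyword arguments such as dtype to it. Each column therefore goes through pd.Series.astype(column, dtype='category'). Here is a minimal sketch on a single hypothetical Series showing that calling the method through the class gives the same result as the usual s.astype('category'):

```python
import pandas as pd

# A hypothetical text Series, used only for illustration
s = pd.Series(list('abca'))

# Calling astype through the class, as DataFrame.apply does for each column,
# is equivalent to calling it on the Series itself
via_class = pd.Series.astype(s, dtype='category')
via_method = s.astype('category')

print(via_class.dtype)                  # category
print(via_class.equals(via_method))     # True
print(list(via_class.cat.categories))   # ['a', 'b', 'c']
```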

Consider the pd.DataFrame df:

    import pandas as pd

    df = pd.DataFrame(dict(
        A=[1, 2, 3, 4],
        B=list('abcd'),
        C=[2, 3, 4, 5],
        D=list('defg')
    ))

    df


    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 4 columns):
    A    4 non-null int64
    B    4 non-null object
    C    4 non-null int64
    D    4 non-null object
    dtypes: int64(2), object(2)
    memory usage: 200.0+ bytes

Let's use select_dtypes to pick out all 'object' columns and convert them, then recombine them with the remaining columns, selected with a second select_dtypes call that excludes 'object'.

    df = pd.concat([
        # keep the non-object columns unchanged
        df.select_dtypes([], ['object']),
        # convert every object column to category
        df.select_dtypes(['object']).apply(pd.Series.astype, dtype='category')
    ], axis=1).reindex_axis(df.columns, axis=1)  # restore the original column order; reindex_axis only exists in older pandas

    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 4 columns):
    A    4 non-null int64
    B    4 non-null category
    C    4 non-null int64
    D    4 non-null category
    dtypes: category(2), int64(2)
    memory usage: 208.0 bytes


I think that this is a more elegant way:

    df = pd.DataFrame(dict(
        A=[1, 2, 3, 4],
        B=list('abcd'),
        C=[2, 3, 4, 5],
        D=list('defg')
    ))
    df.info()

    df.loc[:, df.dtypes == 'object'] = \
        df.select_dtypes(['object']) \
        .apply(lambda x: x.astype('category'))
    df.info()
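A caveat, as I understand the pandas 2.0 changes: df.loc[:, mask] = ... now tries to write the new values into the existing column arrays in place, and an object array can hold categorical values, so the columns may keep their object dtype. A sketch that avoids this by assigning through df[...], which replaces the columns outright. The text columns are given object dtype explicitly on the assumption that newer pandas versions may infer a dedicated string dtype instead:

```python
import pandas as pd

# The question's frame, with the text columns forced to object dtype
df = pd.DataFrame(dict(
    A=[1, 2, 3, 4],
    B=pd.Series(list('abcd'), dtype=object),
    C=[2, 3, 4, 5],
    D=pd.Series(list('defg'), dtype=object),
))

# Names of the object-dtype columns (B and D here)
obj_cols = df.select_dtypes(include='object').columns

# df[...] = ... swaps in new columns rather than writing into the old arrays,
# so the dtype really changes; DataFrame.astype('category') converts each
# column separately, giving each its own set of categories
df[obj_cols] = df[obj_cols].astype('category')

print(df.dtypes)
```

Because each column gets its own categories, B and D do not share a category set. If they should, pass an explicit pd.CategoricalDtype instead of the string 'category'.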


Wish I could add this as a comment, but can't.

The accepted answer doesn't work for pandas 0.25 and higher: use .reindex instead of .reindex_axis. See here for more information: https://github.com/scikit-hep/root_pandas/issues/82
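For reference, here is a sketch of the accepted answer's concat approach with .reindex in place of .reindex_axis. It uses keyword arguments to select_dtypes for readability and assumes, as in the question, that the text columns have object dtype (set explicitly here):

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2, 3, 4],
    B=pd.Series(list('abcd'), dtype=object),
    C=[2, 3, 4, 5],
    D=pd.Series(list('defg'), dtype=object),
))

df = pd.concat([
    # non-object columns, unchanged
    df.select_dtypes(exclude=['object']),
    # object columns, each converted to category via pd.Series.astype
    df.select_dtypes(include=['object']).apply(pd.Series.astype, dtype='category'),
], axis=1).reindex(columns=df.columns)  # restore the original column order A, B, C, D

print(df.dtypes)
```

The right-hand side is evaluated before the assignment, so df.columns inside .reindex still refers to the original frame's column order.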