Running get_dummies on several DataFrame columns?
With pandas 0.19, you can do that in a single line:
pd.get_dummies(data=df, columns=['A', 'B'])
The columns argument specifies which columns to one-hot encode; all other columns are passed through unchanged.
>>> df
   A  B  C
0  a  c  1
1  b  c  2
2  a  b  3
>>> pd.get_dummies(data=df, columns=['A', 'B'])
   C  A_a  A_b  B_b  B_c
0  1  1.0  0.0  0.0  1.0
1  2  0.0  1.0  0.0  1.0
2  3  1.0  0.0  1.0  0.0
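For convenience, here is the same example as a self-contained script. It is only a sketch: the dtype=int argument is my addition, not part of the answer above, and it assumes a pandas version recent enough to accept that parameter. Without it, newer pandas releases return boolean indicator columns rather than the floats shown above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['c', 'c', 'b'],
                   'C': [1, 2, 3]})

# Encode only columns A and B; column C is passed through unchanged.
# dtype=int is an assumption added for this sketch so the indicators are
# plain 0/1 integers regardless of the pandas default.
encoded = pd.get_dummies(data=df, columns=['A', 'B'], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

The untouched column C comes first, followed by the indicator columns named <column>_<value>.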
Since pandas version 0.15.0, pd.get_dummies can handle a DataFrame directly. Before that, it could only handle a single Series; see below for a workaround.
In [1]: df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
   ...:                    'C': [1, 2, 3]})

In [2]: df
Out[2]:
   A  B  C
0  a  c  1
1  b  c  2
2  a  b  3

In [3]: pd.get_dummies(df)
Out[3]:
   C  A_a  A_b  B_b  B_c
0  1    1    0    0    1
1  2    0    1    0    1
2  3    1    0    1    0
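One thing worth knowing when passing a whole DataFrame: without the columns argument, only string-like and categorical columns are encoded, so a numeric column such as a year is left as it is. The sketch below illustrates the difference; the DataFrame and its column names are made up for the illustration.

```python
import pandas as pd

# Hypothetical data: 'Year' is numeric but categorical in meaning.
df = pd.DataFrame({'Name': ['x', 'y', 'x'],
                   'Year': [2001, 2002, 2001]})

# Default behaviour: only the string column 'Name' is encoded.
auto = pd.get_dummies(df)

# Listing the columns explicitly also encodes the numeric 'Year' column.
explicit = pd.get_dummies(df, columns=['Name', 'Year'])

print(auto.columns.tolist())
print(explicit.columns.tolist())
```

If a numeric column should be treated as categorical, naming it in columns is the simplest way to get dummies for it.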
Workaround for pandas < 0.15.0
You can call get_dummies on each column separately and then concatenate the results:
In [111]: df
Out[111]:
   A  B
0  a  x
1  a  y
2  b  z
3  b  x
4  c  x
5  a  y
6  b  y
7  c  z

In [112]: pd.concat([pd.get_dummies(df[col]) for col in df], axis=1, keys=df.columns)
Out[112]:
   A        B
   a  b  c  x  y  z
0  1  0  0  1  0  0
1  1  0  0  0  1  0
2  0  1  0  0  0  1
3  0  1  0  1  0  0
4  0  0  1  1  0  0
5  1  0  0  0  1  0
6  0  1  0  0  1  0
7  0  0  1  0  0  1
If you don't want MultiIndex columns, remove the keys=... argument from the concat call.
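To make the difference concrete, here is a small sketch of the workaround with and without keys, using the same data as above. The third variant, which uses the prefix argument of get_dummies, is my own suggestion rather than part of the original answer: it gives flat column names while still avoiding clashes when two columns share a value.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabbcabc'),
                   'B': list('xyzxxyyz')})

# With keys: the result has two-level (MultiIndex) columns.
nested = pd.concat([pd.get_dummies(df[col]) for col in df],
                   axis=1, keys=df.columns)

# Without keys: flat columns named after the values only.
flat = pd.concat([pd.get_dummies(df[col]) for col in df], axis=1)

# Suggested variant (not in the original answer): flat but prefixed names.
prefixed = pd.concat([pd.get_dummies(df[col], prefix=col) for col in df],
                     axis=1)

print(nested.columns.tolist())
print(flat.columns.tolist())
print(prefixed.columns.tolist())
```

Note that the plain flat version would produce duplicate column names if the same value appeared in both A and B, which is why the prefixed form may be the safer choice.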
Somebody may have something more clever, but here are two approaches. Assume you have a DataFrame named df with columns 'Name' and 'Year' that you want dummies for.
First, simply iterating over the columns isn't too bad:
In [93]: for column in ['Name', 'Year']:
    ...:     dummies = pd.get_dummies(df[column])
    ...:     df[dummies.columns] = dummies
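The loop above adds the dummy columns to df in place, keeps the original 'Name' and 'Year' columns, and names each new column after the bare value. A possible variation, sketched below with made-up data (the 'Score' column and the values are purely illustrative), builds a new DataFrame instead, prefixes each dummy with its source column, and drops the originals.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'Name': ['x', 'y', 'x'],
                   'Year': [2001, 2002, 2001],
                   'Score': [1.0, 2.0, 3.0]})

to_encode = ['Name', 'Year']

# Start with the columns that are not being encoded.
parts = [df.drop(columns=to_encode)]

# Add one block of prefixed dummy columns per encoded column.
for column in to_encode:
    parts.append(pd.get_dummies(df[column], prefix=column))

result = pd.concat(parts, axis=1)
print(result.columns.tolist())
```

Building a new frame leaves df untouched, which makes the step easier to rerun; whether that matters depends on how the data is used afterwards.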
Another idea would be to use the patsy package, which is designed to construct data matrices from R-type formulas.
In [94]: patsy.dmatrix(' ~ C(Name) + C(Year)', df, return_type="dataframe")