Create dummies from a column for a subset of data that doesn't contain all the category values in that column
What you need to do is make the column 'type' into a pd.Categorical and specify the categories:

    pd.get_dummies(pd.Categorical(df.type, [1, 2, 3, 4]), prefix='type')

       type_1  type_2  type_3  type_4
    0       1       0       0       0
    1       0       0       0       1
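To see why the explicit category list matters, here is a minimal self-contained sketch. The two-row sample frame is an assumption made for illustration, and dtype=int is passed only so the result is 0/1 regardless of the pandas version in use.

```python
import pandas as pd

# Hypothetical subset: only types 1 and 4 occur, although 1 to 4 are possible.
df = pd.DataFrame({"type": [1, 4]})

# Plain get_dummies only creates columns for the values it actually sees.
plain = pd.get_dummies(df["type"], prefix="type", dtype=int)

# Declaring all categories up front yields one column per category.
full = pd.get_dummies(
    pd.Categorical(df["type"], categories=[1, 2, 3, 4]),
    prefix="type",
    dtype=int,
)

print(list(plain.columns))  # ['type_1', 'type_4']
print(list(full.columns))   # ['type_1', 'type_2', 'type_3', 'type_4']
```

The missing type_2 and type_3 columns in the plain version are what usually breaks downstream code that expects a fixed set of features.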
Another solution with reindex and add_prefix (reindex_axis, the older spelling, was deprecated in pandas 0.21 and removed in 1.0):

    df1 = (pd.get_dummies(df["type"])
             .reindex(columns=[1, 2, 3, 4], fill_value=0)
             .add_prefix('type'))
    print(df1)

       type1  type2  type3  type4
    0      1      0      0      0
    1      0      0      0      1
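Or a categorical solution. Older pandas accepted astype('category', categories=[...]), but that form was deprecated in pandas 0.21 and later removed, so pass a CategoricalDtype instead:

    df1 = pd.get_dummies(df["type"].astype(pd.CategoricalDtype(categories=[1, 2, 3, 4])), prefix='type')
    print(df1)

       type_1  type_2  type_3  type_4
    0       1       0       0       0
    1       0       0       0       1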
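If the category values are not known by heart, one option is to collect them from the full data once and reuse them for every subset. The sketch below assumes a hypothetical full_df that contains every possible type; the names full_df, subset and type_dtype are made up for the example.

```python
import pandas as pd

# Hypothetical full data set, and a subset that misses types 2 and 3.
full_df = pd.DataFrame({"type": [1, 2, 3, 4, 1]})
subset = full_df[full_df["type"].isin([1, 4])]

# Build the dtype once from every value seen in the full data.
type_dtype = pd.CategoricalDtype(categories=sorted(full_df["type"].unique()))

# Any subset cast to that dtype produces the same dummy columns.
dummies = pd.get_dummies(subset["type"].astype(type_dtype), prefix="type", dtype=int)
print(dummies)
```

Keeping the dtype object around means training and prediction frames end up with identical columns, in the same order.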
Since you tagged your post as one-hot-encoding, you may find the sklearn module's OneHotEncoder useful, in addition to the pure Pandas solutions:
    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # sample data
    df = pd.DataFrame({'type': [1, 4]})
    n_vals = 5

    # one-hot encoding
    # (n_values was removed in scikit-learn 0.22, so the categories are listed explicitly;
    #  sparse_output needs scikit-learn >= 1.2, older releases call it sparse)
    encoder = OneHotEncoder(categories=[list(range(n_vals))], sparse_output=False, dtype=int)
    data = encoder.fit_transform(df.type.values.reshape(-1, 1))

    # encoded data frame
    newdf = pd.DataFrame(data, columns=['type_{}'.format(x) for x in range(n_vals)])
    print(newdf)

       type_0  type_1  type_2  type_3  type_4
    0       0       1       0       0       0
    1       0       0       0       0       1
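One advantage of using this approach is that OneHotEncoder easily produces sparse vectors, which helps with very large class sets. (Just set the sparse flag to True in the OneHotEncoder() declaration; it is called sparse in older scikit-learn releases and sparse_output from 1.2 on.)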
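As a rough sketch of the sparse variant: the example below passes no sparse flag at all, so the encoder's default (sparse output) applies and the keyword rename between scikit-learn versions does not matter. It assumes a scikit-learn release that accepts the categories argument (0.20 or later) and the same five-class sample as above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data; n_vals is the assumed number of possible classes (0 to 4).
df = pd.DataFrame({"type": [1, 4]})
n_vals = 5

# No sparse flag is passed, so the result is a SciPy sparse matrix by default.
encoder = OneHotEncoder(categories=[list(range(n_vals))], dtype=int)
encoded = encoder.fit_transform(df[["type"]].to_numpy())

print(encoded.shape)  # (2, 5)
print(encoded.nnz)    # 2 stored entries, one per row

# Convert to a dense array only when something downstream requires it.
dense = encoded.toarray()
```

Only the non-zero entries are stored, so memory grows with the number of rows rather than rows times classes.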