Create dummies from a column for a subset of data that doesn't contain all the category values in that column


What you need to do is turn the 'type' column into a pd.Categorical and specify the full list of categories:

    pd.get_dummies(pd.Categorical(df.type, [1, 2, 3, 4]), prefix='type')

       type_1  type_2  type_3  type_4
    0       1       0       0       0
    1       0       0       0       1
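If the complete data set is at hand, the category list can be read from it instead of being typed out. The following is only a sketch under assumptions: the frames `full` and `subset` and their values are made up for illustration, and `dtype=int` is passed so the result shows 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Hypothetical full data set: 'type' takes the values 1 through 4.
full = pd.DataFrame({"type": [1, 2, 3, 4, 1, 4]})

# Hypothetical subset in which only types 1 and 4 occur.
subset = full.iloc[[0, 3]].reset_index(drop=True)

# Read the category list from the full column so no level is lost.
categories = sorted(full["type"].unique())

dummies = pd.get_dummies(
    pd.Categorical(subset["type"], categories=categories),
    prefix="type",
    dtype=int,  # force 0/1 integers; newer pandas defaults to booleans
)
print(dummies)
```

Every category of the full column gets its own dummy column, even though the subset contains only two of them.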


Another solution with reindex and add_prefix (reindex_axis, which older pandas versions offered for this, was removed in pandas 1.0):

    df1 = (pd.get_dummies(df["type"])
             .reindex(columns=[1, 2, 3, 4], fill_value=0)
             .add_prefix('type'))
    print(df1)

       type1  type2  type3  type4
    0      1      0      0      0
    1      0      0      0      1
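The same reindexing idea also works when the target columns come from another frame, for example the dummies of the full data. This is a hedged sketch, not part of the original answer: `full` and `subset` are hypothetical frames, and `dtype=int` is only there to keep the output as 0/1 integers.

```python
import pandas as pd

# Hypothetical data: the full column has four types, the subset only two.
full = pd.DataFrame({"type": [1, 2, 3, 4]})
subset = pd.DataFrame({"type": [1, 4]})

full_dummies = pd.get_dummies(full["type"], prefix="type", dtype=int)
subset_dummies = pd.get_dummies(subset["type"], prefix="type", dtype=int)

# Align the subset's dummy columns to those of the full data;
# columns missing from the subset are created and filled with 0.
aligned = subset_dummies.reindex(columns=full_dummies.columns, fill_value=0)
print(aligned)
```

This keeps the column order identical between the two frames, which matters when both are later fed to the same model.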

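Or a categorical solution (current pandas no longer accepts categories as a keyword of astype, so they are passed through a CategoricalDtype):

    cat_type = pd.CategoricalDtype(categories=[1, 2, 3, 4])
    df1 = pd.get_dummies(df["type"].astype(cat_type), prefix='type')
    print(df1)

       type_1  type_2  type_3  type_4
    0       1       0       0       0
    1       0       0       0       1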


Since you tagged your post as one-hot-encoding, you may find scikit-learn's OneHotEncoder useful, in addition to the pure pandas solutions:

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # sample data
    df = pd.DataFrame({'type': [1, 4]})
    n_vals = 5

    # one-hot encoding; `categories` replaces the removed `n_values` argument,
    # and `sparse_output` is named `sparse` in scikit-learn versions before 1.2
    encoder = OneHotEncoder(categories=[list(range(n_vals))], sparse_output=False, dtype=int)
    data = encoder.fit_transform(df.type.values.reshape(-1, 1))

    # encoded data frame
    newdf = pd.DataFrame(data, columns=['type_{}'.format(x) for x in range(n_vals)])
    print(newdf)

       type_0  type_1  type_2  type_3  type_4
    0       0       1       0       0       0
    1       0       0       0       0       1

One advantage of this approach is that OneHotEncoder easily produces sparse output, which helps with very large class sets. (Just change to sparse_output=True in the OneHotEncoder() call.)
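To illustrate the sparse case, here is a small sketch that leaves the sparsity argument at its default, which returns a SciPy sparse matrix. The category list `[1, 2, 3, 4]` and the sample frame are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data: only types 1 and 4 are present.
df = pd.DataFrame({"type": [1, 4]})

# Explicit categories; sparse output is the encoder's default.
encoder = OneHotEncoder(categories=[[1, 2, 3, 4]], dtype=int)
encoded = encoder.fit_transform(df[["type"]].values)

print(encoded.shape)      # one column per listed category
print(encoded.nnz)        # number of stored (non-zero) entries
print(encoded.toarray())  # densify only for inspection
```

Only the non-zero entries are stored, so memory use grows with the number of rows rather than with rows times categories.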