Converting a Pandas Dataframe column into one hot labels


Here is an example of using sklearn.preprocessing.LabelBinarizer:

    In [361]: from sklearn.preprocessing import LabelBinarizer

    In [362]: lb = LabelBinarizer()

    In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

    In [364]: df
    Out[364]:
      Col1 ABC        new
    0  XYZ   A  [1, 0, 0]
    1  XYZ   B  [0, 1, 0]
    2  XYZ   C  [0, 0, 1]
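The session above starts from an existing df. As a self-contained sketch, the frame below is an assumed reconstruction based on the printed output (the names encoded, class_order and recovered are illustrative, not from the original answer). It also shows how the fitted binarizer records the label order and can map the vectors back:

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Assumed sample frame, reconstructed from the output shown above.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

lb = LabelBinarizer()
encoded = lb.fit_transform(df["ABC"])  # 2-D NumPy array, one column per class
df["new"] = encoded.tolist()           # one Python list per row

# classes_ records which label each position in the vector stands for.
class_order = list(lb.classes_)

# inverse_transform maps the one-hot rows back to the original labels.
recovered = list(lb.inverse_transform(encoded))

print(df)
print(class_order, recovered)
```

One caveat worth keeping in mind: when the column holds only two distinct labels, LabelBinarizer returns a single 0/1 column rather than two, so the lists would have length one.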

Pandas alternative:

    In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

    In [371]: df
    Out[371]:
      Col1 ABC        new
    0  XYZ   A  [1, 0, 0]
    1  XYZ   B  [0, 1, 0]
    2  XYZ   C  [0, 0, 1]


You can just use tolist():

    # dtype=int keeps the 0/1 output on pandas 2.0+, where get_dummies defaults to bool
    df['ABC'] = pd.get_dummies(df.ABC, dtype=int).values.tolist()

      Col1        ABC
    0  XYZ  [1, 0, 0]
    1  XYZ  [0, 1, 0]
    2  XYZ  [0, 0, 1]
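A runnable version of the same idea, as a minimal sketch. The sample frame is assumed from the printed output, and the names dummies, label_order and onehot are illustrative. It writes to a new column instead of overwriting ABC, so the original labels stay available:

```python
import pandas as pd

# Assumed sample frame, reconstructed from the output shown above.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# dtype=int forces 0/1 integers; pandas 2.0+ would otherwise return booleans.
dummies = pd.get_dummies(df["ABC"], dtype=int)

# The dummy columns are the sorted labels, which fixes the position
# of each label inside every one-hot list.
label_order = dummies.columns.tolist()

# Store one list per row in a new column, keeping the original ABC column.
df["onehot"] = dummies.to_numpy().tolist()

print(label_order)
print(df)
```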


If you have a pd.DataFrame like this:

    >>> df
      Col1  A  B  C
    0  XYZ  1  0  0
    1  XYZ  0  1  0
    2  XYZ  0  0  1

You can always do something like this:

    >>> df.apply(lambda s: list(s[1:]), axis=1)
    0    [1, 0, 0]
    1    [0, 1, 0]
    2    [0, 0, 1]
    dtype: object
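If the indicator columns are known by name, the same lists can probably be built without a Python-level call per row. The following is a sketch under that assumption. The frame is reconstructed from the output above, and indicator_cols, as_lists and row_wise are illustrative names:

```python
import pandas as pd

# Assumed frame with the one-hot columns already expanded, as shown above.
df = pd.DataFrame({
    "Col1": ["XYZ", "XYZ", "XYZ"],
    "A": [1, 0, 0],
    "B": [0, 1, 0],
    "C": [0, 0, 1],
})

indicator_cols = ["A", "B", "C"]

# Convert the numeric block to a NumPy array once, then to nested lists.
as_lists = pd.Series(df[indicator_cols].to_numpy().tolist(), index=df.index)

# Row-wise apply, as in the answer above, for comparison.
row_wise = df.apply(lambda s: list(s.iloc[1:]), axis=1)

print(as_lists)
```

Selecting the columns by name also avoids relying on Col1 being the first column, which the positional slice s[1:] assumes.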

Note that this is essentially a for-loop over the rows. Also, pandas has no list dtype: a column holding lists is stored with dtype object, so operations on it cannot take advantage of NumPy's vectorized speed.
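To make the dtype point concrete, here is a small sketch (sample frame assumed from the earlier outputs; list_dtype, matrix and counts are illustrative names). It checks the dtype of the list column and shows one way to get back to a 2-D NumPy array when numeric work is needed:

```python
import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the earlier outputs.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})
df["new"] = pd.get_dummies(df["ABC"], dtype=int).to_numpy().tolist()

# A column of lists is stored as generic Python objects.
list_dtype = df["new"].dtype

# Rebuild a proper 2-D integer array for vectorized operations.
matrix = np.array(df["new"].tolist())

# Example of a vectorized operation: how often each class occurs.
counts = matrix.sum(axis=0)

print(list_dtype, matrix.shape, counts)
```

If the one-hot vectors are mainly an input to numeric code, keeping them as separate columns or as a standalone array may be the simpler design, with the list column reserved for display or export.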