Converting a Pandas DataFrame column into one-hot labels
Here is an example of using sklearn.preprocessing.LabelBinarizer:
    In [361]: from sklearn.preprocessing import LabelBinarizer

    In [362]: lb = LabelBinarizer()

    In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

    In [364]: df
    Out[364]:
      Col1 ABC        new
    0  XYZ   A  [1, 0, 0]
    1  XYZ   B  [0, 1, 0]
    2  XYZ   C  [0, 0, 1]
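For anyone who wants to run the LabelBinarizer route end to end, here is a minimal sketch. The sample frame is rebuilt from the output shown above, and the names `encoded`, `labels`, `decoded` and `binary` are only illustrative. It also shows one thing worth knowing: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Sample frame rebuilt from the output shown above (an assumption).
df = pd.DataFrame({"Col1": ["XYZ"] * 3, "ABC": ["A", "B", "C"]})

lb = LabelBinarizer()
encoded = lb.fit_transform(df["ABC"])   # NumPy array, one column per class
df["new"] = encoded.tolist()            # one Python list per row

# classes_ records which label each column stands for.
labels = lb.classes_.tolist()

# inverse_transform maps the one-hot rows back to the original labels.
decoded = lb.inverse_transform(encoded).tolist()

# Caveat: with only two classes the result has a single column.
binary = LabelBinarizer().fit_transform(["A", "B", "A"])

print(df)
print(labels, decoded, binary.shape)
```

Keeping `lb` around is useful if you later need to encode new data with the same column order or decode predictions.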
Pandas alternative:
    In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

    In [371]: df
    Out[371]:
      Col1 ABC        new
    0  XYZ   A  [1, 0, 0]
    1  XYZ   B  [0, 1, 0]
    2  XYZ   C  [0, 0, 1]
You can just use tolist():
    # dtype=int keeps the 0/1 output on pandas >= 2.0, where the default is bool
    df['ABC'] = pd.get_dummies(df.ABC, dtype=int).values.tolist()

      Col1        ABC
    0  XYZ  [1, 0, 0]
    1  XYZ  [0, 1, 0]
    2  XYZ  [0, 0, 1]
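Here is a self-contained sketch of the same pd.get_dummies approach, in case you want to paste and run it. The sample frame is reconstructed from the output above, and `onehot` and `categories` are names I made up for illustration. It writes to a new column instead of overwriting ABC so the original labels stay available.

```python
import pandas as pd

# Sample frame reconstructed from the output above (an assumption).
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# dtype=int is an explicit choice: pandas 2.0+ returns bool indicators by default.
dummies = pd.get_dummies(df["ABC"], dtype=int)

# The dummy columns tell you which label each list position refers to.
categories = list(dummies.columns)

# One Python list per row, stored in a new column.
df["onehot"] = dummies.to_numpy().tolist()

print(categories)
print(df)
```

Holding on to `categories` matters because the lists themselves carry no label information once they are in the column.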
If you have a pd.DataFrame like this:
    >>> df
      Col1  A  B  C
    0  XYZ  1  0  0
    1  XYZ  0  1  0
    2  XYZ  0  0  1
You can always do something like this:
    >>> df.apply(lambda s: list(s[1:]), axis=1)
    0    [1, 0, 0]
    1    [0, 1, 0]
    2    [0, 0, 1]
    dtype: object
Note that this is essentially a for-loop over the rows. Note also that pandas has no list dtype: a column that holds lists is stored with dtype object, so operations on it cannot take advantage of NumPy's vectorized speed.
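To make the object-dtype point concrete, here is a rough sketch under the assumption that the indicator columns are named A, B and C as in the frame above. The names `indicator_cols`, `matrix`, `onehot` and `decoded` are hypothetical. It builds the list column without a row-wise apply and shows that the 2-D NumPy array is the better thing to compute on.

```python
import numpy as np
import pandas as pd

# Frame with the indicator columns already present, as in the example above.
df = pd.DataFrame({
    "Col1": ["XYZ", "XYZ", "XYZ"],
    "A": [1, 0, 0],
    "B": [0, 1, 0],
    "C": [0, 0, 1],
})
indicator_cols = ["A", "B", "C"]  # assumed column names

# A single 2-D integer array: NumPy operations on it are vectorized.
matrix = df[indicator_cols].to_numpy()

# Same lists as the apply version, built without a Python loop over rows.
df["onehot"] = matrix.tolist()

# The list column is stored as generic Python objects.
print(df["onehot"].dtype)   # object
print(matrix.dtype.kind)    # 'i' for integer

# Example of a vectorized operation: recover each row's label via argmax.
decoded = np.array(indicator_cols)[matrix.argmax(axis=1)].tolist()
print(decoded)
```

If the lists are only needed for display or export, the object column is fine. For any numeric work, keeping `matrix` (or the separate indicator columns) avoids the slowdown.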