Converting A Pandas Dataframe Column Into One Hot Labels
I have a pandas dataframe similar to this:
  Col1 ABC
0  XYZ   A
1  XYZ   B
2  XYZ   C
By using the pandas get_dummies() function on column ABC, I can get this: Col1 A
Solution 1:
Here is an example of using sklearn.preprocessing.LabelBinarizer:
In [361]: from sklearn.preprocessing import LabelBinarizer
In [362]: lb = LabelBinarizer()
In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()
In [364]: df
Out[364]:
Col1 ABC new
0 XYZ A [1, 0, 0]
1 XYZ B [0, 1, 0]
2 XYZ C [0, 0, 1]
Pandas alternative:
In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()
In [371]: df
Out[371]:
Col1 ABC new
0 XYZ A [1, 0, 0]
1 XYZ B [0, 1, 0]
2 XYZ C [0, 0, 1]
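For anyone who wants to run the pandas alternative outside an IPython session, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question, and the `labels` variable is only there for illustration: it records which label each position in the one-hot list stands for.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# One indicator column per distinct label found in ABC.
dummies = df["ABC"].str.get_dummies()

# Position i in each one-hot list corresponds to labels[i].
labels = dummies.columns.tolist()

# Collapse each row of indicators into a plain Python list.
df["new"] = dummies.to_numpy().tolist()

print(labels)
print(df)
```

Keeping the label order around is useful if you later need to map a one-hot list back to its original value.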
Solution 2:
You can just use tolist():
df['ABC'] = pd.get_dummies(df.ABC).values.tolist()
Col1 ABC
0 XYZ [1, 0, 0]
1 XYZ [0, 1, 0]
2 XYZ [0, 0, 1]
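A hedged variation on the same idea: in pandas 2.0 and later, pd.get_dummies() returns boolean columns by default, so the lists would hold True/False rather than 1/0. The sketch below passes dtype=int to get integers, and writes to a new column (the name `ABC_onehot` is just a placeholder) so the original labels are not overwritten.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# dtype=int asks for 0/1 integers instead of the boolean default of newer pandas.
one_hot = pd.get_dummies(df["ABC"], dtype=int)

# Placeholder column name; the original ABC column is left untouched.
df["ABC_onehot"] = one_hot.to_numpy().tolist()

print(df)
```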
Solution 3:
If you have a pd.DataFrame like this:
>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1
You can always do something like this:
>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object
Note, this is essentially a for-loop on the rows. Note also that columns cannot have a list data-type; they must be object, which means your data-frame operations will not be able to take advantage of the speed benefits of numpy.
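If the row-wise loop is a concern, one possible alternative is to build the lists from the underlying NumPy array in a single step. This is only a sketch: it assumes the indicator columns are named A, B and C as in the frame above, and the variable names are made up for the example. The resulting column is still object dtype, so the caveat about losing numpy speed on later operations still applies.

```python
import pandas as pd

# Frame with the indicator columns already expanded, as in this answer.
df = pd.DataFrame({
    "Col1": ["XYZ", "XYZ", "XYZ"],
    "A": [1, 0, 0],
    "B": [0, 1, 0],
    "C": [0, 0, 1],
})

# Row-wise version: skip the first column (Col1) by position.
by_apply = df.apply(lambda s: list(s.iloc[1:]), axis=1)

# Array version: convert the indicator columns once, then split into lists.
by_array = pd.Series(df[["A", "B", "C"]].to_numpy().tolist(), index=df.index)

print(by_array)
print(by_array.dtype)
```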
Solution 4:
If you have a data-frame df with a categorical column ABC, then you could use the following to create a new column of one-hot vectors:
# get_values() was removed in pandas 1.0; to_numpy() returns the same array
df['new_column'] = list(pandas.get_dummies(df['ABC']).to_numpy())