
Converting A Pandas Dataframe Column Into One Hot Labels

I have a pandas dataframe similar to this:

  Col1 ABC
0  XYZ   A
1  XYZ   B
2  XYZ   C

By using the pandas get_dummies() function on column ABC, I can get this: Col1 A

Solution 1:

Here is an example of using sklearn.preprocessing.LabelBinarizer:

In [361]: from sklearn.preprocessing import LabelBinarizer

In [362]: lb = LabelBinarizer()

In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

In [364]: df
Out[364]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]

Pandas alternative:

In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

In [371]: df
Out[371]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]
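
One thing LabelBinarizer gives you that a bare get_dummies() call does not is a fixed set of classes that can be reused on new data. A minimal pandas-only sketch of the same idea follows, assuming the three-row frame from the question; the helper name one_hot_lists and the variable names are hypothetical, not part of either library.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# "Fit": record the sorted class labels once, similar in spirit to lb.classes_.
classes = sorted(df["ABC"].unique())
cat_type = pd.CategoricalDtype(categories=classes)


def one_hot_lists(values):
    """Hypothetical helper: encode values against the fixed class list."""
    as_cat = pd.Series(values).astype(cat_type)
    # dtype=int keeps 0/1 integers regardless of the pandas default.
    return pd.get_dummies(as_cat, dtype=int).to_numpy().tolist()


df["new"] = one_hot_lists(df["ABC"])

# "Transform": a subset of labels still gets three-element vectors.
subset = one_hot_lists(["C", "A"])
```

Because the categories are fixed up front, labels that are missing from a later batch still get a column, so the vectors keep the same length.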

Solution 2:

You can just use tolist():

df['ABC'] = pd.get_dummies(df.ABC).values.tolist()

  Col1        ABC
0  XYZ  [1, 0, 0]
1  XYZ  [0, 1, 0]
2  XYZ  [0, 0, 1]
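
One caveat: in recent pandas releases (2.0 and later, as far as I know) pd.get_dummies() returns boolean columns by default, so the lists would contain True/False rather than 1/0. A small sketch that pins the dtype explicitly, assuming the frame from the question; the column name ABC_onehot is only illustrative:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# Request integer indicators so the result does not depend on the
# pandas version's default dtype for get_dummies().
one_hot = pd.get_dummies(df["ABC"], dtype=int)

# One list per row, stored in a new column (illustrative name).
df["ABC_onehot"] = one_hot.to_numpy().tolist()
```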

Solution 3:

If you have a pd.DataFrame like this:

>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1

You can always do something like this:

>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object

Note that this is essentially a for-loop over the rows. Also note that pandas has no list dtype: a column holding lists has dtype object, so operations on it cannot take advantage of NumPy's vectorized speed.
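
To make the dtype point concrete, here is a rough sketch, assuming the already-encoded frame shown in this answer. It builds the list column without a row-wise apply and keeps a plain 2-D array alongside it for any numeric work; the variable names are illustrative.

```python
import pandas as pd

# Already one-hot encoded frame, as in this answer.
df = pd.DataFrame(
    {"Col1": ["XYZ"] * 3, "A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
)

# 2-D integer array: stays vectorized and fast.
onehot = df[["A", "B", "C"]].to_numpy()

# Series of Python lists: dtype is object, so NumPy speedups are lost.
as_lists = pd.Series(onehot.tolist(), index=df.index)

# Numeric operations are easy on the array, e.g. recovering class positions.
decoded = onehot.argmax(axis=1)
```

Keeping the array around means the list column is only needed for display or export.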

Solution 4:

If you have a data-frame df with a categorical column ABC, then you could use the following to create a new column of one-hot vectors:

df['new_column'] = list(pandas.get_dummies(df['ABC']).values)
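
With this approach each cell holds a NumPy array rather than a Python list. A short sketch, assuming the frame from the question, of building such a column and stacking it back into a single 2-D array (for example, to pass to a model); dtype=int is my addition so the values are 0/1 on any pandas version:

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# Each row of the dummy matrix becomes one array-valued cell.
df["new_column"] = list(pd.get_dummies(df["ABC"], dtype=int).to_numpy())

# Stack the per-row arrays back into one (n_rows, n_classes) matrix.
matrix = np.stack(df["new_column"].to_list())
```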
