How To Create A Bag Of Words From A Pandas Dataframe

Here's my dataframe:

   CATEGORY     BRAND
0    Noodle  Anak Mas
1    Noodle  Anak Mas
2    Noodle   Indomie
3    Noodle   Indomie
4    Noodle   Indomie
23   Noodle   Indomie
24   Noodle  Mi T

Solution 1:

IIUC, you can use any of the following options.

Option 1] Numpy flatten and split

In [2535]: collections.Counter([y for x in df.values.flatten() for y in x.split()])
Out[2535]:
Counter({'3': 2,
         'Anak': 2,
         'Cap': 2,
         'Indomie': 4,
         'Mas': 2,
         'Mi': 2,
         'Mie': 2,
         'Noodle': 10,
         'Pop': 2,
         'Telor': 2})

Option 2] Use value_counts()

In [2536]: pd.Series([y for x in df.values.flatten() for y in x.split()]).value_counts()
Out[2536]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64

Option 3] Use stack and value_counts

In [2582]: df.apply(lambda x: x.str.split(expand=True).stack()).stack().value_counts()
Out[2582]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64
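
The options above call x.split() on every cell, which raises an error if a cell is missing (NaN/None). As a hedged sketch of a variant that is not part of the original answer, the pandas string accessor can do the splitting and Series.explode() (pandas 0.25 or later) can flatten the word lists, so missing cells are skipped by value_counts(). The small dataframe below reuses a few rows from the question plus one hypothetical row with a missing BRAND, added only for illustration.

```python
import pandas as pd

# A few rows from the question's data, plus one HYPOTHETICAL row whose
# BRAND is missing, to show that missing cells do not break the count.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle", "Noodle", "Noodle", "Noodle"],
        "BRAND": ["Anak Mas", "Indomie", "Pop Mie", None],
    }
)

words = (
    pd.Series(df.to_numpy().ravel())  # every cell as one flat Series
    .str.split()                      # each string cell -> list of words
    .explode()                        # one word per row; missing stays missing
)

# value_counts() ignores missing values by default (dropna=True).
counts = words.value_counts()
print(counts)
```

This keeps the whole pipeline inside pandas, at the cost of being a little less explicit than the list comprehension in Options 1 and 2.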

Details

In [2516]: df
Out[2516]:
    CATEGORY           BRAND
0     Noodle        Anak Mas
1     Noodle        Anak Mas
2     Noodle         Indomie
3     Noodle         Indomie
4     Noodle         Indomie
23    Noodle         Indomie
24    Noodle  Mi Telor Cap 3
25    Noodle  Mi Telor Cap 3
26    Noodle         Pop Mie
27    Noodle         Pop Mie
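
To try the options yourself, the dataframe can be rebuilt as a standalone script. This is a minimal sketch: the rows and index labels are taken from the Details output, while the variable names tokens, bag and counts are my own and not from the original answer.

```python
import collections

import pandas as pd

# Rebuild the dataframe from the Details output, keeping its index labels.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": [
            "Anak Mas", "Anak Mas",
            "Indomie", "Indomie", "Indomie", "Indomie",
            "Mi Telor Cap 3", "Mi Telor Cap 3",
            "Pop Mie", "Pop Mie",
        ],
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
)

# Flatten every cell into one array, then split each cell on whitespace.
tokens = [word for cell in df.values.flatten() for word in cell.split()]

# Option 1: a Counter mapping each word to its number of occurrences.
bag = collections.Counter(tokens)

# Option 2: the same counts as a pandas Series, sorted by frequency.
counts = pd.Series(tokens).value_counts()

print(bag)
print(counts)
```

Both structures hold the same ten distinct words. The Counter is convenient for lookups such as bag["Indomie"], while the Series plugs directly into further pandas operations.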
