How To Create A Bag Of Words From A Pandas Dataframe
Here's my dataframe
    CATEGORY  BRAND
0   Noodle    Anak Mas
1   Noodle    Anak Mas
2   Noodle    Indomie
3   Noodle    Indomie
4   Noodle    Indomie
23  Noodle    Indomie
24  Noodle    Mi T
Solution 1:
IIUC, use one of the following.
Option 1] Numpy flatten and split (needs import collections)
In [2535]: collections.Counter([y for x in df.values.flatten() for y in x.split()])
Out[2535]:
Counter({'3': 2,
'Anak': 2,
'Cap': 2,
'Indomie': 4,
'Mas': 2,
'Mi': 2,
'Mie': 2,
'Noodle': 10,
'Pop': 2,
'Telor': 2})
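For anyone who wants to run Option 1 outside the session above, here is a minimal standalone sketch. It assumes the frame looks like the one printed under Details; the variable names bag, cell and token are mine, not from the original session.

```python
import collections

import pandas as pd

# Sample frame rebuilt from the Details output in this answer.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": [
            "Anak Mas", "Anak Mas",
            "Indomie", "Indomie", "Indomie", "Indomie",
            "Mi Telor Cap 3", "Mi Telor Cap 3",
            "Pop Mie", "Pop Mie",
        ],
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
)

# Flatten every cell into one 1-D array, split each cell on whitespace,
# and count the resulting tokens.
bag = collections.Counter(
    token for cell in df.values.flatten() for token in cell.split()
)

print(bag.most_common(2))  # [('Noodle', 10), ('Indomie', 4)]
```

Note that x.split() only works if every cell is a string; a frame with NaN or numeric cells would likely need df.astype(str) or a dropna first.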
Option 2] Use value_counts()
In [2536]: pd.Series([y for x in df.values.flatten() for y in x.split()]).value_counts()
Out[2536]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64
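A runnable sketch of the value_counts() route, again assuming the sample frame from Details. The names tokens and counts are only illustrative. The result is a regular Series, so single words can be looked up by label and the whole thing can be turned into a plain dict.

```python
import pandas as pd

# Sample frame rebuilt from the Details output in this answer.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2 + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2 + ["Pop Mie"] * 2,
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
)

# One Series entry per whitespace-separated token.
tokens = pd.Series(
    [tok for cell in df.values.flatten() for tok in cell.split()]
)

# value_counts() sorts by frequency, most common first.
counts = tokens.value_counts()

print(counts["Noodle"])  # 10
word_to_count = counts.to_dict()  # plain {word: count} mapping
```

I would not rely on the order of words that share the same count (all the 2s here); if a fixed order matters, sort the result explicitly.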
Option 3] Use stack and value_counts
In [2582]: df.apply(lambda x: x.str.split(expand=True).stack()).stack().value_counts()
Out[2582]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64
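As a variation on the stack idea (not part of the original session), Series.explode can replace the split(expand=True) step. This is a sketch that assumes pandas 0.25 or later, where explode exists, and the sample frame from Details.

```python
import pandas as pd

# Sample frame rebuilt from the Details output in this answer.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2 + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2 + ["Pop Mie"] * 2,
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
)

# stack()     -> one Series holding every cell of the frame
# str.split() -> a list of tokens per cell
# explode()   -> one row per token
tokens = df.stack().str.split().explode()

counts = tokens.value_counts()
print(counts["Telor"])  # 2
```

The counts should match the three options above; the difference is only that no intermediate wide frame of split columns is built.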
Details
In [2516]: df
Out[2516]:
    CATEGORY           BRAND
0     Noodle        Anak Mas
1     Noodle        Anak Mas
2     Noodle         Indomie
3     Noodle         Indomie
4     Noodle         Indomie
23    Noodle         Indomie
24    Noodle  Mi Telor Cap 3
25    Noodle  Mi Telor Cap 3
26    Noodle         Pop Mie
27    Noodle         Pop Mie