Find Repeated Words In A Column And Sort It According To Number Of Occurrence Using Pandas
A B
1) Italy Transport for London.....
2) Italy Roseanne Barr Actor leavin.....
3) America
Solution 1:
Use transform('size') to get each row's group count, argsort the negated counts to get the positions in descending order, and select the rows by iloc:
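# count rows per value of A; negate so the largest groups sort first
# kind='mergesort' is stable, so rows with equal counts keep their original order
df = df.iloc[(-df.groupby('A')['A'].transform('size')).argsort(kind='mergesort')]
print (df)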
A B
3) America Americas Transport for London
4) America Transport for London
5) America Roseanne Barr Actor leavin
1) Italy Transport for London
2) Italy Roseanne Barr Actor leavin
6) France Americas Transport for London
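A minimal, self-contained sketch of the intermediate steps in the first approach, reusing the sample rows from Solution 2. It prints the per-row counts produced by transform('size') and the positions returned by argsort, which are exactly what iloc selects. The names sizes, positions and result are illustrative only.

```python
import pandas as pd

# Sample rows reused from Solution 2
df = pd.DataFrame({'A': ['Italy', 'Italy', 'America', 'America', 'America', 'France'],
                   'B': ['Transport for London', 'Roseanne Barr Actor leavin',
                         'Americas Transport for London', 'Transport for London',
                         'Roseanne Barr Actor leavin', 'Americas Transport for London']})

# transform('size') broadcasts each group's row count back onto every row of that group
sizes = df.groupby('A')['A'].transform('size')
print(sizes.tolist())  # [2, 2, 3, 3, 3, 1]

# negating turns "largest count first" into an ascending sort; mergesort keeps ties stable
positions = (-sizes).argsort(kind='mergesort')
print(positions.tolist())  # [2, 3, 4, 0, 1, 5]

# iloc selects rows by those integer positions
result = df.iloc[positions]
print(result['A'].tolist())  # ['America', 'America', 'America', 'Italy', 'Italy', 'France']
```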
Or create a new column holding the counts and sort by it:
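# new column: how many times each value of A occurs
df['new'] = df.groupby('A')['A'].transform('size')
# most frequent first; mergesort is stable, so tied rows keep their original order
df = df.sort_values('new', ascending=False, kind='mergesort')
print (df)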
A B new
3) America Americas Transport for London 3
4) America Transport for London 3
5) America Roseanne Barr Actor leavin 3
1) Italy Transport for London 2
2) Italy Roseanne Barr Actor leavin 2
6) France Americas Transport for London 1
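Sorting on the count alone keeps tied rows in their original order, so if two different values of A occur equally often and their rows are interleaved, the groups stay interleaved. One way to keep each group together is to add A itself as a secondary sort key. This sketch assumes hypothetical data (Spain is not in the original example) chosen to create such a tie.

```python
import pandas as pd

# Hypothetical data: Italy and Spain both occur twice and their rows are interleaved
df = pd.DataFrame({'A': ['Italy', 'Spain', 'America', 'Italy', 'America', 'Spain', 'America']})

df['new'] = df.groupby('A')['A'].transform('size')

# count alone: the tied Italy/Spain rows keep their interleaved original order
by_count = df.sort_values('new', ascending=False, kind='mergesort')
print(by_count['A'].tolist())
# ['America', 'America', 'America', 'Italy', 'Spain', 'Italy', 'Spain']

# count first (descending), then A (ascending), so equal-count groups stay contiguous
by_count_then_name = df.sort_values(['new', 'A'], ascending=[False, True])
print(by_count_then_name['A'].tolist())
# ['America', 'America', 'America', 'Italy', 'Italy', 'Spain', 'Spain']
```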
Solution 2:
Use collections.Counter to create a dictionary of counts:
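import pandas as pd
from collections import Counter
df = pd.DataFrame([['Italy', 'Transport for London'],
                   ['Italy', 'Roseanne Barr Actor leavin'],
                   ['America', 'Americas Transport for London'],
                   ['America', 'Transport for London'],
                   ['America', 'Roseanne Barr Actor leavin'],
                   ['France', 'Americas Transport for London']],
                  columns=['A', 'B'])
# calculate counts: Counter maps each value in A to how often it occurs
c = Counter(df['A'])
# apply reordering: map each row to its count, argsort ascending (stable), then reverse for descending
df = df.iloc[df['A'].map(c).argsort(kind='mergesort')[::-1]]
# save to Excel (writing .xlsx requires an engine such as openpyxl to be installed)
df.to_excel('file.xlsx', index=False)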
Result:
print(df)
A B
4 America Roseanne Barr Actor leavin
3 America Transport for London
2 America Americas Transport for London
1 Italy Roseanne Barr Actor leavin
0 Italy Transport for London
5 France Americas Transport for London
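To show what Counter hands to map, here is a small sketch on the A column alone. It also compares Counter with pandas' own Series.value_counts, a swapped-in alternative that the answer above does not use, which builds the same value-to-count mapping as a Series. The names a, c and vc are illustrative.

```python
from collections import Counter

import pandas as pd

a = pd.Series(['Italy', 'Italy', 'America', 'America', 'America', 'France'])

# Counter iterates over the Series values and tallies each one
c = Counter(a)
print(c.most_common())  # [('America', 3), ('Italy', 2), ('France', 1)]

# value_counts builds the same value -> count mapping, indexed by value
vc = a.value_counts()
print(dict(vc) == dict(c))  # True

# map() accepts either mapping and yields each row's count
print(a.map(c).tolist())   # [2, 2, 3, 3, 3, 1]
print(a.map(vc).tolist())  # [2, 2, 3, 3, 3, 1]
```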