Find Repeated Words In A Column And Sort It According To Number Of Occurrence Using Pandas

A B 1) Italy Transport for London..... 2) Italy Roseanne Barr Actor leavin..... 3) America

Solution 1:

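Use transform('size') to get, for every row, the number of times its value in column A occurs. Negate those counts, call argsort to get row positions in descending order of count, and select the rows with iloc: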

df = df.iloc[(-df.groupby('A')['A'].transform('size')).argsort()]
print (df)
          A                              B
3)  America  Americas Transport for London
4)  America           Transport for London
5)  America     Roseanne Barr Actor leavin
1)    Italy           Transport for London
2)    Italy     Roseanne Barr Actor leavin
6)   France  Americas Transport for London

Or create new column and sort:

df['new'] = df.groupby('A')['A'].transform('size')

df = df.sort_values('new', ascending=False)
print (df)
          A                              B  new
3)  America  Americas Transport for London    3
4)  America           Transport for London    3
5)  America     Roseanne Barr Actor leavin    3
1)    Italy           Transport for London    2
2)    Italy     Roseanne Barr Actor leavin    2
6)   France  Americas Transport for London    1
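
For reference, here is a minimal self-contained sketch of the same idea. It assumes the six sample rows shown above with a default integer index (0 to 5) instead of the 1) to 6) labels. The names counts, order and result are illustrative, and NumPy's argsort with kind="stable" is swapped in for the Series argsort so that rows with equal counts are guaranteed to keep their original relative order:

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question (default 0..5 index).
df = pd.DataFrame(
    {
        "A": ["Italy", "Italy", "America", "America", "America", "France"],
        "B": [
            "Transport for London",
            "Roseanne Barr Actor leavin",
            "Americas Transport for London",
            "Transport for London",
            "Roseanne Barr Actor leavin",
            "Americas Transport for London",
        ],
    }
)

# Per-row count of how often each value of column A occurs.
counts = df.groupby("A")["A"].transform("size")

# Negating the counts turns an ascending sort into a descending one;
# the stable sort keeps the original order among rows with equal counts.
order = np.argsort(-counts.to_numpy(), kind="stable")
result = df.iloc[order]

print(result)
```

With this data the America rows come first, then Italy, then France, and no helper column is left behind in the result.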

Solution 2:

Using collections.Counter to create a dictionary of counts:

from collections import Counter

import pandas as pd

df = pd.DataFrame([['Italy', 'Transport for London'],
                   ['Italy', 'Roseanne Barr Actor leavin'],
                   ['America', 'Americas Transport for London'],
                   ['America', 'Transport for London'],
                   ['America', 'Roseanne Barr Actor leavin'],
                   ['France', 'Americas Transport for London']],
                  columns=['A', 'B'])

# calculate counts
c = Counter(df['A'])

# apply reordering
df = df.iloc[df['A'].map(c).argsort()[::-1]]

# save to excel
df.to_excel('file.xlsx', index=False)

Result:

print(df)

         A                              B
4  America     Roseanne Barr Actor leavin
3  America           Transport for London
2  America  Americas Transport for London
1    Italy     Roseanne Barr Actor leavin
0    Italy           Transport for London
5   France  Americas Transport for London
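
To see why the rows inside each group come out in reverse order, it may help to look at the intermediate values. The sketch below is only an illustration: it works on column A alone, the names col, ranked, per_row and positions are hypothetical, and it uses NumPy's argsort with kind="stable" so that the tie order is well defined:

```python
from collections import Counter

import pandas as pd

# Column A from the sample data above.
col = pd.Series(
    ["Italy", "Italy", "America", "America", "America", "France"], name="A"
)

# Counter maps each distinct value to its number of occurrences.
c = Counter(col)
ranked = c.most_common()  # values ordered from most to least frequent

# Mapping the Counter over the column gives one count per row.
per_row = col.map(c)

# An ascending stable argsort followed by [::-1] yields descending counts,
# but the reversal also flips the order of rows that share a count.
positions = per_row.to_numpy().argsort(kind="stable")[::-1]

print(ranked)
print(per_row.tolist())
print(positions.tolist())
```

The positions come out as 4, 3, 2, 1, 0, 5, which matches the index order in the result above. If the original order within each group matters, sorting the negated counts in ascending order avoids the reversal.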
