Pandas Count Elements In A Column And Show In Duplicated Way

July 28, 2023

I want to get something like this.

A
1
1
2
3
3
4
4
4
4

I want to make it to be

A  B
1  2
1  2
2  1
3  2
3  2
4  4
4  4
4  4
4  4

Like you see here, the keys are duplic

Solution 1:

You can use this:

import pandas as pd

df = pd.DataFrame({
    'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]
})
df['B'] = df.groupby(['A'])['A'].transform('count')

print(df)

Output:

   A  B
0  1  2
1  1  2
2  2  1
3  3  2
4  3  2
5  4  4
6  4  4
7  4  4
8  4  4

Solution 2:

You could use a groupby and merge:

df = pd.DataFrame({'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]})

df = df.merge(df.groupby('A').size().reset_index(), on='A')

Which will give you:

   A  0
0  1  2
1  1  2
2  2  1
3  3  2
4  3  2
5  4  4
6  4  4
7  4  4
8  4  4
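In the merge above, the count column ends up labelled 0 because size() returns an unnamed Series. A minimal sketch, assuming the column should be called B as in the question, passes name= to reset_index and uses a left merge so the rows of df keep their original order:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# size() returns an unnamed Series indexed by A; name= labels the count column
counts = df.groupby('A').size().reset_index(name='B')

# how='left' keeps every row of df in its original order
result = df.merge(counts, on='A', how='left')
print(result)
```

The intermediate name counts is only illustrative; the same expression can be inlined into the merge call.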

Solution 3:

A fast way using pd.factorize and np.bincount:

import numpy as np

f = df.A.factorize()[0]
df.assign(B=np.bincount(f)[f])

   A  B
0  1  2
1  1  2
2  2  1
3  3  2
4  3  2
5  4  4
6  4  4
7  4  4
8  4  4

Explanation

pd.factorize will create an array of integers where each integer represents a unique value in the factorized array. These integers start from zero.

f

array([0, 0, 1, 2, 2, 3, 3, 3, 3])

np.bincount will use each value in an array of integers and count how many times that integer has been seen. If we think of these integers as bins, then we are counting how many times each bin is referenced.

np.bincount(f)

array([2, 1, 2, 4])

Finally, we use f to index into these counts, which gives back each count repeated once for every time its bin was referenced.

np.bincount(f)[f]

array([2, 2, 1, 2, 2, 4, 4, 4, 4])
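The factorize step matters because np.bincount only accepts non-negative integers. As a rough sketch, the same three steps can be run on a hypothetical string-keyed column (the values 'x', 'y', 'z' are made up for illustration). It assumes there are no missing values, since factorize codes NaN as -1 and np.bincount rejects negative input:

```python
import numpy as np
import pandas as pd

# Hypothetical string keys; np.bincount could not count these directly
df = pd.DataFrame({'A': ['x', 'x', 'y', 'z', 'z']})

codes, uniques = pd.factorize(df['A'])  # one integer code per row, starting at 0
counts = np.bincount(codes)             # one count per distinct code
df = df.assign(B=counts[codes])         # index by code to repeat each count per row
print(df)
```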

Solution 4:

Using map with groupby and size:

df['B']=df.A.map(df.groupby('A').size())
df
Out[630]: 
   A  B
0  1  2
1  1  2
2  2  1
3  3  2
4  3  2
5  4  4
6  4  4
7  4  4
8  4  4
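As a quick sanity check, the transform, factorize/bincount, and map approaches can be compared on the same data. This is only a sketch; the variable names by_transform, by_map, and by_bincount are chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# Solution 1: group-wise count written back onto every row
by_transform = df.groupby('A')['A'].transform('count')

# Solution 4: look up each key in the per-group sizes
by_map = df['A'].map(df.groupby('A').size())

# Solution 3: integer codes, counts per code, then index by code
codes = pd.factorize(df['A'])[0]
by_bincount = np.bincount(codes)[codes]

# Compare the per-row counts from all three approaches
print(by_transform.tolist() == by_map.tolist() == by_bincount.tolist())
```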
