Pandas Count Elements In A Column And Show In Duplicated Way
I want to get something like this. I have:
A
1
1
2
3
3
4
4
4
4
I want to turn it into:
A B
1 2
1 2
2 1
3 2
3 2
4 4
4 4
4 4
4 4
As you can see here, the keys are duplic
Solution 1:
You can use this:
import pandas as pd
df = pd.DataFrame({
'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]
})
df['B'] = df.groupby(['A'])['A'].transform('count')
print(df)
Output:
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
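The result can be assigned straight back to df['B'] because transform returns one value per original row, whereas a plain aggregation returns one value per group. A minimal sketch of that difference, using the same data (the names per_group and per_row are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# Plain aggregation: one count per distinct key (4 entries).
per_group = df.groupby('A')['A'].count()

# transform: each group's count broadcast back to every row (9 entries),
# aligned with df's original index.
per_row = df.groupby('A')['A'].transform('count')

print(len(per_group), len(per_row))
```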
Solution 2:
You could use a groupby and merge:
df = pd.DataFrame({'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]})
df = df.merge(df.groupby('A').size().reset_index(), on='A')
Which will give you:
A 0
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
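Note that the count column comes out labelled 0 rather than B. One possible variation, sketched here on the same data, is to name the column while resetting the index (the variable names counts and out are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# Name the size column 'B' up front instead of getting the default label 0.
counts = df.groupby('A').size().reset_index(name='B')

# Each row of df picks up the count that belongs to its key.
out = df.merge(counts, on='A')
print(out)
```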
Solution 3:
Fast way using pd.factorize and np.bincount:
import numpy as np
f = df.A.factorize()[0]
df.assign(B=np.bincount(f)[f])
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
Explanation
pd.factorize returns an array of integer codes, one per row, where each distinct integer stands for one unique value in the input. The codes start from zero.
f
array([0, 0, 1, 2, 2, 3, 3, 3, 3])
np.bincount takes an array of non-negative integers and counts how many times each integer occurs. If we think of these integers as bins, then we are counting how many times each bin is referenced.
np.bincount(f)
array([2, 1, 2, 4])
Finally, we use f to index these counts, which gives us back each bin's count repeated once for every row that referenced that bin.
np.bincount(f)[f]
array([2, 2, 1, 2, 2, 4, 4, 4, 4])
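The same three steps also work when the keys are not integers, which is where factorize earns its place. Here is a self-contained sketch with made-up string keys. It assumes the column has no missing values, since factorize marks those with -1 by default and np.bincount rejects negative input.

```python
import numpy as np
import pandas as pd

# Hypothetical string keys; assumes there are no missing values.
s = pd.Series(['x', 'x', 'y', 'z', 'z'])

codes, uniques = pd.factorize(s)  # codes: one integer per row
counts = np.bincount(codes)       # counts: one entry per unique key
per_row = counts[codes]           # each key's count repeated for every row

print(list(uniques), counts.tolist(), per_row.tolist())
```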
Solution 4:
Using map with groupby size:
df['B']=df.A.map(df.groupby('A').size())
df
Out[630]:
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
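map works here because the Series it is given is indexed by the key values. As a variation that is not part of the answer above, value_counts builds the same kind of lookup Series without an explicit groupby. A sketch on the same data (the name lookup is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# Lookup Series: index = distinct values of A, values = their frequencies.
lookup = df['A'].value_counts()

# map replaces each value of A with its frequency from the lookup.
df['B'] = df['A'].map(lookup)
print(df)
```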