How To Find Duplicates In Pandas Dataframe
Suppose I have the following series in pandas:

>>> p
0     0.0
1     0.0
2     0.0
3     0.3
4     0.3
5     0.3
6     0.3
7     0.3
8     1.0
9     1.0
10    1.0
...
Solution 1:
I believe you need to compare each value with its shifted neighbour via ne (not equal), take the cumsum to label consecutive runs, and finally use drop_duplicates:
s = df['a'].ne(df['a'].shift()).cumsum()
a = s.drop_duplicates().index
b = s.drop_duplicates(keep='last').index
df = pd.DataFrame({'first':a, 'last':b})
print (df)
first last
0 0 2
1 3 7
2 8 10
3 11 12
4 13 15
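The snippet above runs end to end; here is a minimal self-contained sketch, using the 16-value series implied by the answer's expected output (the column name 'a' is taken from the answer):

```python
import pandas as pd

# Series reconstructed from the expected output in the answer.
df = pd.DataFrame({'a': [0.0, 0.0, 0.0,
                         0.3, 0.3, 0.3, 0.3, 0.3,
                         1.0, 1.0, 1.0,
                         0.2, 0.2,
                         0.3, 0.3, 0.3]})

# Each change in 'a' starts a new run id.
s = df['a'].ne(df['a'].shift()).cumsum()

# First and last index of each consecutive run.
first = s.drop_duplicates().index
last = s.drop_duplicates(keep='last').index

out = pd.DataFrame({'first': first, 'last': last})
print(out)
```

Note that drop_duplicates keeps the first occurrence by default, so keep='last' is only needed for the end of each run.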
If you also want the duplicated value in a new column, change the solution a bit using duplicated:
s = df['a'].ne(df['a'].shift()).cumsum()
a = df.loc[~s.duplicated(), 'a']
b = s.drop_duplicates(keep='last')
df = pd.DataFrame({'first':a.index, 'last':b.index, 'val':a})
print (df)
first last val
0 0 2 0.0
3 3 7 0.3
8 8 10 1.0
11 11 12 0.2
13 13 15 0.3
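As a runnable check, the variant above applied to the same series reconstructed from the answer's expected output:

```python
import pandas as pd

# Series reconstructed from the expected output in the answer.
df = pd.DataFrame({'a': [0.0, 0.0, 0.0, 0.3, 0.3, 0.3, 0.3, 0.3,
                         1.0, 1.0, 1.0, 0.2, 0.2, 0.3, 0.3, 0.3]})

s = df['a'].ne(df['a'].shift()).cumsum()

# Keep the first value of each run, together with its original index.
a = df.loc[~s.duplicated(), 'a']
b = s.drop_duplicates(keep='last')

out = pd.DataFrame({'first': a.index, 'last': b.index, 'val': a})
print(out)
```

Because 'val' is a Series, the result inherits its index (the start position of each run), which is why the printed index is 0, 3, 8, 11, 13 rather than 0..4.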
If you need the run id as a new column:
df['count'] = df['a'].ne(df['a'].shift()).cumsum()
print (df)
a count
0 0.0 1
1 0.0 1
2 0.0 1
3 0.3 2
4 0.3 2
5 0.3 2
6 0.3 2
7 0.3 2
8 1.0 3
9 1.0 3
10 1.0 3
11 0.2 4
12 0.2 4
13 0.3 5
14 0.3 5
15 0.3 5
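Once the run id exists as a column, it can also drive a groupby, for example to summarise each run's value and length. A sketch on the same reconstructed series (the groupby aggregation is my addition, not part of the answer):

```python
import pandas as pd

# Series reconstructed from the expected output in the answer.
df = pd.DataFrame({'a': [0.0, 0.0, 0.0, 0.3, 0.3, 0.3, 0.3, 0.3,
                         1.0, 1.0, 1.0, 0.2, 0.2, 0.3, 0.3, 0.3]})

# Label each consecutive run of equal values with an increasing id.
df['count'] = df['a'].ne(df['a'].shift()).cumsum()

# Summarise each run: its value and its length (my addition).
runs = df.groupby('count')['a'].agg(['first', 'size'])
print(runs)
```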