Drop Duplicates With Less Precision
I have a pandas DataFrame with string columns and float columns, and I would like to use drop_duplicates to remove duplicates. Some of the duplicates are not exactly the same, because the float values differ only at high precision.
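A minimal sketch of the situation, with hypothetical data chosen for illustration: two rows differ only in the sixth decimal place, so a plain drop_duplicates() treats them as distinct.

```python
import pandas as pd

# Hypothetical near-duplicate data: rows 0 and 1 differ only in the
# sixth decimal place of the float column.
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000000],
    "text": ["aaa", "aaa", "aaa", "bb"],
})

# No rows are exact duplicates, so nothing is dropped here.
print(df.drop_duplicates())
```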
Solution 1:
You can use the round function with a given precision to round your DataFrame.
DataFrame.round(decimals=0, *args, **kwargs)
Round a DataFrame to a variable number of decimal places.
For example, you can round to two decimals like this:
df = df.round(2)
You can also apply it to specific columns, for example:
df = df.round({'result': 2})
After rounding, you can use the function drop_duplicates.
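Putting the two steps together on a hypothetical DataFrame (column names assumed for illustration), rounding only the float column and then dropping the now-exact duplicates:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are near-duplicates that only
# differ far past two decimal places.
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000000],
    "text": ["aaa", "aaa", "aaa", "bb"],
})

# Round just the 'result' column, then drop the now-identical rows.
deduped = df.round({"result": 2}).drop_duplicates()
print(deduped)
```

Note that this approach keeps the rounded values in the result; Solution 2 below shows how to keep the original, unrounded values.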
Solution 2:
Round them, and use the surviving index to select rows from the original DataFrame:
df.loc[df.round().drop_duplicates().index]
     result text
0  1.000001  aaa
2  2.000000  aaa
3  2.000000   bb
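A runnable sketch of this one-liner on the same hypothetical data: round() builds a temporary rounded copy, drop_duplicates() decides which index labels survive, and .loc pulls those rows, with their original unrounded values, out of the original DataFrame.

```python
import pandas as pd

# Hypothetical data matching the output shown above.
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000000],
    "text": ["aaa", "aaa", "aaa", "bb"],
})

# Deduplicate on the rounded values but keep the original rows.
kept = df.loc[df.round().drop_duplicates().index]
print(kept)
```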
Solution 3:
Use numpy.trunc to get the precision you are looking for, and pandas duplicated to find which rows to keep.
df[~df.assign(result=np.trunc(df.result.values * 100)).duplicated()]
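In full, on the same hypothetical data: multiplying by 100 and truncating compares values at two decimal places (the truncated column only feeds the duplicated() check, so no divide-back is needed), and the negated mask keeps the first row of each group with its original values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 1 agree to two decimal places.
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000000],
    "text": ["aaa", "aaa", "aaa", "bb"],
})

# Truncate 'result' to two decimals, flag duplicate rows, keep the rest.
mask = df.assign(result=np.trunc(df.result.values * 100)).duplicated()
kept = df[~mask]
print(kept)
```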