
Drop Duplicates With Less Precision

I have a pandas DataFrame with string columns and float columns, and I would like to use drop_duplicates to remove duplicates. Some of the duplicates are not exactly the same, because the float values differ by tiny amounts.

Solution 1:

You can use the round function with a given precision to round your df:

DataFrame.round(decimals=0, *args, **kwargs)

Round a DataFrame to a variable number of decimal places.

For example, you can round to two decimals like this:

df = df.round(2)

You can also apply it to specific columns only, for example:

df = df.round({'result': 2})

After rounding, you can use the drop_duplicates function.
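Putting the two steps together, a minimal sketch (the sample data and column names below are assumptions, not the asker's actual frame):

```python
import pandas as pd

# Hypothetical data: 'result' values differ only past the second decimal
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000004],
    "text":   ["aaa",    "aaa",    "aaa",    "bb"],
})

# Round the float column, then drop the rows that became identical
deduped = df.round({"result": 2}).drop_duplicates()
print(deduped)
```

Note that this keeps the rounded values in the result; Solution 2 shows how to keep the original precision.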

Solution 2:

Round them:

df.loc[df.round().drop_duplicates().index]

     result text
0  1.000001  aaa
2  2.000000  aaa
3  2.000000   bb
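The point of indexing back with .loc is that rounding is used only to decide which rows are duplicates, while the rows you keep retain their original precision. A sketch with assumed sample data:

```python
import pandas as pd

# Hypothetical data (an assumption, chosen to match the output above)
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000004],
    "text":   ["aaa",    "aaa",    "aaa",    "bb"],
})

# round() (default: 0 decimals) + drop_duplicates() finds the surviving
# index labels; .loc then selects those rows from the ORIGINAL frame
kept = df.loc[df.round().drop_duplicates().index]
print(kept)
```

Here row 0 keeps its unrounded value 1.000001 rather than the rounded 1.0.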

Solution 3:

Use numpy.trunc to get the precision you are looking for, and pandas' duplicated to find which rows to keep.

df[~df.assign(result=np.trunc(df.result.values * 100)).duplicated()]
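A runnable sketch of the same one-liner, with the import it needs and assumed sample data (multiplying by 100 before truncating compares values at two decimal places):

```python
import numpy as np
import pandas as pd

# Hypothetical data (an assumption about the asker's frame)
df = pd.DataFrame({
    "result": [1.000001, 1.000002, 2.000000, 2.000004],
    "text":   ["aaa",    "aaa",    "aaa",    "bb"],
})

# Truncate (not round) to 2 decimal places purely for the duplicate test;
# the ~duplicated() mask then filters the original, full-precision frame
kept = df[~df.assign(result=np.trunc(df.result.values * 100)).duplicated()]
print(kept)
```

Unlike rounding, truncation always drops the extra digits toward zero, so 1.009 and 1.001 compare equal here, while round(2) would separate them.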
