No Output With .duplicated In Pandas?

October 06, 2023

I want to find all of the rows which have duplicates in the columns of city, round_latitude, and round_longitude. So, if two rows share the same values in each of those columns, it

Solution 1:

dl_by_loc(path) returns a Series with a MultiIndex:

round_latitude  round_longitude  city
30.0            -95.0            Houston         1
40.0            -75.0            Philadelphia    3
Name: downloads, dtype: int64

That function groups the DataFrame by the round_latitude, round_longitude and city columns and counts the occurrences of each combination. You then convert the result to a DataFrame by calling reset_index(), so the downloads column shows how many times each latitude, longitude, city combination occurred in the original DataFrame. Because this is a groupby result, every combination appears exactly once: the duplicates were already aggregated away, which is why .duplicated() returns nothing. To find the combinations that were duplicated in the original data, filter on the count instead:

by_coords[by_coords['downloads']>1]
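As a minimal sketch of why this happens, the example below builds a small made-up DataFrame (the sample rows are hypothetical, and the groupby line is only an assumption about what dl_by_loc does, based on the description above). It shows that .duplicated() finds nothing on the aggregated frame, while the count filter does.

```python
import pandas as pd

# Hypothetical sample data standing in for the original DataFrame
df = pd.DataFrame({
    "city": ["Houston", "Philadelphia", "Philadelphia", "Philadelphia"],
    "round_latitude": [30.0, 40.0, 40.0, 40.0],
    "round_longitude": [-95.0, -75.0, -75.0, -75.0],
})

keys = ["round_latitude", "round_longitude", "city"]

# Count rows per combination (assumed to resemble what dl_by_loc does),
# then turn the MultiIndex Series back into a flat DataFrame
by_coords = df.groupby(keys).size().rename("downloads").reset_index()

# Each combination now appears exactly once, so nothing is flagged
no_dupes = by_coords.duplicated(subset=keys, keep=False)
print(no_dupes.any())

# Combinations that occurred more than once in the original data
repeated = by_coords[by_coords["downloads"] > 1]
print(repeated)
```

With this sample data, only the Philadelphia combination survives the filter, with a downloads count of 3.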

Your method would still work on the original DataFrame. Note that dropping duplicates or grouping on float columns carries some risk, because values that look equal can differ slightly in their floating-point representation. Pandas generally handles this, but to be safe, if you want 1-digit precision, you can multiply by 10 and convert to integer.
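A rough sketch of both suggestions, run on a made-up original DataFrame: the sample rows and the lat_key and lon_key column names are hypothetical. The round() call before the integer cast is an addition of this sketch, so that a value stored as 401.99999 does not truncate to 401.

```python
import pandas as pd

# Hypothetical sample data standing in for the original (ungrouped) DataFrame
df = pd.DataFrame({
    "city": ["Houston", "Philadelphia", "Philadelphia", "Austin"],
    "round_latitude": [29.8, 40.0, 40.0, 30.3],
    "round_longitude": [-95.4, -75.2, -75.2, -97.7],
})

# Scale to 1-digit precision and store as integers so comparisons are exact
# (lat_key and lon_key are illustrative names, not from the original post)
df["lat_key"] = (df["round_latitude"] * 10).round().astype(int)
df["lon_key"] = (df["round_longitude"] * 10).round().astype(int)

# keep=False marks every member of a duplicated group, not just the repeats
mask = df.duplicated(subset=["city", "lat_key", "lon_key"], keep=False)
dupes = df[mask]
print(dupes)
```

By default, duplicated() leaves the first occurrence unmarked, so keep=False is what returns all of the rows that share a location.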
