No Output With .duplicated In Pandas?
Solution 1:
dl_by_loc(path) returns a Series with a MultiIndex:
round_latitude  round_longitude  city
30.0            -95.0            Houston         1
40.0            -75.0            Philadelphia    3
Name: downloads, dtype: int64
If you look at the definition of that function, it groups the DataFrame by the round_latitude, round_longitude and city columns and counts the occurrences of each combination. Later on, you convert the result to a DataFrame by calling reset_index(). The downloads column now shows how many times each latitude, longitude, city combination occurred in the original DataFrame. Because this is a groupby result, each combination appears exactly once, so .duplicated() has nothing to flag and returns no rows. To find the combinations that occurred more than once in the original data, filter on the count instead:
by_coords[by_coords['downloads']>1]
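To make the difference concrete, here is a minimal sketch. The `raw` DataFrame is made-up data standing in for the original one, and the groupby is only an assumption about what dl_by_loc does internally, based on the description above:

```python
import pandas as pd

# Hypothetical stand-in for the original data: one row per download
raw = pd.DataFrame({
    "round_latitude": [30.0, 40.0, 40.0, 40.0],
    "round_longitude": [-95.0, -75.0, -75.0, -75.0],
    "city": ["Houston", "Philadelphia", "Philadelphia", "Philadelphia"],
})
keys = ["round_latitude", "round_longitude", "city"]

# Assumed equivalent of dl_by_loc(): count rows per combination
counts = raw.groupby(keys).size().rename("downloads")
by_coords = counts.reset_index()

# After the groupby every combination is unique, so this selects no rows
dupes_after_groupby = by_coords[by_coords.duplicated(subset=keys)]

# Filtering on the count finds the combinations that repeated
repeated = by_coords[by_coords["downloads"] > 1]

# On the original, un-aggregated data, .duplicated() does flag the repeats
dupes_in_raw = raw[raw.duplicated(subset=keys)]

print(dupes_after_groupby)
print(repeated)
print(dupes_in_raw)
```

With this sample data the first result is empty, the second contains only the Philadelphia row, and the third contains the two repeated Philadelphia rows from the raw data.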
Your method would still work on the original DataFrame. Note that removing duplicates or grouping on float columns carries some risk, because two values that print the same can differ slightly in their binary representation. Pandas generally handles this, but to be safe, if you want one-decimal precision, you can multiply by 10 and convert to integer.
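A small sketch of that conversion, using a made-up Series named `lat`. The explicit round() before the cast is my own addition: astype alone truncates toward zero, so a value such as 2.9999999 would become 2 rather than 3.

```python
import pandas as pd

# Hypothetical values: both are meant to be 0.3, but the first one
# carries a tiny floating-point error from the addition
lat = pd.Series([0.1 + 0.2, 0.3])

# Compared as floats, the two values are not exactly equal
float_dupes = lat.duplicated()

# Scale to tenths, round to the nearest whole number, then cast to integer
scaled = (lat * 10).round().astype("int64")
int_dupes = scaled.duplicated()

print(float_dupes.tolist())
print(scaled.tolist())
print(int_dupes.tolist())
```

The integer column can then be used as the key for duplicated(), drop_duplicates() or groupby(), and divided by 10 again when you need the coordinate for display.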