
Remove Outliers In Pandas Dataframe With Groupby

I have a dataframe of Report Date, Time Interval and Total Volume for a full year. I would like to be able to remove outliers within each Time Interval. This is as far as I've been

Solution 1:

df[df.groupby("ReportDate").TotalVolume.transform(lambda x: (x < x.quantile(0.95)) & (x > x.quantile(0.05))).eq(1)]
Out[1033]:
      ReportDate  TimeInterval  TotalVolume
5785  2016-03-01            25        580.0
5786  2016-03-01            26        716.0
5787  2016-03-01            27        803.0
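For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. The five sample rows, the index values, and the helper name `within_quantiles` are made up for illustration and are not taken from the question's data. The `.astype(bool)` is a defensive step in case the transformed result does not come back as a boolean dtype.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame(
    {
        "ReportDate": ["2016-03-01"] * 5,
        "TimeInterval": [24, 25, 26, 27, 28],
        "TotalVolume": [467.0, 580.0, 716.0, 803.0, 941.0],
    },
    index=range(5784, 5789),
)


def within_quantiles(x, lower=0.05, upper=0.95):
    """Return True for values strictly between the group's two quantiles."""
    return (x > x.quantile(lower)) & (x < x.quantile(upper))


# transform() returns a result aligned to df's index, so it can be used as a row mask.
mask = df.groupby("ReportDate")["TotalVolume"].transform(within_quantiles)
filtered = df[mask.astype(bool)]
print(filtered)
```

To remove outliers within each time interval instead, as the question asks, the same sketch should work with `"TimeInterval"` as the groupby key, provided each interval has enough rows for the quantiles to be meaningful.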

Solution 2:

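One way is to compute the 5% and 95% quantiles for each group and filter on them (in this session the sample appears to have been parsed with a Date column for the report date and an Interval column holding the volume):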

In [11]: res = df.groupby("Date")["Interval"].quantile([0.05, 0.95]).unstack(level=1)

In [12]: res
Out[12]:
              0.05   0.95
Date
2016-03-01   489.6  913.4

Now we can look up these bounds for each row using loc and filter:

In [13]: (res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])
Out[13]:
Date
2016-03-01    False
2016-03-01     True
2016-03-01     True
2016-03-01     True
2016-03-01    False
dtype: bool

In [14]: df.loc[((res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])).values]
Out[14]:
   Report        Date  Time  Interval  Total Volume
1    5785  2016-03-01    25     580.0           NaN
2    5786  2016-03-01    26     716.0           NaN
3    5787  2016-03-01    27     803.0           NaN
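The session above can be reproduced end to end with a small script. This is only a sketch: the five values are hypothetical, chosen so that the quantiles come out as 489.6 and 913.4 like the table shown earlier, and the column names `Date` and `Interval` are the ones used in this answer's code.

```python
import pandas as pd

# Hypothetical sample using the column names from the answer above
# ("Date" for the report date, "Interval" for the values being filtered).
df = pd.DataFrame(
    {
        "Date": ["2016-03-01"] * 5,
        "Interval": [467.0, 580.0, 716.0, 803.0, 941.0],
    }
)

# One row per date, one column per quantile (column labels are the floats 0.05 and 0.95).
res = df.groupby("Date")["Interval"].quantile([0.05, 0.95]).unstack(level=1)

# Look up each row's bounds by its date, then compare positionally as NumPy arrays.
lower = res.loc[df["Date"], 0.05].to_numpy()
upper = res.loc[df["Date"], 0.95].to_numpy()
values = df["Interval"].to_numpy()
mask = (lower < values) & (values < upper)

filtered = df.loc[mask]
print(res)
print(filtered)
```

Converting both sides to NumPy arrays sidesteps index alignment: the looked-up bounds are indexed by date, while the DataFrame rows are not.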

Note: grouping by 'Time Interval' will work the same way, but in your example it doesn't filter any rows!
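With a full year of data there are many rows per time interval, so per-interval bounds become meaningful. Here is a rough sketch of that case, assuming columns named `ReportDate`, `TimeInterval` and `TotalVolume`. The data is invented: seven days for two intervals, each with one unusually low and one unusually high value.

```python
import pandas as pd

# Hypothetical data: 7 report dates for each of two time intervals.
dates = pd.date_range("2016-03-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "ReportDate": list(dates) * 2,
        "TimeInterval": [25] * 7 + [26] * 7,
        "TotalVolume": [
            10.0, 100.0, 101.0, 102.0, 103.0, 104.0, 1000.0,
            5.0, 50.0, 51.0, 52.0, 53.0, 54.0, 500.0,
        ],
    }
)

grouped = df.groupby("TimeInterval")["TotalVolume"]

# A scalar returned from transform() is broadcast to every row of its group,
# so each row gets the bounds of its own time interval.
low = grouped.transform(lambda x: x.quantile(0.05))
high = grouped.transform(lambda x: x.quantile(0.95))

filtered = df[(df["TotalVolume"] > low) & (df["TotalVolume"] < high)]
print(filtered)
```

Because the comparisons are strict, an interval with only a single row would lose that row, since its value equals both bounds. Switching to `>=` and `<=` is one option if that matters.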
