Remove Outliers In Pandas Dataframe With Groupby
I have a dataframe of Report Date, Time Interval and Total Volume for a full year. I would like to be able to remove outliers within each Time Interval. This is as far as I've been
Solution 1:
df[df.groupby("ReportDate").TotalVolume.\
   transform(lambda x: (x < x.quantile(0.95)) & (x > x.quantile(0.05))).eq(1)]
Out[1033]:
     ReportDate  TimeInterval  TotalVolume
5785 2016-03-01            25        580.0
5786 2016-03-01            26        716.0
5787 2016-03-01            27        803.0
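To try the transform-based filter end to end, here is a minimal sketch. The sample values and the helper name within_quantiles are made up for illustration; only the column names come from the question. The .astype(bool) is there because, depending on the pandas version, transform may hand back 1.0/0.0 rather than True/False (which is presumably why the one-liner uses .eq(1)).

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "ReportDate": pd.to_datetime(["2016-03-01"] * 5 + ["2016-03-02"] * 5),
    "TimeInterval": [24, 25, 26, 27, 28] * 2,
    "TotalVolume": [467.0, 580.0, 716.0, 803.0, 941.0,
                    100.0, 200.0, 300.0, 400.0, 5000.0],
})


def within_quantiles(x, lower=0.05, upper=0.95):
    """Return a mask that is True where x lies strictly between its own quantiles."""
    return (x > x.quantile(lower)) & (x < x.quantile(upper))


# transform applies the function per group and returns a result aligned with df.
mask = (
    df.groupby("ReportDate")["TotalVolume"]
      .transform(within_quantiles)
      .astype(bool)  # normalise 1.0/0.0 to True/False on older pandas
)
filtered = df[mask]
print(filtered)
```

Note that a quantile cut like this always trims both tails of every group, whether or not the extreme values are really outliers.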
Solution 2:
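One way is to compute the quantiles per group first and then filter against them. (In the session below the sample appears to have been read in with its header split on whitespace, so the Date column holds the report date and the Interval column holds the total volume.)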
In [11]: res = df.groupby("Date")["Interval"].quantile([0.05, 0.95]).unstack(level=1)
In [12]: res
Out[12]:
             0.05   0.95
Date
2016-03-01  489.6  913.4
Now we can look up these values for each row using loc and filter:
In [13]: (res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])
Out[13]:
Date
2016-03-01    False
2016-03-01     True
2016-03-01     True
2016-03-01     True
2016-03-01    False
dtype: bool
In [14]: df.loc[((res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])).values]
Out[14]:
   Report        Date  Time  Interval  Total  Volume
1    5785  2016-03-01    25     580.0    NaN     NaN
2    5786  2016-03-01    26     716.0    NaN     NaN
3    5787  2016-03-01    27     803.0    NaN     NaN
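The same lookup can be written against the column names from the question. This is only a sketch: the two outer volumes (467 and 941) and the time intervals 24 and 28 are invented so that the quantiles come out as 489.6 and 913.4, and the names bounds, lower, upper and keep are illustrative.

```python
import pandas as pd

# Hypothetical sample; the first and last rows are made-up values.
df = pd.DataFrame({
    "ReportDate": pd.to_datetime(["2016-03-01"] * 5),
    "TimeInterval": [24, 25, 26, 27, 28],
    "TotalVolume": [467.0, 580.0, 716.0, 803.0, 941.0],
})

# One row per date, one column per quantile level (0.05 and 0.95).
bounds = (
    df.groupby("ReportDate")["TotalVolume"]
      .quantile([0.05, 0.95])
      .unstack(level=1)
)

# Look up the bounds for each row's date; to_numpy() drops the date index
# so the comparisons below are positional rather than label-aligned.
lower = bounds.loc[df["ReportDate"], 0.05].to_numpy()
upper = bounds.loc[df["ReportDate"], 0.95].to_numpy()

volume = df["TotalVolume"].to_numpy()
keep = (volume > lower) & (volume < upper)
filtered = df.loc[keep]
print(filtered)
```

Compared with the transform approach, this keeps the per-group bounds around as a small table, which is handy if you want to inspect or reuse them.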
Note: grouping by 'Time Interval' will work the same, but in your example doesn't filter any rows!
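For the case the question actually asks about (outliers within each Time Interval over many days), a rough sketch could look like the following. The data is invented: ten days, two intervals, with one obviously extreme value in each interval. The bounds are computed with transform so they line up with the rows of df directly.

```python
import pandas as pd

# Hypothetical data: 10 days x 2 time intervals, one extreme value per interval.
dates = pd.date_range("2016-03-01", periods=10, freq="D")
df = pd.DataFrame({
    "ReportDate": list(dates) * 2,
    "TimeInterval": [25] * 10 + [26] * 10,
    "TotalVolume": [500.0, 510.0, 520.0, 530.0, 540.0,
                    550.0, 560.0, 570.0, 580.0, 5000.0,
                    10.0, 700.0, 710.0, 720.0, 730.0,
                    740.0, 750.0, 760.0, 770.0, 780.0],
})

grouped = df.groupby("TimeInterval")["TotalVolume"]

# Each lambda returns one scalar per group; transform broadcasts it to every row.
lower = grouped.transform(lambda x: x.quantile(0.05))
upper = grouped.transform(lambda x: x.quantile(0.95))

# Keep rows strictly inside the bounds of their own time interval.
filtered = df[(df["TotalVolume"] > lower) & (df["TotalVolume"] < upper)]
print(filtered)
```

With several rows per interval the filter has something to work with: here the smallest and largest value of each interval are dropped, including the 5000.0 and 10.0 readings.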