Create A Stacked Bar Chart Of The N Largest Columns Per Row In A Dataframe
I have a dataframe of N columns of values by M dates. I'm looking to plot a stacked bar chart of the 3 largest values per date. Test dataframe: import pandas import numpy data = {
Solution 1:
Starting with your input df:
top3_by_date = (
    # bring the date back as a column to use as a grouping variable
    df.reset_index()
    # make a long DF of date / column name / value
    .melt(id_vars='index')
    # order the DF by highest values first
    .sort_values('value', ascending=False)
    # group by the date and take the first 3 rows of each group
    .groupby('index')
    .head(3)
    # pivot back so we've got an X & Y to chart...
    # (keyword arguments, since pandas 2.0 no longer accepts these positionally)
    .pivot(index='index', columns='variable')
    # drop the 'value' level as we don't need that
    .droplevel(level=0, axis=1)
)
This gives:
variable       A     B     C     D
index
2018-01-01  65.0  54.0  34.0   NaN
2018-01-02  54.0  47.0  39.0   NaN
2018-01-03   NaN  60.0  57.0  47.0
2018-01-04   NaN  34.0  56.0  47.0
2018-01-05   NaN  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0
Then you can do top3_by_date.plot.bar(stacked=True), which should give you a stacked bar chart with up to three segments per date.
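As a cross-check on the table above, here is a minimal sketch of a different route to the same result: rank each row and mask everything outside the top n. This is not the method used in this answer. The helper name keep_n_largest is made up for illustration, and the data is the test frame from the question as it appears in Solution 3.

```python
import numpy as np
import pandas as pd


def keep_n_largest(df, n=3):
    """Return a copy of df where only the n largest values per row survive."""
    # rank 1 = largest value in the row; NaN cells get a NaN rank
    ranks = df.rank(axis=1, ascending=False, method="first")
    # cells ranked worse than n (or with a NaN rank) become NaN
    return df.where(ranks <= n)


data = {
    "A": [65, 54, 12, 14, 30, np.nan],
    "B": [54, 47, 60, 34, 40, 35],
    "C": [34, 39, 57, 56, 48, np.nan],
    "D": [20, 18, 47, 47, 35, 70],
}
df = pd.DataFrame(data,
                  index=pd.date_range("2018-01-01", "2018-01-06").date,
                  dtype=np.float64)

top3 = keep_n_largest(df, n=3)
# top3.plot.bar(stacked=True)  # uncomment to draw the chart (needs matplotlib)
```

Changing n gives the general "N largest columns per row" case from the title. Ties are broken by column order because of method="first", which is an assumption of this sketch.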
Solution 2:
It is possible, but somewhat convoluted, since you need to use bottom to offset each bar above those at the same date with lower values. This prevents bars with higher values hiding bars with lower values.
For each column (representing one series in the bar chart), 3 arrays are required:
dates: the dates which have values for this column (i.e. the dates for which this column is one of the 3 largest values)
values: the difference between this value and the next lower value
bottoms: the value of the next lower value (0 for the lowest of the 3)
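To make the values/bottoms arithmetic concrete, here is a small sketch for a single date. The helper bar_segments is hypothetical and only mirrors the description above. The row is the first date of the test data (A=65, B=54, C=34, D=20).

```python
import pandas as pd


def bar_segments(row, n=3):
    """Return (column, height, bottom) for the n largest values in one row."""
    top = row.nlargest(n)          # descending order, NaN skipped
    vals = top.to_numpy()
    segments = []
    for i, (col, val) in enumerate(top.items()):
        # the next lower value is where this segment starts; the lowest starts at 0
        next_val = vals[i + 1] if i + 1 < len(vals) else 0.0
        segments.append((col, float(val - next_val), float(next_val)))
    return segments


row = pd.Series({"A": 65.0, "B": 54.0, "C": 34.0, "D": 20.0})
segments = bar_segments(row)
# A is drawn from 54 up to 65, B from 34 up to 54, C from 0 up to 34
```

Each segment's bottom plus its height equals the column's actual value, so the top edge of every bar sits at the real value rather than at a running total.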
Building up the arrays:
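import collections

col_dates = collections.defaultdict(list)
col_values = collections.defaultdict(list)
col_bottoms = collections.defaultdict(list)

for date, row in df.iterrows():
    # the 3 largest values in this row, in descending order (NaN is skipped)
    top = row.nlargest(3)
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, (col, val) in enumerate(top.items()):
        # the next lower value is where this bar starts; the lowest bar starts at 0
        next_val = top.values[i + 1] if i + 1 < len(top.values) else 0
        col_dates[col].append(date)
        col_values[col].append(val - next_val)
        col_bottoms[col].append(next_val)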
Plotting the bar chart:
import matplotlib.dates
from matplotlib import pyplot

fig = pyplot.figure(figsize=(20, 10))
ax = fig.add_subplot(1, 1, 1)
for col, vals in col_values.items():
    dates = col_dates[col]
    bottoms = col_bottoms[col]
    # draw each segment from the next lower value up to this column's value
    ax.bar(matplotlib.dates.date2num(dates), vals, width=.6, bottom=bottoms, label=col)
ax.xaxis_date()
ax.legend(loc='best', fontsize='large')
pyplot.show()
Solution 3:
You can do this with a simple apply. It will not be vectorized, but I think it's much clearer to read. In this case I filled NaN with -np.inf because sort doesn't work well with NaN values.
import pandas as pd
import numpy as np

data = {
    'A': [65, 54, 12, 14, 30, np.nan],
    'B': [54, 47, 60, 34, 40, 35],
    'C': [34, 39, 57, 56, 48, np.nan],
    'D': [20, 18, 47, 47, 35, 70]
}
df = pd.DataFrame(index=pd.date_range('2018-01-01', '2018-01-06').date,
                  data=data,
                  dtype=np.float64)
# replace NaN with -inf so the values sort cleanly
df.fillna(-np.inf, inplace=True)

def search_rows(row):
    # keep the 3 largest values in the row and set everything else to -inf;
    # Series.where returns a Series, so apply() hands back a DataFrame
    return row.where(row.isin(sorted(row, reverse=True)[:3]), -np.inf)

df = df.apply(search_rows, axis=1)
df.plot.bar(stacked=True)
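As a short illustration of why the NaN values are filled before sorting, here is a sketch using plain Python floats. The values are made up for the example. Every ordering comparison against NaN is False, so a sort has no consistent place to put it, while -inf always sorts below any real number.

```python
import math

nan = float("nan")

# NaN is neither smaller nor larger than (nor equal to) any number
nan_is_smaller = nan < 1.0
nan_is_larger = nan > 1.0
nan_equals_itself = nan == nan

# made-up row with one missing value
values = [65.0, nan, 54.0, 20.0]

# swap NaN for -inf, mirroring df.fillna(-np.inf)
filled = [-math.inf if math.isnan(v) else v for v in values]

# sorting is now well defined; the former NaN ends up last
ordered = sorted(filled, reverse=True)
top2 = ordered[:2]
```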