
Create A Stacked Bar Chart Of The N Largest Columns Per Row In A Dataframe

I have a dataframe of N columns of values by M dates. I'm looking to plot a stacked bar chart of the 3 largest values per date. Test dataframe: import pandas import numpy data = {

Solution 1:

Starting with your input df:

top3_by_date = (
    # bring the date back as a column to use as a grouping var
    df.reset_index()
    # make a long DF of date / column name / value
    .melt(id_vars='index')
    # order DF by highest values first
    .sort_values('value', ascending=False)
    # group by the date and take the first 3 rows of each
    .groupby('index')
    .head(3)
    # pivot back so we've got an X & Y to chart
    # (keyword arguments are required by pivot in pandas 2.0 and later)
    .pivot(index='index', columns='variable')
    # drop the value level as we don't need that
    .droplevel(level=0, axis=1)
)

This gives:

variable       A     B     C     D
index                             
2018-01-01  65.0  54.0  34.0   NaN
2018-01-02  54.0  47.0  39.0   NaN
2018-01-03   NaN  60.0  57.0  47.0
2018-01-04   NaN  34.0  56.0  47.0
2018-01-05   NaN  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0

Then you can do top3_by_date.plot.bar(stacked=True) to draw the stacked bar chart.
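For comparison, the same table can be produced without the melt/pivot round trip. The following is only a sketch, not part of the original answer: top_n_per_row is a hypothetical helper name, and the test data is assumed to be the frame listed in Solution 3. It ranks the values inside each row and blanks out everything ranked below n.

```python
import numpy as np
import pandas as pd

# Test data copied from the Solution 3 listing (assumed to match the question).
data = {
    'A': [65, 54, 12, 14, 30, np.nan],
    'B': [54, 47, 60, 34, 40, 35],
    'C': [34, 39, 57, 56, 48, np.nan],
    'D': [20, 18, 47, 47, 35, 70],
}
df = pd.DataFrame(data,
                  index=pd.date_range('2018-01-01', '2018-01-06').date,
                  dtype=np.float64)


def top_n_per_row(frame, n):
    """Hypothetical helper: keep the n largest values per row, NaN elsewhere."""
    # rank 1 is the largest value in the row; NaN cells get a NaN rank
    ranks = frame.rank(axis=1, ascending=False, method='first')
    # cells ranked below n, and NaN cells, fail the comparison and become NaN
    return frame.where(ranks <= n)


top3 = top_n_per_row(df, 3)
print(top3)
# top3.plot.bar(stacked=True) would then draw the chart (requires matplotlib)
```

Because n is a parameter, the same helper should cover the "N largest" case from the title, for example top_n_per_row(df, 2).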

Solution 2:

It is possible, but somewhat convoluted, since you need to use the bottom argument of ax.bar to start each bar above the bars with lower values at the same date. This prevents bars with higher values from hiding bars with lower values.

For each column (representing one series in the bar chart), 3 arrays are required:

  • dates: the dates which have values for this column (i.e. the dates for which this column is one of the 3 largest values)
  • values: the difference between this value and the next lower value
  • bottoms: the next lower value (0 for the smallest of the 3)
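As a worked example of these arrays, here is a sketch for a single date. The row below is assumed to be the 2018-01-01 row of the test data, and the variable names are illustrative only.

```python
import pandas as pd

# One row of the test data (2018-01-01), used purely as an illustration.
row = pd.Series({'A': 65.0, 'B': 54.0, 'C': 34.0, 'D': 20.0})

# the 3 largest values, largest first: A=65, B=54, C=34
top = row.nlargest(3)
# the next lower value for each entry; the smallest of the three sits on 0
bottoms = top.shift(-1, fill_value=0.0)
# the visible slice of each bar
heights = top - bottoms

for col in top.index:
    print(col, 'bottom =', bottoms[col], 'height =', heights[col])
```

Each slice ends exactly at the original value (bottom + height == value), so the bars are layered by value rather than summed: the top of the tallest bar is the largest value for that date.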

Building up the arrays:

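import collections

# one list per column, keyed by column name
col_dates   = collections.defaultdict(list)
col_values  = collections.defaultdict(list)
col_bottoms = collections.defaultdict(list)

for date,row in df.iterrows():
    # drop NaN first so a missing value is never picked as one of the 3 largest
    top = row.dropna().nlargest(3)
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for i,(col,val) in enumerate(top.items()):
        # the next lower value, or 0 for the smallest of the 3
        next_val = top.values[i+1] if i+1 < len(top.values) else 0

        col_dates  [col].append(date)
        col_values [col].append(val - next_val)
        col_bottoms[col].append(next_val)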

Plotting the bar chart:

import matplotlib.dates
from matplotlib import pyplot

fig = pyplot.figure(figsize=(20,10))
ax = fig.add_subplot(1,1,1)

for col,vals in col_values.items():
    dates   = col_dates[col]
    bottoms = col_bottoms[col]

    # each bar starts at the next lower value for that date
    ax.bar(matplotlib.dates.date2num(dates), vals, width=.6, bottom=bottoms, label=col)

# treat the x values as dates
ax.xaxis_date()
ax.legend(loc='best', fontsize='large')

pyplot.show()


Solution 3:

You can do this with a simple apply. It will not be vectorized but I think it's much clearer to read. In this case I filled NaN with -np.inf because sort doesn't work well with NaN values.

import pandas as pd
import numpy as np

data = {
    'A': [ 65, 54, 12, 14, 30, np.nan ],
    'B': [ 54, 47, 60, 34, 40, 35 ],
    'C': [ 34, 39, 57, 56, 48, np.nan ],
    'D': [ 20, 18, 47, 47, 35, 70 ]
}

df = pd.DataFrame(index=pd.date_range('2018-01-01', '2018-01-06').date,
                  data=data,
                  dtype=np.float64)

df.fillna(-np.inf, inplace=True)

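def search_rows(row):
    # keep the 3 largest values in the row; mask everything else with -inf
    return np.where(row.isin(sorted(row, reverse=True)[:3]), row, -np.inf)

# result_type='broadcast' keeps the original columns; without it, recent
# pandas versions return a Series of arrays instead of a DataFrame
df = df.apply(search_rows, axis=1, result_type='broadcast')
# turn the -inf placeholders back into NaN so they are not drawn as bars
df.replace(-np.inf, np.nan).plot.bar(stacked=True)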
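The remark about sorting and NaN can be checked on a single row. This is a small sketch under the assumption that the row looks like the 2018-01-06 row of the test data; it also shows dropna() as one possible alternative to the -np.inf fill.

```python
import numpy as np
import pandas as pd

# Assumed to match the 2018-01-06 row of the test data.
row = pd.Series({'A': np.nan, 'B': 35.0, 'C': np.nan, 'D': 70.0})

# every ordering comparison against NaN is False, so Python's sorted()
# has no consistent place to put a NaN
print(np.nan < 35.0, np.nan > 35.0)

# after the fill, -inf sorts below every real number
filled = row.fillna(-np.inf)
top_filled = sorted(filled, reverse=True)[:3]
print(top_filled)

# alternative: drop the missing values before picking the largest ones
top_dropped = row.dropna().nlargest(3)
print(top_dropped)
```

With the fill, the third "largest" entry is the -inf placeholder itself, which is why the placeholder has to be handled again before plotting. With dropna(), only the two real values remain.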
