
Merging Multiple Pandas Datasets With Non-unique Index

I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way:

ex_dict[df1]
date df1price1 df1price2
10-20-2015

Solution 1:

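You can use pd.concat to stack the dataframes on top of each other, followed by a groupby('date') to collapse the rows for each date into a single row. Taking max() works here because each price column holds at most one non-NaN value per date.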

In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130
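For reference, here is a minimal self-contained sketch of this approach. The sample values are an assumption: they are reconstructed from the output shown above, since the original dataframes are not listed in full, and the names stacked and flat are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption,
# the original frames are not shown in full).
df1 = pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                    "df1price1": [100, 90, 100],
                    "df1price2": [150, 100, 140]})
df2 = pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                    "df2price1": [110, 90, 110],
                    "df2price2": [140, 110, 120]})
df3 = pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                    "df3price1": [100, 90, 80],
                    "df3price2": [150, 100, 130]})

# Stack the frames vertically; columns a frame lacks are filled with NaN.
stacked = pd.concat([df1, df2, df3])

# Collapse all rows that share a date into one row.
# max() ignores NaN, so the single real value per column survives.
flat = stacked.groupby("date").max()
print(flat)
```

If a frame could contain several rows for the same date, max() would silently keep only the largest price, so groupby('date').first() (first non-null value per column) may be the more explicit choice, depending on what you want to keep.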

Edit: As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:

df1.index = df1.date
df2.index = df2.date
df3.index = df3.date

In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
                  date  df1price1  df1price2        date  df2price1  \
10-20-2015  10-20-2015        100        150  10-20-2015        110
10-21-2015  10-21-2015         90        100  10-21-2015         90
10-22-2015  10-22-2015        100        140         NaN        NaN
10-23-2015         NaN        NaN        NaN  10-23-2015        110

            df2price2        date  df3price1  df3price2
10-20-2015        140  10-20-2015        100        150
10-21-2015        110         NaN        NaN        NaN
10-22-2015        NaN  10-22-2015         90        100
10-23-2015        120  10-23-2015         80        130
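Note that the date column is repeated in the output above because df1.index = df1.date copies the column into the index but leaves the column in place. A possible variant, sketched below, uses set_index('date') instead, which moves the column into the index. The sample values are again reconstructed from the outputs in this answer (an assumption), and the sketch assumes each frame has at most one row per date.

```python
import pandas as pd

# Sample data reconstructed from the outputs in this answer (an assumption).
df1 = pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                    "df1price1": [100, 90, 100],
                    "df1price2": [150, 100, 140]})
df2 = pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                    "df2price1": [110, 90, 110],
                    "df2price2": [140, 110, 120]})
df3 = pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                    "df3price1": [100, 90, 80],
                    "df3price2": [150, 100, 130]})

# set_index moves 'date' out of the columns, so it is not duplicated
# once the frames are placed side by side.
frames = [df.set_index("date") for df in (df1, df2, df3)]

# axis=1 aligns the frames on their index; dates missing from a frame get NaN.
# sort_index() makes the row order explicit instead of relying on the default.
wide = pd.concat(frames, axis=1).sort_index()
print(wide)
```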

Solution 2:

You could chain several merge calls on the date column:

df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')

In [107]: df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')
Out[107]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130

Some explanation: first, df1 and df2 are merged on the date column with an outer join. The resulting dataframe is then merged with df3 using the same arguments. Finally, date is set as the index of the result. If your dataframes already have date as their index, you could first call reset_index on each of them and then merge on the name of the column that now contains the date.
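Since the question keeps the dataframes in a dictionary, the chained merge can also be written as a fold over the dictionary values. This is only a sketch: the contents of ex_dict are reconstructed from the output above (an assumption), and outer_merge, merged, indexed and merged_again are hypothetical names used for illustration.

```python
from functools import reduce

import pandas as pd

# Dictionary of frames; values reconstructed from the output above (an assumption).
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}


def outer_merge(left, right):
    """Outer-join two frames on their shared 'date' column."""
    return left.merge(right, on="date", how="outer")


# Fold the merge over all frames, then use 'date' as the index.
merged = reduce(outer_merge, ex_dict.values()).set_index("date")
print(merged)

# If the frames carry 'date' as their index instead, reset_index() turns
# it back into a regular column before merging.
indexed = [df.set_index("date") for df in ex_dict.values()]
merged_again = reduce(outer_merge, [df.reset_index() for df in indexed]).set_index("date")
```

This scales to any number of dataframes without spelling out each merge call by hand.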

