
Reshape Pandas Dataframe Columns By Block Of N Columns

I have a DataFrame in which blocks of columns need to be reshaped into rows. I tried stack() and melt() but could not find the right approach. Here is an example of what I e

Solution 1:

Move the columns whose names do not contain _ into the index with DataFrame.set_index, split the remaining column names into a MultiIndex with str.split(expand=True), and then reshape with DataFrame.stack:

df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(level=0).reset_index()
print (df1)
    id  year level_2   A   B   C
0   a1    20       b   1   5   9
1   a1    20       c  13  17  21
2   a1    20       d  25  29  33
3   a2    20       b   2   6  10
4   a2    20       c  14  18  22
5   a2    20       d  26  30  34
6   a3    19       b   3   7  11
7   a3    19       c  15  19  23
8   a3    19       d  27  31  35
9   a4    18       b   4   8  12
10  a4    18       c  16  20  24
11  a4    18       d  28  32  36
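
For anyone who wants to run this end to end, here is a self-contained sketch. The input frame is an assumption: the question's sample data is cut off on this page, so df is rebuilt from the printed result, with column names of the form b_A, b_B, and so on.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output above
# (the original sample data in the question is truncated).
df = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "year": [20, 20, 19, 18],
    "b_A": [1, 2, 3, 4],
    "b_B": [5, 6, 7, 8],
    "b_C": [9, 10, 11, 12],
    "c_A": [13, 14, 15, 16],
    "c_B": [17, 18, 19, 20],
    "c_C": [21, 22, 23, 24],
    "d_A": [25, 26, 27, 28],
    "d_B": [29, 30, 31, 32],
    "d_C": [33, 34, 35, 36],
})

# Keep the identifier columns out of the reshape by moving them to the index.
df1 = df.set_index(["id", "year"])

# 'b_A' -> ('b', 'A'): level 0 holds the block name, level 1 the field name.
df1.columns = df1.columns.str.split("_", expand=True)

# Move level 0 (b, c, d) from the columns into the rows.
df1 = df1.stack(level=0).reset_index()
print(df1)
```

Recent pandas versions may emit a FutureWarning about the stack implementation here; the reshaped values should be the same either way.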

If you also want the new column to be named origin, use DataFrame.rename_axis:

df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis(['origin',None], axis=1).stack(0).reset_index()
print (df1)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36

Or use wide_to_long, after first swapping the two parts around the _ so that a name like b_A becomes A_b:

df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
df1 = pd.wide_to_long(df, 
                      stubnames=['A','B','C'],
                      i=['id','year'], 
                      j='origin', 
                      sep='_',
                      suffix=r'\w+').reset_index()
print (df1)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
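
Note that assigning to df.columns renames the original frame in place, so anything run afterwards sees names like A_b instead of b_A. A possible variant, sketched below on a small made-up frame (the values and the names swapped and long_df are illustrative only), does the swap on a renamed copy instead:

```python
import pandas as pd

# Small illustrative frame (hypothetical values, same naming pattern).
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# Swap 'b_A' -> 'A_b' on a copy; names without '_' (id, year) are unchanged.
swapped = df.rename(columns=lambda c: "_".join(c.split("_")[::-1]))

long_df = pd.wide_to_long(
    swapped,
    stubnames=["A", "B"],   # the parts that stay as column headers
    i=["id", "year"],       # identifier columns
    j="origin",             # name for the column holding b, c, ...
    sep="_",
    suffix=r"\w+",          # suffixes are letters, not the default digits
).reset_index()
print(long_df)
```

Here df keeps its original column names, which matters if the same frame is reused with another approach later on.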

Solution 2:

You could also use the pivot_longer function from pyjanitor; at the time of writing, you have to install the latest development version from GitHub:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.pivot_longer(index=["id", "year"], 
                names_to=("origin", ".value"), 
                names_sep="_")

    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a2    20      b   2   6  10
2   a3    19      b   3   7  11
3   a4    18      b   4   8  12
4   a1    20      c  13  17  21
5   a2    20      c  14  18  22
6   a3    19      c  15  19  23
7   a4    18      c  16  20  24
8   a1    20      d  25  29  33
9   a2    20      d  26  30  34
10  a3    19      d  27  31  35
11  a4    18      d  28  32  36

The names_sep value splits the columns; the split values that pair with .value remain as column headers, while the other values are lumped underneath the origin column.
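
To make that description concrete, here is a rough plain-pandas sketch of the same idea. It is not pyjanitor's implementation, only an illustration of the behaviour described; the frame, its values, and the helper column name field are made up for the example.

```python
import pandas as pd

# Small illustrative frame (hypothetical values, same naming pattern).
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# Step 1: melt every non-identifier column into (variable, value) pairs.
melted = df.melt(id_vars=["id", "year"])

# Step 2: split each column name on '_' (the job names_sep does).
melted[["origin", "field"]] = melted["variable"].str.split("_", expand=True)

# Step 3: the second part goes back to column headers (the '.value' role),
# while the first part stays in the rows under 'origin'.
result = (
    melted.pivot(index=["id", "year", "origin"], columns="field", values="value")
    .rename_axis(None, axis=1)
    .reset_index()
)
print(result)
```

The pivot step sorts by id, year and origin, so the row order may differ from what pivot_longer returns by default.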

If you want the data in order of appearance, you can use the sort_by_appearance parameter:

df.pivot_longer(
    index=["id", "year"],
    names_to=("origin", ".value"),
    names_sep="_",
    sort_by_appearance=True,
)


    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
