Reshape Pandas Dataframe Columns By Block Of N Columns
I have a DataFrame in which blocks of columns need to be reshaped into rows. I tried stack() and melt() but could not find the right way. Here is an example of what I e
Solution 1:
Move the columns whose names do not contain _ into the index with DataFrame.set_index, then split the remaining column names into a MultiIndex with str.split(expand=True) and reshape with DataFrame.stack:
df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(level=0).reset_index()
print (df1)
    id  year level_2   A   B   C
0   a1    20       b   1   5   9
1   a1    20       c  13  17  21
2   a1    20       d  25  29  33
3   a2    20       b   2   6  10
4   a2    20       c  14  18  22
5   a2    20       d  26  30  34
6   a3    19       b   3   7  11
7   a3    19       c  15  19  23
8   a3    19       d  27  31  35
9   a4    18       b   4   8  12
10  a4    18       c  16  20  24
11  a4    18       d  28  32  36
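The question's input frame is not shown in full here, so the following is a sketch that rebuilds a plausible input from the printed result and runs the same three steps end to end. The column names b_A ... d_C and the cell values are assumptions inferred from the output above, not copied from the original post.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result:
# one block of A/B/C columns per prefix b, c, d.
df = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "year": [20, 20, 19, 18],
    "b_A": [1, 2, 3, 4],
    "b_B": [5, 6, 7, 8],
    "b_C": [9, 10, 11, 12],
    "c_A": [13, 14, 15, 16],
    "c_B": [17, 18, 19, 20],
    "c_C": [21, 22, 23, 24],
    "d_A": [25, 26, 27, 28],
    "d_B": [29, 30, 31, 32],
    "d_C": [33, 34, 35, 36],
})

# 1) keep the identifier columns out of the reshape
df1 = df.set_index(["id", "year"])
# 2) 'b_A' -> ('b', 'A'): the columns become a two-level MultiIndex
df1.columns = df1.columns.str.split("_", expand=True)
# 3) move the first level (b/c/d) from the columns into the rows
df1 = df1.stack(level=0).reset_index()
print(df1)
```

The unnamed stacked level is what shows up as level_2 after reset_index.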
If the new column should also be named origin, name the column level with DataFrame.rename_axis before stacking:
df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis(['origin',None], axis=1).stack(0).reset_index()
print (df1)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
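To see why this works, it can help to look at the intermediate objects on a tiny frame. The sketch below uses a hypothetical one-row frame with made-up column names (b_A, b_B, c_A, c_B) purely for illustration.

```python
import pandas as pd

# Splitting an Index of strings with expand=True yields a MultiIndex.
cols = pd.Index(["b_A", "b_B", "c_A", "c_B"])
multi = cols.str.split("_", expand=True)
print(multi.tolist())

# Hypothetical one-row frame using those MultiIndex columns.
wide = pd.DataFrame([[1, 5, 13, 17]],
                    columns=multi,
                    index=pd.Index(["a1"], name="id"))

# Naming the first column level 'origin' (and leaving the second unnamed)
# means the stacked level arrives in the rows already labelled.
long = wide.rename_axis(["origin", None], axis=1).stack(0).reset_index()
print(long)
```

Because the level carries a name, reset_index produces an origin column instead of the generic level_N label.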
Or use wide_to_long. It expects the stub name first, so reverse the parts around _ in each column name, e.g. from b_A to A_b:
import pandas as pd

# reverse the parts of each name: 'b_A' -> 'A_b' ('id' and 'year' are unchanged)
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
df1 = pd.wide_to_long(df,
                      stubnames=['A','B','C'],
                      i=['id','year'],
                      j='origin',
                      sep='_',
                      suffix=r'\w+').reset_index()
print (df1)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
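The snippet above overwrites df.columns in place. If the original frame should stay untouched, a variant along these lines might be preferable; it is only a sketch on a small made-up frame (two rows, two stubs), and the explicit sort_values call is an addition to make the row order deterministic.

```python
import pandas as pd

# Small hypothetical input in the same prefix_STUB layout.
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# rename returns a new frame, so df keeps its original column names
renamed = df.rename(columns=lambda c: "_".join(c.split("_")[::-1]))

long = (
    pd.wide_to_long(renamed,
                    stubnames=["A", "B"],
                    i=["id", "year"],
                    j="origin",
                    sep="_",
                    suffix=r"\w+")   # suffixes are letters, not the default digits
    .reset_index()
    .sort_values(["id", "origin"], ignore_index=True)
)
print(long)
```

The suffix argument matters here: the default pattern only matches numeric suffixes, so letter suffixes such as b, c, d need something like \w+.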
Solution 2:
You could also use the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.pivot_longer(index=["id", "year"],
                names_to=("origin", ".value"),
                names_sep="_")
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a2    20      b   2   6  10
2   a3    19      b   3   7  11
3   a4    18      b   4   8  12
4   a1    20      c  13  17  21
5   a2    20      c  14  18  22
6   a3    19      c  15  19  23
7   a4    18      c  16  20  24
8   a1    20      d  25  29  33
9   a2    20      d  26  30  34
10  a3    19      d  27  31  35
11  a4    18      d  28  32  36
names_sep splits each column name on _. The part paired with .value (A, B, C) stays as a column header, while the other part (b, c, d) is collected in the origin column.
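For readers without pyjanitor installed, roughly the same split can be spelled out in plain pandas with melt, str.split and pivot. This is a swapped-in technique, not how pivot_longer is implemented; the small frame and the helper column names name and value_name are made up for illustration.

```python
import pandas as pd

# Small hypothetical input in the same prefix_STUB layout.
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# Long form: one row per (id, year, original column name)
melted = df.melt(id_vars=["id", "year"], var_name="name")

# Split 'b_A' into the part that becomes a row label (origin)
# and the part that stays a column header (the '.value' role)
melted[["origin", "value_name"]] = melted["name"].str.split("_", expand=True)

# Spread the header part back out into columns
long = (
    melted.pivot(index=["id", "year", "origin"],
                 columns="value_name",
                 values="value")
    .rename_axis(None, axis=1)
    .reset_index()
)
print(long)
```

pivot sorts the index, so the rows come out grouped by id rather than by origin block.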
If you want the rows in order of appearance, use the sort_by_appearance parameter:
df.pivot_longer(
index=["id", "year"],
names_to=("origin", ".value"),
names_sep="_",
sort_by_appearance=True,
)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36