Fill in missing years with a multi-column groupby in Pandas and display multiple columns horizontally in order

Python

喵喔喔 2022-06-28 10:44:57

For the data frame below, I want to fill in the missing years (2015 to 2017) within each city/district group, then compute pct by grouping on city, district and year, and finally display the value and pct columns side by side.

  city district  value  year
0   sh        a      2  2015
1   sh        a      3  2016
2   sh        b      5  2015
3   sh        b      3  2016
4   bj        c      4  2015
5   bj        c      3  2017

What I have done so far:

1. Fill in the missing years, which does not work yet:

rng = pd.date_range('2015', '2017', freq='YS').dt.year
df = df.apply(lambda x: x.reindex(rng, fill_value = 0))

2. Compute pct grouped by city and district:

df['pct'] = df.sort_values('year').groupby(['city', 'district']).value.pct_change()

3. Display the value and pct columns horizontally, but the column order is not what I want:

df.pivot_table(columns='year', index=['city','district'], values=['value', 'pct'], fill_value='NaN').reset_index()

The output I get so far:

     city district   pct              value
year                2015  2016  2017   2015 2016 2017
0      bj        c   NaN   NaN -0.25    4.0  NaN    3
1      sh        a   NaN   0.5   NaN    2.0    3  NaN
2      sh        b   NaN  -0.4   NaN    5.0    3  NaN

How can I get the expected result, which looks like this?

city district  2015        2016        2017
               value  pct  value  pct  value  pct
bj   c         4                       3
sh   a         2           3      0.5
sh   b         5           3      -0.4

1 Answer

青春有我

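Use DataFrame.swaplevel together with DataFrame.sort_index to put the years on the top column level. A reindex-based solution for filling in the missing years is also included: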

import pandas as pd

# Years to cover, as plain integers: 2015, 2016, 2017
rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
# Every (city, district, year) combination
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])
# Combinations missing from the data become NaN rows
df = df.set_index(['city','district','year']).reindex(mux)
df['pct'] = df.sort_values('year').groupby(['city', 'district']).value.pct_change()
df = df.pivot_table(columns='year',
                    index=['city','district'],
                    values=['value', 'pct'],
                    fill_value='NaN')
# Move year to the top column level, then sort the columns by year
df = df.swaplevel(0,1, axis=1).sort_index(axis=1, level=0)
print (df)

year           2015        2016         2017
                pct value   pct value    pct value
city district
bj   c          NaN   4.0   0.0   NaN  -0.25     3
sh   a          NaN   2.0   0.5     3   0.00   NaN
     b          NaN   5.0  -0.4     3   0.00   NaN
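
In the output above, pct comes before value within each year because sort_index orders the sub-columns alphabetically, while the question asks for value first. The following is a minimal sketch of one way to get that order, not the answerer's method. It assumes the sample data from the question, and the names wide_value, wide_pct and order are made up for illustration. It also swaps pct_change for a division by the previous year's column, on the assumption that the change should stay NaN whenever the previous year is missing (the 0.0 entries above are consistent with pct_change forward-filling the gaps).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "city": ["sh", "sh", "sh", "sh", "bj", "bj"],
    "district": ["a", "a", "b", "b", "c", "c"],
    "value": [2, 3, 5, 3, 4, 3],
    "year": [2015, 2016, 2015, 2016, 2015, 2017],
})

years = [2015, 2016, 2017]

# One row per (city, district), one column per year; absent years become NaN.
wide_value = (
    df.set_index(["city", "district", "year"])["value"]
      .unstack("year")
      .reindex(columns=years)
)

# Strict year-over-year change: current year divided by the previous year's
# column, minus 1. Assumption: a gap in the previous year yields NaN.
wide_pct = wide_value / wide_value.shift(axis=1) - 1

# Stack the two tables side by side, then put year on the top column level.
wide = pd.concat({"value": wide_value, "pct": wide_pct}, axis=1)
wide = wide.swaplevel(0, 1, axis=1)

# Explicit column order: for each year, value first and pct second.
order = pd.MultiIndex.from_product([years, ["value", "pct"]])
wide = wide.reindex(columns=order)

print(wide)
```

Reindexing against an explicit MultiIndex spells out the column order instead of relying on alphabetical sorting, so the same step also works if more measures are added later.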

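Edit: the error

ValueError: cannot handle a non-unique multi-index!

means that the combination of columns used as the index, here ['city','district','year'], contains duplicates. The solution is to make the values unique first, for example by aggregating with the mean: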

print (df)
#   city district  value  year
# 0   sh        a      2  2015
# 0   sh        a     20  2015
# 1   sh        a      3  2016
# 2   sh        b      5  2015
# 3   sh        b      3  2016
# 4   bj        c      4  2015
# 5   bj        c      3  2017

rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])

print (df.groupby(['city','district','year'])['value'].mean())
#city  district  year
#bj    c         2015     4
#                2017     3
#sh    a         2015    11
#                2016     3
#      b         2015     5
#                2016     3
#Name: value, dtype: int64

df = df.groupby(['city','district','year'])['value'].mean().reindex(mux)
print (df)
#city  district  year
#sh    a         2015    11.0
#                2016     3.0
#                2017     NaN
#      b         2015     5.0
#                2016     3.0
#                2017     NaN
#      c         2015     NaN
#                2016     NaN
#                2017     NaN
#bj    a         2015     NaN
#                2016     NaN
#                2017     NaN
#      b         2015     NaN
#                2016     NaN
#                2017     NaN
#      c         2015     4.0
#                2016     NaN
#                2017     3.0
#Name: value, dtype: float64
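
The reindexed result above also contains combinations such as (sh, c) and (bj, a) that never occur in the data, because from_product crosses every city with every district. The sketch below shows one possible way to spot the duplicated keys up front and to reindex only the city/district pairs that actually exist. It assumes the sample data with the duplicated row shown earlier; the names keys, dupes, pairs and long_df are illustrative, and the percentage change is again computed with a group-wise shift instead of pct_change so that gaps stay NaN.

```python
import pandas as pd

# Sample data with a duplicated (sh, a, 2015) key, as in the answer
df = pd.DataFrame({
    "city": ["sh", "sh", "sh", "sh", "sh", "bj", "bj"],
    "district": ["a", "a", "a", "b", "b", "c", "c"],
    "value": [2, 20, 3, 5, 3, 4, 3],
    "year": [2015, 2015, 2016, 2015, 2016, 2015, 2017],
})
keys = ["city", "district", "year"]
years = [2015, 2016, 2017]

# Rows whose key combination occurs more than once; these make the index
# non-unique when set_index(keys) is followed by reindex.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# Collapse duplicates to one value per key (mean, as in the answer).
s = df.groupby(keys)["value"].mean()

# Build the target index from the pairs present in the data, not from the
# full city x district product.
pairs = df[["city", "district"]].drop_duplicates()
mux = pd.MultiIndex.from_tuples(
    [(c, d, y) for c, d in pairs.itertuples(index=False) for y in years],
    names=keys,
)

long_df = s.reindex(mux).to_frame("value")

# Year-over-year change within each (city, district) group.
prev = long_df.groupby(level=["city", "district"])["value"].shift()
long_df["pct"] = long_df["value"] / prev - 1

print(long_df)
```

With three existing pairs and three years, long_df has 9 rows instead of the 18 produced by the full product.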

Answered 2022-06-28
