pandas >= 0.25.0 (DataFrame.explode was added in 0.25.0)
# If necessary, convert the date columns to datetime first:
# df['start_date'] = pd.to_datetime(df['start_date'])
# df['end_date'] = pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID', 'type', 'codes'], value_name='date')  # start/end dates become rows
            .set_index('date')
            .groupby(['ID', 'type'])
            .resample('D').ffill()  # one row per day between start and end
            .drop(columns='variable')
            .explode('codes')  # one row per code in each list
            .reset_index(level=[0, 1], drop=True)
            .sort_values(['ID', 'type', 'codes'])
            .reset_index()
            .reindex(columns=['ID', 'date', 'codes', 'type'])
         )
print(new_df)
pandas < 0.25.0 (no DataFrame.explode, so repeat the rows and flatten the lists manually)
import numpy as np

# If necessary, convert the date columns to datetime first:
# df['start_date'] = pd.to_datetime(df['start_date'])
# df['end_date'] = pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID', 'type', 'codes'], value_name='date')
            .set_index('date')
            .groupby(['ID', 'type'])
            .resample('D').ffill()
            .drop(columns='variable'))
# Repeat each row once per code, then overwrite 'codes' with the flattened lists.
new_df = (new_df.reindex(new_df.index.repeat(new_df.codes.str.len()))
                .assign(codes=np.concatenate(new_df.codes.values))
                .reset_index(level=[0, 1], drop=True)
                .sort_values(['ID', 'type', 'codes'])
                .reset_index()
                .reindex(columns=['ID', 'date', 'codes', 'type']))
print(new_df)
Output:
    ID       date codes type
0    1 2019-01-01     x    A
1    1 2019-01-02     x    A
2    1 2019-01-03     x    A
3    1 2019-01-04     x    A
4    1 2019-01-05     x    A
5    1 2019-01-01     y    A
6    1 2019-01-02     y    A
7    1 2019-01-03     y    A
8    1 2019-01-04     y    A
9    1 2019-01-05     y    A
10   2 2019-01-01     x    B
11   2 2019-01-02     x    B
12   2 2019-01-03     x    B
13   2 2019-01-04     x    B
14   2 2019-01-05     x    B
15   2 2019-01-01     y    B
16   2 2019-01-02     y    B
17   2 2019-01-03     y    B
18   2 2019-01-04     y    B
19   2 2019-01-05     y    B
20   2 2019-01-01     z    B
21   2 2019-01-02     z    B
22   2 2019-01-03     z    B
23   2 2019-01-04     z    B
24   2 2019-01-05     z    B
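
For anyone who wants to reproduce this, here is a minimal sketch. The input frame is an assumption: the question's data is not shown here, so it is reconstructed from the output above (two rows, `codes` holding lists, dates from 2019-01-01 to 2019-01-05). The sketch also uses a different technique from the answer's melt/resample chain: it builds the daily dates with `pd.date_range` and calls `explode` twice, which avoids `groupby().resample()`, whose handling of the grouping columns may differ between pandas versions. The name `alt_df` is made up for this example, and it needs pandas >= 0.25.0.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "ID": [1, 2],
    "type": ["A", "B"],
    "codes": [["x", "y"], ["x", "y", "z"]],
    "start_date": pd.to_datetime(["2019-01-01", "2019-01-01"]),
    "end_date": pd.to_datetime(["2019-01-05", "2019-01-05"]),
})

# One list of daily timestamps per row, from start_date to end_date inclusive.
daily = [list(pd.date_range(start, end, freq="D"))
         for start, end in zip(df["start_date"], df["end_date"])]

alt_df = (
    df.assign(date=daily)
      .drop(columns=["start_date", "end_date"])
      .explode("codes")            # one row per code
      .reset_index(drop=True)      # keep the index unique before the second explode
      .explode("date")             # one row per (code, day)
      .assign(date=lambda d: pd.to_datetime(d["date"]))  # explode leaves object dtype
      .sort_values(["ID", "type", "codes", "date"])
      .reset_index(drop=True)
      .reindex(columns=["ID", "date", "codes", "type"])
)
print(alt_df)
```

With this assumed input the result has the same 25 rows and column order as the output above.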