2 Answers

Contributor with 1853 experience points, more than 6 upvotes
不是悸子
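import pandas as pd

result = []
for id_value in df['ID'].unique():
    # Rows for this ID, sorted by date, with a fresh 0..n-1 index
    adf = df[df['ID'] == id_value].sort_values(by="Date").reset_index(drop=True)
    # Position of the last row whose Start_flag is 1
    start = adf.where(adf['Start_flag'] == 1).last_valid_index()
    # Keep that row and everything after it
    result.append(adf.iloc[range(start, len(adf))])
print(pd.concat(result).reset_index(drop=True))
Output: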
          Date   ID  Start_flag  end
0   2019-01-09  100           1    0
1   2019-01-10  100           0    0
2   2019-01-11  100           0    0
3   2019-01-12  100           0    0
4   2019-01-03  500           1    0
5   2019-01-04  500           0    0
6   2019-01-05  500           0    0
7   2019-01-06  500           0    0
8   2019-01-07  500           0    0
9   2019-01-08  500           0    0
10  2019-01-09  700           1    0
11  2019-01-10  700           0    0
12  2019-01-11  700           0    1
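Note: we can avoid the loop by moving the logic into a function and calling it through groupby and apply. However, apply can run the function twice on the first group (at least in older pandas versions), so we must make sure the function has no side effects.
Using groupby: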
def fun(adf):
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    i = adf.where(adf['Start_flag'] == 1).last_valid_index()
    return adf.iloc[range(i, len(adf))]

print(df.groupby('ID').apply(fun).reset_index(drop=True))
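For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample frame and the name keep_from_last_start are made up for illustration and are not from the question. It iterates over the groups instead of calling apply, so it does not depend on how apply treats the first group or the grouping column. Dropping a group that has no start flag at all is my assumption, not something the answer specifies.

```python
import pandas as pd

# Hypothetical sample data with the same columns as the output above
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-01", "2019-01-02", "2019-01-03",
    ]),
    "ID": [100, 100, 100, 100, 500, 500, 500],
    "Start_flag": [1, 0, 1, 0, 0, 1, 0],
    "end": [0, 0, 0, 0, 0, 0, 1],
})


def keep_from_last_start(adf):
    """Return the rows from the last Start_flag == 1 row onward."""
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    flagged = adf.index[adf["Start_flag"] == 1]
    if len(flagged) == 0:
        # Assumption: a group with no start flag contributes no rows
        return adf.iloc[0:0]
    return adf.iloc[int(flagged[-1]):]


# Iterating over the groupby yields (key, sub-frame) pairs
trimmed = pd.concat(
    [keep_from_last_start(group) for _, group in df.groupby("ID")]
).reset_index(drop=True)
print(trimmed)
```

With this sample, ID 100 keeps its last two rows and ID 500 keeps the rows from 2019-01-02 onward.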

Contributor with 1845 experience points, more than 8 upvotes
The final corrected solution is:
from datetime import datetime

def validateData(adf):
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    # Candidate start rows: any row flagged 1, or the earliest row if it is unflagged
    indx = adf.where(((adf['Start_flag'] == 0) & (adf['Date'] == adf['Date'].min())) | (adf['Start_flag'] == 1)).last_valid_index()
    return adf.iloc[range(indx, len(adf))]

def filterData(df):
    start_time = datetime.now()
    print('Start_time=', start_time)
    RESULT_DF = df.groupby('ID').apply(validateData)
    # total_seconds() so the printed value really is in seconds
    print("--- %s seconds ---" % (datetime.now() - start_time).total_seconds())
    return RESULT_DF

To apply it to the data: RESULT_DF = filterData(df)
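To see what the extra condition in validateData changes, here is a small sketch under stated assumptions: the sample frame and the helper start_position are hypothetical, and Start_flag is assumed to hold only 0 or 1. It computes the same cut position with a boolean mask. When a group has no row flagged 1, the unflagged earliest row becomes the start, so the whole group is kept.

```python
import pandas as pd

# Hypothetical data: ID 100 has a start flag, ID 700 has none
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03",
        "2019-01-01", "2019-01-02",
    ]),
    "ID": [100, 100, 100, 700, 700],
    "Start_flag": [0, 1, 0, 0, 0],
})


def start_position(adf):
    """Position of the row from which validateData would keep data."""
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    is_start = adf["Start_flag"] == 1
    is_first_unflagged = (adf["Start_flag"] == 0) & (adf["Date"] == adf["Date"].min())
    # Last row matching either condition
    return int(adf.index[is_start | is_first_unflagged][-1])


positions = {int(key): start_position(group) for key, group in df.groupby("ID")}
print(positions)
```

Here ID 100 is cut at position 1 (its flagged row), while ID 700 starts at position 0 and is kept whole.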