2 Answers

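Contributor with 1895 experience points, 3+ upvotes
One option is to test the canceled_at column for non-missing values, group that boolean result by the customer Series, and use GroupBy.transform with GroupBy.all and GroupBy.any to check whether all values in a group are True (all non-missing) or at least one value is True (any non-missing). The two masks are then passed to numpy.select: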
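import numpy as np
# True where canceled_at is present, grouped by customer
g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')  # every row of the customer has a date
m2 = g.transform('any')  # at least one row of the customer has a date
# np.select uses the first condition that matches, so m1 comes before m2.
# Note: older NumPy casts the np.nan default to the string 'nan'; newer versions may raise a TypeError.
df['status'] = np.select([m1, m2], ['canceled', 'downgrade'], np.nan)
print(df)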
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN        nan
Alternatively, use an empty string as the default for groups that match neither condition:
df['status'] = np.select([m1, m2], ['canceled', 'downgrade'], '')
print(df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN
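If groups that contain only NaNs should also be labeled downgrade:
# True for customers whose rows all have a cancellation date
mask = df['canceled_at'].notna().groupby(df['customer']).transform('all')
df['status'] = np.where(mask, 'canceled', 'downgrade')
print(df)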
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN  downgrade
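
For anyone who wants to run the np.select variant end to end, here is a minimal self-contained sketch. It assumes the sample DataFrame from the other answer in this thread; the names all_present and any_present are just illustrative stand-ins for m1 and m2, and the empty-string default is used so that the choices and the default share a string dtype.

```python
import numpy as np
import pandas as pd

# Sample data, assumed to match the DataFrame used elsewhere in this thread.
df = pd.DataFrame({
    'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
    'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None],
})

# Boolean Series (True where a date is present), grouped by customer.
g = df['canceled_at'].notna().groupby(df['customer'])
all_present = g.transform('all')  # every row of the customer has a date
any_present = g.transform('any')  # at least one row of the customer has a date

# np.select picks the first condition that is True for each row,
# so the stricter mask (all_present) has to be listed first.
df['status'] = np.select(
    [all_present, any_present],
    ['canceled', 'downgrade'],
    default='',
)

# Show the masks next to the result to make the logic visible.
print(df.assign(all_present=all_present, any_present=any_present))
```

Listing the masks in the opposite order would label every customer with at least one date as downgrade, because any_present is also True wherever all_present is True.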

Contributor with 1757 experience points, 7+ upvotes
Here is one approach:
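import pandas as pd

def select_status(canceled):
    # count() ignores missing values, so c is the number of dates in the group
    c = canceled.count()
    if c == 0:
        status = ''            # no cancellation dates at all
    elif c == len(canceled):
        status = 'canceled'    # every row has a date
    else:
        status = 'downgrade'   # only some rows have a date
    return pd.Series(status, index=canceled.index)

df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})
# group_keys=False keeps the original row index so the result aligns with df on newer pandas
df['status'] = df.groupby('customer', group_keys=False)['canceled_at'].apply(select_status)
print(df)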
Output:
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x        None  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a        None
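
The same three-way rule can probably also be written without a Python-level function per group. The sketch below is an assumption-laden variant, not the code from the answer above: it compares the per-customer count of non-missing dates with the per-customer row count, and the names n_dates and n_rows are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
    'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None],
})

grouped = df.groupby('customer')['canceled_at']
n_dates = grouped.transform('count')  # non-missing dates per customer, repeated on each row
n_rows = grouped.transform('size')    # total rows per customer, repeated on each row

# Same branches as select_status: no dates, all dates, otherwise a mix.
df['status'] = np.select(
    [n_dates == 0, n_dates == n_rows],
    ['', 'canceled'],
    default='downgrade',
)
print(df)
```

On large frames this tends to be faster than apply with a custom function, since count and size are computed by pandas' built-in group operations.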