
Create a dummy variable by firm based on a condition in pandas

Python

犯罪嫌疑人X 2023-09-12 15:35:57

I have a pandas DataFrame like this:

import pandas as pd

data = {"firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4], "year" : [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010], "var" : [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5]}
df = pd.DataFrame(data)
df

I want to create a dummy variable within each firm under the following condition: whenever "var" is less than or equal to 0.5 for two consecutive years, "dummy" equals 1. The "dummy" column should therefore look like this:

data = {"firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4], "year" : [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010], "var" : [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5], "dummy" : [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1]}
df = pd.DataFrame(data)
df

What is the best way to do this?


3 Answers

墨色风雨


You can simply shift within each firm, check the threshold, and combine that with the same check on the original series:

df.groupby('firm')['var'].shift().le(.5) & df['var'].le(.5)

This should be somewhat faster than groupby().apply.
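A minimal self-contained version of the shift-based expression above, written as a sketch. Beyond what the answer states, it assumes the rows should first be sorted by `firm` and `year` (because `shift` compares with the previous row, not the previous calendar year) and that a 0/1 integer column is wanted rather than booleans:

```python
import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994,
             2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}
df = pd.DataFrame(data)

# Assumption: put each firm's rows in chronological order before shifting.
df = df.sort_values(["firm", "year"])

# This year's value is <= 0.5.
below = df["var"].le(0.5)
# Previous row of the same firm is <= 0.5; the first row of each firm is NaN, and NaN <= 0.5 is False.
prev_below = df.groupby("firm")["var"].shift().le(0.5)

# 1 when both this year and the previous year are <= 0.5, else 0.
df["dummy"] = (below & prev_below).astype(int)
print(df)
```

Because the comparison with `NaN` evaluates to `False`, no explicit `fillna` is needed for the first year of each firm.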

Another approach, better if you need to check more than two consecutive years, is rolling:

df['dummy'] = df.groupby('firm')['var'].transform(lambda x: x.rolling(2).max().le(.5))

Output (of the first, unassigned expression):

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14    False
15    False
16     True
17     True
Name: var, dtype: bool
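The `rolling` variant generalizes to any window length: the rolling maximum over `n` rows is at most 0.5 only if every value in the window is. Below is a sketch using a hypothetical helper, `consecutive_low` (not a pandas function), assuming rows are already sorted by year within each firm:

```python
import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994,
             2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}
df = pd.DataFrame(data)

def consecutive_low(s: pd.Series, n: int, threshold: float = 0.5) -> pd.Series:
    """Hypothetical helper: True where the last n values (current row included) are all <= threshold."""
    # The first n-1 rows of each group have an incomplete window and yield NaN,
    # and NaN <= threshold evaluates to False.
    return s.rolling(n).max().le(threshold)

# Extra keyword arguments to transform are forwarded to the function.
df["dummy2"] = df.groupby("firm")["var"].transform(consecutive_low, n=2).astype(int)
df["dummy3"] = df.groupby("firm")["var"].transform(consecutive_low, n=3).astype(int)
print(df)
```

With `n=2` this reproduces the dummy from the question; with `n=3` only rows 11 and 17 are flagged, the only places in the sample data with three consecutive years at or below 0.5.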

Answered 2023-09-12

慕田峪7331174


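Your requirement translates almost directly into pandas.

First groupby firm, then check whether your condition is met with apply.

You can get the previous year's value with shift: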
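To see what `shift` contributes, here is a small illustration on a subset of the question's data (firms 1 and 3 only, chosen for brevity). Within each firm, `shift()` moves values down one row, so each row sees the previous year's `var`, and the first row of every firm gets `NaN` because there is no earlier year:

```python
import pandas as pd

# Subset of the question's data: firms 1 and 3 only.
df = pd.DataFrame({
    "firm": [1, 1, 1, 1, 3, 3, 3],
    "year": [2000, 2001, 2002, 2003, 2010, 2011, 2012],
    "var":  [3, 2, 1, 0.5, 0.5, 0, 0],
})

# Previous row's value within the same firm; values never leak across firm boundaries.
df["prev_var"] = df.groupby("firm")["var"].shift()
print(df)
```

Row 4 (the first row of firm 3) gets `NaN` rather than firm 1's last value of 0.5, which is exactly why the grouping is needed before shifting.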

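import pandas as pd

data = {"firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4], "year" : [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010], "var" : [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5]}
df = pd.DataFrame(data)

# Solution: within each firm, flag rows where both last year's and this year's var are <= 0.5.
# group_keys=False keeps the original row index so the result can be assigned back to df;
# astype(int) turns the booleans into 0/1 (Series.view is deprecated in recent pandas).
df['dummy'] = df.groupby('firm', group_keys=False)['var'].apply(lambda x: (x.shift() <= .5) & (x <= .5)).astype(int)

print(df)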

Out:

    firm  year  var  dummy
0      1  2000  3.0      0
1      1  2001  2.0      0
2      1  2002  1.0      0
3      1  2003  0.5      0
4      2  1990  5.0      0
5      2  1991  3.0      0
6      2  1992  2.0      0
7      2  1993  0.5      0
8      2  1994  0.5      1
9      3  2010  0.5      0
10     3  2011  0.0      1
11     3  2012  0.0      1
12     4  2005  8.0      0
13     4  2006  5.0      0
14     4  2007  3.0      0
15     4  2008  0.5      0
16     4  2009  0.5      1
17     4  2010  0.5      1

Answered 2023-09-12

炎炎设计


Let's try groupby with shift:

df.groupby('firm', group_keys=False)['var'].apply(lambda x: x.shift().le(0.5) & x.le(0.5))

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14    False
15    False
16     True
17     True
Name: var, dtype: bool
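All three answers compare each row with the previous row of the same firm, which matches "the previous year" only when each firm's years are sorted and have no gaps, as in the example data. If gaps are possible, one option is to also require the year difference to be exactly 1. The sketch below does that with a hypothetical helper, `consecutive_dummy`; the gap handling is an assumption beyond what the question asked:

```python
import pandas as pd

def consecutive_dummy(frame: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Hypothetical helper: 1 where var <= threshold this year AND in the
    immediately preceding calendar year of the same firm, else 0."""
    d = frame.sort_values(["firm", "year"])        # chronological order within each firm
    g = d.groupby("firm")
    prev_var = g["var"].shift()                    # previous row's var (NaN for a firm's first row)
    prev_year = g["year"].shift()                  # previous row's year (NaN for a firm's first row)
    is_next_year = d["year"].sub(prev_year).eq(1)  # previous row is exactly one year earlier
    hit = d["var"].le(threshold) & prev_var.le(threshold) & is_next_year
    # Index labels are preserved, so assigning back to `frame` aligns rows correctly.
    return hit.astype(int)

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994,
             2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}
df = pd.DataFrame(data)
df["dummy"] = consecutive_dummy(df)

# Hypothetical firm with a gap: 2001 -> 2003 is not consecutive, so 2003 gets 0.
gap = pd.DataFrame({"firm": [5, 5, 5, 5],
                    "year": [2000, 2001, 2003, 2004],
                    "var": [0.5, 0.2, 0.1, 0.3]})
gap["dummy"] = consecutive_dummy(gap)
print(gap)
```

On the question's data (no gaps) this gives the same dummy as the plain shift approach; it only differs when a firm skips a year.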

Answered 2023-09-12
