
How to compare data within the same column of a DataFrame

Python

喵喵时光机 2023-09-26 16:52:40

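I have a pandas DataFrame with country, year, and PIB columns. I want to get the countries whose PIB in 2007 was lower than in 2002, but I can't work out how to write this using only pandas built-in methods, without Python iteration or anything similar. The furthest I got is the following line:

df[df[df.year == 2007].PIB < df[df.year == 2002].PIB].country

But I get the following error:

ValueError: Can only compare identically-labeled Series objects

So far I have only used pandas to filter on data in different columns, and I don't know how to compare data within the same column, in this case across years.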


4 Answers

隔江千里

1906 experience points contributed, over 10 likes received

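My strategy is to use a pivot table. Assume that no two rows share the same (country, year) pair. Under that assumption, aggfunc=np.sum simply returns the single PIB value of each pair.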

import numpy as np
import pandas as pd

table = pd.pivot_table(df, values='PIB', index='country',
                       columns='year', aggfunc=np.sum)[[2002, 2007]]
# Countries whose PIB was higher in 2002 than in 2007
list(table[table[2002] > table[2007]].index)

The pivot table looks like this:
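As a minimal sketch of what such a pivot table contains, assume a small made-up dataset (the country names A and B and the PIB values are illustrative, not taken from the question). The string 'sum' is used here in place of np.sum; both perform the same aggregation.

```python
import pandas as pd

# Hypothetical sample data: two countries, two years
df = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B'],
    'year': [2002, 2002, 2007, 2007],
    'PIB': [200, 200, 100, 300],
})

# The approach assumes each (country, year) pair appears only once
has_duplicates = df.duplicated(subset=['country', 'year']).any()

# Rows are countries, columns are years, cells hold the PIB value
table = pd.pivot_table(df, values='PIB', index='country',
                       columns='year', aggfunc='sum')[[2002, 2007]]
print(table)

# Countries whose PIB fell between 2002 and 2007
declined = list(table[table[2002] > table[2007]].index)
print(declined)
```

With this data, only country A has a lower PIB in 2007 than in 2002.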

2023-09-26

白猪掌柜的

1893 experience points contributed, over 10 likes received

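I suggest creating Series indexed by the country column. The 2007 and 2002 Series being compared must contain the same countries, so that their index values match: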

df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                   'PIB': [200, 200, 100, 300],
                   'year': [2002, 2002, 2007, 2007]})
print(df)

      country  PIB  year
0  Afganistan  200  2002
1    Zimbabwe  200  2002
2  Afganistan  100  2007
3    Zimbabwe  300  2007

df = df.set_index('country')
print(df)

            PIB  year
country
Afganistan  200  2002
Zimbabwe    200  2002
Afganistan  100  2007
Zimbabwe    300  2007

s1 = df.loc[df.year == 2007, 'PIB']
s2 = df.loc[df.year == 2002, 'PIB']
print(s1)

country
Afganistan    100
Zimbabwe      300
Name: PIB, dtype: int64

print(s2)

country
Afganistan    200
Zimbabwe      200
Name: PIB, dtype: int64

countries = s1.index[s1 < s2]
print(countries)

Index(['Afganistan'], dtype='object', name='country')
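The requirement that both Series hold the same countries matters in practice: if a country is missing one of the two years, the direct comparison raises the same ValueError as in the question. One possible way around that, sketched here with made-up data (country C has no 2002 row), is to keep only the shared labels with Series.align(join='inner') before comparing. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical data: country C has a 2007 value but no 2002 value
df = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B', 'C'],
    'PIB': [200, 200, 100, 300, 50],
    'year': [2002, 2002, 2007, 2007, 2007],
})

# One Series per year, indexed by country
s_2007 = df[df.year == 2007].set_index('country')['PIB']
s_2002 = df[df.year == 2002].set_index('country')['PIB']

# Keep only countries present in both years, so the labels are identical
a_2007, a_2002 = s_2007.align(s_2002, join='inner')

# Countries whose PIB was lower in 2007 than in 2002
declined = a_2007.index[a_2007 < a_2002].tolist()
print(declined)
```

Country C is silently dropped here; whether that is acceptable depends on how missing years should be treated.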

Another idea is to pivot by year first with DataFrame.pivot, then select the two year columns, compare them, and use the result for boolean indexing on the index:

# df was indexed by country above, so restore the column before pivoting
df1 = df.reset_index().pivot(index='country', columns='year', values='PIB')
print(df1)

year        2002  2007
country
Afganistan   200   100
Zimbabwe     200   300

countries = df1.index[df1[2007] < df1[2002]]
print(countries)

Index(['Afganistan'], dtype='object', name='country')
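As a self-contained sketch of the pivot approach, the snippet below starts from the un-indexed frame and passes keyword arguments to DataFrame.pivot, which recent pandas releases expect. The change column is an illustrative addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                   'PIB': [200, 200, 100, 300],
                   'year': [2002, 2002, 2007, 2007]})

# Wide layout: one row per country, one column per year
wide = df.pivot(index='country', columns='year', values='PIB')

# Difference between the two years; negative means PIB fell
change = wide[2007] - wide[2002]

# Countries with a negative change
declined = change[change < 0].index.tolist()
print(declined)
```

Keeping the difference as its own Series makes it easy to sort or inspect the size of the change, not only its sign.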

2023-09-26

尚方宝剑之说

1788 experience points contributed, over 4 likes received

Here is my DataFrame:

df = pd.DataFrame([
    {"country": "a", "PIB": 2, "year": 2002},
    {"country": "b", "PIB": 2, "year": 2002},
    {"country": "a", "PIB": 1, "year": 2007},
    {"country": "b", "PIB": 3, "year": 2007},
])

If I filter on the years 2002 and 2007, I get:

df_2002 = df[df["year"] == 2002]

out :

  country  PIB  year
0       a    2  2002
1       b    2  2002

df_2007 = df[df["year"] == 2007]

out :

  country  PIB  year
2       a    1  2007
3       b    3  2007

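You want to compare how each country's PIB evolved.

pandas doesn't know that. It tries to compare values that share the same index label, which is not what you want here and is not possible anyway, because the two indexes differ (0, 1 versus 2, 3).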
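The mismatch can be reproduced directly. The following sketch reuses the DataFrame from this answer and captures the exception instead of letting it propagate; the variable names pib_2002, pib_2007, and error_message are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {"country": "a", "PIB": 2, "year": 2002},
    {"country": "b", "PIB": 2, "year": 2002},
    {"country": "a", "PIB": 1, "year": 2007},
    {"country": "b", "PIB": 3, "year": 2007},
])

# Filtering keeps the original row labels
pib_2002 = df.loc[df["year"] == 2002, "PIB"]  # index labels 0, 1
pib_2007 = df.loc[df["year"] == 2007, "PIB"]  # index labels 2, 3

try:
    pib_2007 < pib_2002
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(list(pib_2002.index), list(pib_2007.index))
print(error_message)
```

The two Series have no index labels in common, so pandas refuses to compare them element by element.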

So you just need to use set_index():

df.set_index("country", inplace=True)
df_2002 = df[df["year"] == 2002]

out :

         PIB  year
country
a          2  2002
b          2  2002

df_2007 = df[df["year"] == 2007]

out :

         PIB  year
country
a          1  2007
b          3  2007

Now you can do the comparison:

res = df_2002.PIB > df_2007.PIB
res

out:

country
a     True
b    False
Name: PIB, dtype: bool

# to get the list of countries
res[res].index.tolist()

out :

['a']

2023-09-26

繁华开满天机

1816 experience points contributed, over 4 likes received

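Try this (given that you only need the list of those countries):

# iterate over unique country names so each country is listed at most once
[i for i in df.country.unique() if df[(df.country==i) & (df.year==2007)].PIB.iloc[0] < df[(df.country==i) & (df.year==2002)].PIB.iloc[0]]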
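The list comprehension loops in Python, which the question hoped to avoid. As a loop-free sketch using a different technique from this answer, the two years can be joined on country with DataFrame.merge and compared column against column. The sample data and the suffix names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'country': ['a', 'b', 'a', 'b'],
    'PIB': [2, 2, 1, 3],
    'year': [2002, 2002, 2007, 2007],
})

y_2002 = df[df.year == 2002]
y_2007 = df[df.year == 2007]

# Inner join on country; overlapping columns get a per-year suffix
merged = y_2002.merge(y_2007, on='country', suffixes=('_2002', '_2007'))

# Countries whose PIB was lower in 2007 than in 2002
declined = merged.loc[merged.PIB_2007 < merged.PIB_2002, 'country'].tolist()
print(declined)
```

Because the join is an inner join, countries missing either year drop out of the result rather than raising an error.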

2023-09-26
