What is the best way to fill NaN values in a Pandas DataFrame based on values in other rows?

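I have a DataFrame with some NaN values in all of its columns (3 columns in total). I want the fastest way to fill the NaN in each cell with the most recent valid value from other rows. For example, if column A is NaN and column B is "123", I want to find the most recent value of column A among the rows where column B is "123", and fill the NaN with that value. I know this is easy to do with a loop, but I am concerned about performance because the DataFrame has 25 million records. Any ideas would help.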
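In pandas terms, the fill described above amounts to a forward fill within each group of rows that share the same B value. A minimal sketch, assuming the NaN column is `A`, the key column is `B`, and the row order already reflects recency (the sample data is made up for illustration), uses `groupby(...).ffill()`. It avoids a Python-level loop, so it should scale far better to millions of rows than per-row iteration, although that is not benchmarked here.

```python
import numpy as np
import pandas as pd

# Illustrative data: B is the key column, A contains NaNs to fill
df = pd.DataFrame({'A': [1.0, np.nan, 2.0, 5.0, np.nan, np.nan],
                   'B': ['123', '123', '456', '123', '456', '789']})

# Within each group of equal B values, forward-fill A in row order:
# each NaN takes the last non-NaN A from an earlier row with the same B.
# A group with no earlier valid value (here B == '789') stays NaN.
df['A'] = df.groupby('B')['A'].ffill()
print(df)
```

The same one-liner should carry over to the other columns by swapping in the relevant key and target column names. If "latest" is defined by a timestamp rather than by row order, the frame would need to be sorted by that timestamp first.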


1 Answer

动漫人物


This solution uses a for loop, but it only iterates over the rows where A is NaN.

A = The column containing NaNs

B = The column to be referenced

```python
import pandas as pd
import numpy as np

# Consider this DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, 6, 7, 8, np.nan, 10],
                   'B': ['xxxx', 'b', 'xxxx', 'd', 'xxxx', 'f', 'yyyy', 'h', 'yyyy', 'j']})
```

```
      A     B
0   1.0  xxxx
1   2.0     b
2   3.0  xxxx
3   4.0     d
4   NaN  xxxx
5   6.0     f
6   7.0  yyyy
7   8.0     h
8   NaN  yyyy
9  10.0     j
```

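```python
# Loop over the index labels where A is NaN
# (assumes the default RangeIndex, so labels equal row positions for iloc)
for i in df.index[df['A'].isna()]:
    # Dict with B as keys and A as values, built from all rows up to i.
    # dropna() drops rows whose A is still NaN; when a B value repeats,
    # later rows overwrite earlier ones, so each key maps to its latest A.
    dictionary = df.iloc[:i + 1].dropna().set_index('B').to_dict()['A']
    # Use the dict to fill A; .get() leaves NaN if B has no earlier valid A
    df.iloc[i, 0] = dictionary.get(df.iloc[i, 1], np.nan)
```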

This is what df looks like after running the code:

```
      A     B
0   1.0  xxxx
1   2.0     b
2   3.0  xxxx
3   4.0     d
4   3.0  xxxx
5   6.0     f
6   7.0  yyyy
7   8.0     h
8   7.0  yyyy
9  10.0     j
```

Note that at index 4, A is filled with 3.0 (the most recent A value for xxxx, from index 2) rather than 1.0.
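The loop above rebuilds a dictionary from all preceding rows for every NaN row, so its cost grows with both the number of rows and the number of NaNs. A hypothetical single-pass variant (the helper name `fill_from_latest` and its parameters are made up for illustration) keeps one running dict of the latest valid A per B value instead, so each row is visited once. It is still a Python-level loop, so on 25 million rows a vectorized pandas approach would likely be faster; this sketch mainly shows the same logic in linear time.

```python
import numpy as np
import pandas as pd


def fill_from_latest(df, target='A', key='B'):
    """Hypothetical helper: return a copy of df in which each NaN in `target`
    is replaced by the latest non-NaN `target` value from an earlier row
    with the same `key`. NaNs with no earlier match are left as NaN."""
    values = df[target].to_numpy(dtype=float, copy=True)
    keys = df[key].to_numpy()
    latest = {}  # key value -> most recent valid target value seen so far
    for pos in range(len(values)):
        if np.isnan(values[pos]):
            # Fill from the running dict, or keep NaN if the key is unseen
            values[pos] = latest.get(keys[pos], np.nan)
        else:
            latest[keys[pos]] = values[pos]
    out = df.copy()
    out[target] = values
    return out


df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, 6, 7, 8, np.nan, 10],
                   'B': ['xxxx', 'b', 'xxxx', 'd', 'xxxx', 'f', 'yyyy', 'h', 'yyyy', 'j']})
print(fill_from_latest(df))
```

Returning a copy leaves the original frame untouched, which makes it easy to compare the result against the loop-based version.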

2021-12-21
