
How to process a DataFrame without a for loop?

Python

弑天下 2022-01-05 12:24:12

My DataFrame is:

                Date        Open        High         Low       Close   Adj Close     Volume
    5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215   52989300
    5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825   75443000
    5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364   61368800
    5935  2016-08-23  219.250000  219.600006  218.899994  218.970001  207.587479   53399200
    5936  2016-08-24  218.800003  218.910004  217.360001  217.850006  206.525711   71728900
    5937  2016-08-25  217.399994  218.190002  217.220001  217.699997  206.383514   69224800
    5938  2016-08-26  217.919998  219.119995  216.250000  217.289993  205.994827  122506300
    5939  2016-08-29  217.440002  218.669998  217.399994  218.360001  207.009201   68606100
    5940  2016-08-30  218.259995  218.589996  217.350006  218.000000  206.667908   58114500
    5941  2016-08-31  217.610001  217.750000  216.470001  217.380005  206.080124   85269500
    5942  2016-09-01  217.369995  217.729996  216.029999  217.389999  206.089645   97844200
    5943  2016-09-02  218.389999  218.869995  217.699997  218.369995  207.018692   79293900
    5944  2016-09-06  218.699997  219.119995  217.860001  219.029999  207.644394   56702100

Currently, I am doing this:

    state['open_price'] = lookback.Open.iloc[-1:].get_values()[0]
    for ind, row in lookback.reset_index().iterrows():
        if ind < self.LOOKBACK_DAYS:
            state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
            state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
            state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume

But this is very slow. Is there a more vectorized way to do this?


1 Answer

狐的传说


One way is to cheat and work on the underlying NumPy array via .values.

I'll also include the steps I used to build an equivalent example:

    import pandas as pd
    from itertools import product

    initial = ['cash', 'num_shares', 'somethingsomething']
    initial_series = pd.Series([1, 2, 3], index = initial)
    print(initial_series)

    #Output:
    cash                  1
    num_shares            2
    somethingsomething    3
    dtype: int64

OK, these are just a few values that sit at the start of the output Series, mocked up for the example.

    # Raw string so that \s reaches the regex engine unescaped
    df = pd.read_clipboard(sep=r'\s\s+') #pure magic
    print(df.head())

    #Output:
                Date        Open  ...   Adj Close    Volume
    5932  2016-08-18  218.339996  ...  207.483215  52989300
    5933  2016-08-19  218.309998  ...  207.179825  75443000
    5934  2016-08-22  218.259995  ...  207.170364  61368800
    5935  2016-08-23  219.250000  ...  207.587479  53399200
    5936  2016-08-24  218.800003  ...  206.525711  71728900

    [5 rows x 7 columns]

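df is now essentially the DataFrame you provided in your example. The clipboard trick is borrowed from elsewhere and works very well for pandas MCVEs.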
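For readers who would rather not depend on the clipboard, the same table can be parsed from a string. This is only a sketch of an alternative, not part of the original answer: the io.StringIO substitution and the three-row excerpt are my own choices for illustration, and the separator is the same two-or-more-whitespace regex used above.

```python
import io
import pandas as pd

# First three rows of the question's table, pasted as text instead of
# being read from the clipboard (illustrative excerpt only).
raw = """\
Date  Open  High  Low  Close  Adj Close  Volume
5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215  52989300
5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825  75443000
5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364  61368800
"""

# The regex splits on runs of two or more whitespace characters, so the
# single space inside "Adj Close" is preserved. A regex separator requires
# the Python parsing engine. The header has one field fewer than the data
# rows, so pandas uses the first data column as the index.
df = pd.read_csv(io.StringIO(raw), sep=r"\s\s+", engine="python")

print(df.shape)
print(df.index.tolist())
```

Because the input lives in the script itself, the example can be rerun without copying anything first.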

    to_select = ['Close', 'Open', 'Volume']
    SOMELOOKBACK = 6000 #mocked
    final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]

This prepares the index, which looks like this:

    ['Close_68',
     'Open_68',
     'Volume_68',
     'Close_67',
     'Open_67',
     'Volume_67',
     ...
    ]
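The reason this label order works is that itertools.product varies its last argument fastest, which is the same order in which a row-major flatten walks a 2-D array. A minimal sketch with two hand-written rows (the offsets list and the small array are stand-ins I made up from the first two rows of the table, not values computed from df):

```python
from itertools import product
import numpy as np

offsets = [68, 67]                       # stand-in for SOMELOOKBACK - df.index
to_select = ["Close", "Open", "Volume"]

# product() cycles through to_select for each offset in turn.
labels = [f"{name}_{offset}" for offset, name in product(offsets, to_select)]

# One row per day, columns in the same order as to_select.
values = np.array([
    [218.860001, 218.339996, 52989300.0],
    [218.539993, 218.309998, 75443000.0],
])

# flatten() reads row by row, so label i lines up with value i.
paired = dict(zip(labels, values.flatten()))
print(labels)
print(paired["Open_67"])
```

If the column list passed to product and the column list used to select from the DataFrame ever differ in order, the labels would silently attach to the wrong values, so it is worth keeping both tied to the same to_select variable.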

Now simply select the relevant columns from the DataFrame, use .values to get a 2-D array, and flatten it to obtain the final Series.

    final_series = pd.Series(df[to_select].values.flatten(), index = final_index)
    # Series.append was removed in pandas 2.0; pd.concat gives the same result
    result = pd.concat([initial_series, final_series])
    print(result)

    #Output:
    cash                  1.000000e+00
    num_shares            2.000000e+00
    somethingsomething    3.000000e+00
    Close_68              2.188600e+02
    Open_68               2.183400e+02
    Volume_68             5.298930e+07
    Close_67              2.185400e+02
    Open_67               2.183100e+02
    Volume_67             7.544300e+07
    Close_66              2.185300e+02
    Open_66               2.182600e+02
    Volume_66             6.136880e+07
                              ...
    Close_48              2.133700e+02
    Open_48               2.134800e+02
    Volume_48             1.552364e+08
    Length: 66, dtype: float64
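To tie this back to the question, here is a sketch that produces the same state keys as the original loop (lowercase names, numbered by row position counting down from the lookback length) and checks the two against each other. The function names, the three-row frame, and LOOKBACK_DAYS = 3 are hypothetical choices for illustration. Note that it numbers rows by position, as the question's loop does, rather than by index label as in the answer above.

```python
import pandas as pd

LOOKBACK_DAYS = 3  # hypothetical window length for this sketch

# Three rows taken from the question's table.
lookback = pd.DataFrame(
    {
        "Open": [218.339996, 218.309998, 218.259995],
        "Close": [218.860001, 218.539993, 218.529999],
        "Volume": [52989300, 75443000, 61368800],
    },
    index=[5932, 5933, 5934],
)

def state_loop(frame, days):
    """Row-by-row version, mirroring the loop in the question."""
    state = {}
    for ind, row in frame.reset_index().iterrows():
        if ind < days:
            state["close_" + str(days - ind)] = row.Close
            state["open_" + str(days - ind)] = row.Open
            state["volume_" + str(days - ind)] = row.Volume
    return state

def state_vectorized(frame, days):
    """Build all keys at once and flatten the selected columns."""
    cols = ["Close", "Open", "Volume"]
    window = frame.iloc[:days]                    # same rows the loop keeps
    keys = [f"{col.lower()}_{days - pos}"
            for pos in range(len(window)) for col in cols]
    values = window[cols].to_numpy().ravel()      # row-major, matches keys
    return dict(zip(keys, values))

loop_state = state_loop(lookback, LOOKBACK_DAYS)
fast_state = state_vectorized(lookback, LOOKBACK_DAYS)
print(loop_state == fast_state)
```

As in the answer's output, Volume comes back as a float because the three columns share one array; if integer volumes matter, that column would need to be handled separately.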

Answered 2022-01-05
