
Slice each DataFrame row into 3 windows with different slice ranges

Python

慕码人2483693 2023-03-16 16:24:53

I want to slice each row of my DataFrame into 3 windows. The slice indices are stored in another DataFrame and change from row to row. Afterwards I want to return a DataFrame that contains the windows as a MultiIndex on the columns. Within each window, rows that are shorter than the longest row in that window should be padded with NaN values. Since my real DataFrame has about 100,000 rows and 600 columns, I need an efficient solution. Consider the following example.

This is my DataFrame, which I want to split into 3 windows:

>>> df
    0   1   2   3   4   5   6   7
0   0   1   2   3   4   5   6   7
1   8   9  10  11  12  13  14  15
2  16  17  18  19  20  21  22  23

The second DataFrame contains my slice indices and has the same number of rows as df:

>>> df_slice
   0  1
0  3  5
1  2  6
2  4  7

I tried to slice the windows like this:

first_window = df.iloc[:, :df_slice.iloc[:, 0]]
first_window.columns = pd.MultiIndex.from_tuples([("A", c) for c in first_window.columns])
second_window = df.iloc[:, df_slice.iloc[:, 0] : df_slice.iloc[:, 1]]
second_window.columns = pd.MultiIndex.from_tuples([("B", c) for c in second_window.columns])
third_window = df.iloc[:, df_slice.iloc[:, 1]:]
third_window.columns = pd.MultiIndex.from_tuples([("C", c) for c in third_window.columns])
result = pd.concat([first_window, second_window, third_window], axis=1)

This gives me the following error:

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [0    3
1    2
2    4
Name: 0, dtype: int64] of <class 'pandas.core.series.Series'>

My expected output looks like this:

>>> result
    A               B               C
    0   1   2   3   4   5   6   7   8   9  10
0   0   1   2 NaN   3   4 NaN NaN   5   6   7
1   8   9 NaN NaN  10  11  12  13  14  15 NaN
2  16  17  18  19  20  21  22 NaN  23 NaN NaN

Is there an efficient way to solve this without iterating over every row of the DataFrame?
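The TypeError arises because a slice passed to .iloc needs scalar integer bounds, whereas df_slice.iloc[:, 0] is a whole Series holding one bound per row. With scalar bounds the slice works one row at a time. The sketch below uses that to build a plain row-by-row reference for the expected output; windows_loop is a hypothetical helper, not part of the question, and it is exactly the kind of loop the question wants to avoid, but it is handy for checking a faster version against. It assumes df is numeric and df_slice holds two integer columns with 0 <= s0 <= s1 <= number of columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame([[3, 5], [2, 6], [4, 7]])

# .iloc accepts scalar slice bounds, so slicing works one row at a time:
print(df.iloc[0, :df_slice.iat[0, 0]].tolist())  # [0, 1, 2]


def windows_loop(df, df_slice, labels=("A", "B", "C")):
    """Hypothetical row-by-row reference: left-align each window, pad with NaN."""
    values = df.to_numpy(dtype=float)
    cuts = df_slice.to_numpy()
    # Per row, the three pieces are columns [0, s0), [s0, s1) and [s1, ncols).
    pieces = [(v[:s0], v[s0:s1], v[s1:]) for v, (s0, s1) in zip(values, cuts)]
    blocks, top = [], []
    for k, label in enumerate(labels):
        width = max(len(p[k]) for p in pieces)  # longest row in this window
        block = np.full((len(pieces), width), np.nan)
        for i, p in enumerate(pieces):
            block[i, :len(p[k])] = p[k]  # left-aligned, NaN on the right
        blocks.append(block)
        top += [label] * width
    data = np.hstack(blocks)
    # Second level numbered 0..n-1 across all windows, as in the expected output.
    columns = pd.MultiIndex.from_arrays([top, np.arange(data.shape[1])])
    return pd.DataFrame(data, index=df.index, columns=columns)


print(windows_loop(df, df_slice))
```

The values come back as floats because NaN cannot be stored in an integer column.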


1 Answer

隔江千里


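Here is a solution that uses melt and then pivot_table, plus some logic to:

determine the three groups "A", "B" and "C";
shift the columns to the left so that NaN only appears on the right side of each window;
rename the columns to get the expected output.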
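To make the second step concrete: in the code below, a cell is shifted by its row's smallest column label within the window minus the smallest such label over all rows in that window. For window B of the example, the rows start at columns 3, 2 and 4, so they move left by 1, 0 and 2 and all begin at column label 2. The small frame b here is built by hand for illustration, mirroring the "index" and "variable" columns that melt produces; it covers window B only.

```python
import pandas as pd

# Window B only, in long format (hand-built from the question's example):
# row 0 covers columns 3..4, row 1 covers 2..5, row 2 covers 4..6.
b = pd.DataFrame({
    "index":    [0, 0, 1, 1, 1, 1, 2, 2, 2],
    "variable": [3, 4, 2, 3, 4, 5, 4, 5, 6],
})

# Per-row start of the window minus the smallest start over all rows.
shift = b.groupby("index").variable.transform("min") - b.variable.min()
b["shifted"] = b.variable - shift

# After the shift every row starts at column label 2.
print(b.groupby("index").shifted.agg(["min", "max"]))
```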

import pandas as pd

# df and df_slice are the frames from the question. The code refers to the
# slice columns as c_0 and c_1, so rename df_slice's columns 0 and 1 first.
df_slice = df_slice.add_prefix("c_")

# long format: one row per (row index, column label, value)
t = df.reset_index().melt(id_vars="index")
t = pd.merge(t, df_slice, left_on="index", right_index=True)
t.variable = pd.to_numeric(t.variable)

# assign every cell to window A, B or C
t.loc[t.variable < t.c_0, "group"] = "A"
t.loc[(t.variable >= t.c_0) & (t.variable < t.c_1), "group"] = "B"
t.loc[t.variable >= t.c_1, "group"] = "C"

# shift relevant values to the left
shift_val = t.groupby(["group", "index"]).variable.transform("min") - t.groupby(["group"]).variable.transform("min")
t.variable = t.variable - shift_val

# extract the a, b, and c groups, and create a multi-level index for their columns
df_a = pd.pivot_table(t[t.group == "A"], index="index", columns="variable", values="value")
df_a.columns = pd.MultiIndex.from_product([["a"], df_a.columns])
df_b = pd.pivot_table(t[t.group == "B"], index="index", columns="variable", values="value")
df_b.columns = pd.MultiIndex.from_product([["b"], df_b.columns])
df_c = pd.pivot_table(t[t.group == "C"], index="index", columns="variable", values="value")
df_c.columns = pd.MultiIndex.from_product([["c"], df_c.columns])

res = pd.concat([df_a, df_b, df_c], axis=1)

# renumber the second column level 0..n-1 across all windows
res.columns = pd.MultiIndex.from_tuples([(c[0], i) for i, c in enumerate(res.columns)])

print(res)

The output is:

          a                       b                       c
          0     1     2     3     4     5     6     7     8     9    10
index
0       0.0   1.0   2.0   NaN   3.0   4.0   NaN   NaN   5.0   6.0   7.0
1       8.0   9.0   NaN   NaN  10.0  11.0  12.0  13.0  14.0  15.0   NaN
2      16.0  17.0  18.0  19.0  20.0  21.0  22.0   NaN  23.0   NaN   NaN
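With about 100,000 rows and 600 columns, melt produces a long table with one row per cell (about 60 million rows here), which may be heavy on memory. As an alternative not taken from the answer, the sketch below swaps in NumPy broadcasting and fancy indexing: for each of the three windows it builds a matrix of source column positions (row start plus 0, 1, 2, ...), gathers those cells, and masks positions past the row's window end as NaN. It loops over the 3 windows, not over the rows. windows_numpy is a hypothetical helper; it assumes df is numeric and df_slice holds two integer columns with 0 <= s0 <= s1 <= number of columns.

```python
import numpy as np
import pandas as pd


def windows_numpy(df, df_slice, labels=("A", "B", "C")):
    """Hypothetical vectorized sketch: left-align each window, pad with NaN."""
    values = df.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    cuts = df_slice.to_numpy()
    # Window k of row i covers columns [starts[i, k], stops[i, k]).
    starts = np.column_stack([np.zeros(n_rows, dtype=int), cuts[:, 0], cuts[:, 1]])
    stops = np.column_stack([cuts[:, 0], cuts[:, 1], np.full(n_rows, n_cols)])
    rows = np.arange(n_rows)[:, None]
    blocks, top = [], []
    for k, label in enumerate(labels):
        width = int((stops[:, k] - starts[:, k]).max())  # longest row in this window
        src = starts[:, [k]] + np.arange(width)  # source column for every output cell
        valid = src < stops[:, [k]]  # False once the row's window has ended
        # Clip so the gather never reads past the last column; those cells are masked anyway.
        gathered = values[rows, np.minimum(src, n_cols - 1)]
        blocks.append(np.where(valid, gathered, np.nan))
        top += [label] * width
    data = np.hstack(blocks)
    # Second level numbered 0..n-1 across all windows, as in the expected output.
    columns = pd.MultiIndex.from_arrays([top, np.arange(data.shape[1])])
    return pd.DataFrame(data, index=df.index, columns=columns)


# Sample data from the question.
df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame([[3, 5], [2, 6], [4, 7]])
print(windows_numpy(df, df_slice))
```

The intermediate arrays (src, gathered) are each the size of one output window, so peak memory should stay within a small multiple of the result instead of growing with a one-row-per-cell long table. The top-level labels follow the question ("A", "B", "C") rather than the lowercase ones used in the answer.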

2023-03-16
