
Best practice for repeated calculations


Python

largeQ 2023-06-20 16:27:19

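Cog in the machine: the data contains the current 12 months of data, stacked horizontally. Each month, the existing months are revised and a new month is appended.

ID |Date |Month1_a |Month1_b |Month1_c |Month2_a |Month2_b |Month2_c |Month3_a |Month3_b |Month3_c
## |MM/DD/YYYY |abc |zxy |123 |NULL |zxy |122 |abc |zxy |123

The actual data file has no headers, and downstream it is ingested as separate files, one per month (Month 1, and so on):

ID | Date |Month1_a |Month1_b |Month1_c |New Column
## |MM/DD/YYYY |abc |zxy |123 | #

ID | Date |Month2_a |Month2_b |Month2_c |New Column
## |MM/DD/YYYY |NULL |zxy |122 | #

Other than copying the file 12 times, is there any suggestion for reading it once and looping to create my output? I have worked out the logic for Month 1, but I am stumped on how to move on to Month 2 and beyond. My original idea was: read the file > drop Month 3+ > drop Month 1 > run the logic, but I am not sure whether there is a better or best practice. Thanks.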


1 Answer

宝慕林4294392


This writes n CSV files, where n is the number of months in the input data. Hopefully this is what you are after.

import pandas as pd

# Read the pipe-delimited file (this assumes the file has a header row)
df = pd.read_csv('my_data.csv', sep='|')

# Strip whitespace from column names
df.columns = [x.strip() for x in df.columns]

# Get the month numbers by splitting on _ and removing 'Month' from
# the first part, sorted numerically so the files are written in month order
months = sorted(
    {x.split('_')[0].replace('Month', '') for x in df.columns if x.startswith('Month')},
    key=int,
)

# For each month number, add the columns for that month to the ID and Date
# columns and write to a csv with that month number in the csv title
for month in months:
    base_columns = ['ID', 'Date']
    # Match the full 'MonthN_' prefix so Month1 does not also pick up Month10 to Month12
    base_columns.extend([x for x in df.columns if x.startswith(f'Month{month}_')])
    df[base_columns].to_csv(f'Month_{month}.csv', index=False)
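
Since the question says the real file has no header row, the same read-once-and-loop idea can also be done by column position instead of by name. The following is only a sketch under assumptions: two leading columns (ID, Date) followed by three fields per month, read with `header=None`. The names `split_months`, `month_logic`, `BASE_COLS` and `FIELDS` are hypothetical, and `month_logic` is just a placeholder for the asker's Month 1 logic, which is not shown in the question.

```python
import io
import pandas as pd

BASE_COLS = ["ID", "Date"]   # leading columns shared by every month (assumed)
FIELDS = ["a", "b", "c"]     # per-month fields, in file order (assumed)


def split_months(df, n_base=2, n_fields=3):
    """Yield (month_number, frame) pairs from a headerless wide frame.

    Each frame holds the base columns plus one month's block, with the
    same month-agnostic column names, so one set of logic fits every month.
    """
    n_months = (df.shape[1] - n_base) // n_fields
    for i in range(n_months):
        start = n_base + i * n_fields
        positions = list(range(n_base)) + list(range(start, start + n_fields))
        block = df.iloc[:, positions].copy()
        block.columns = BASE_COLS + FIELDS
        yield i + 1, block


def month_logic(block):
    """Hypothetical placeholder for the real per-month logic."""
    # Illustration only: flag rows where field 'a' is missing
    block["New Column"] = block["a"].isna().astype(int)
    return block


# Two made-up rows in the layout from the question, without a header row
sample = (
    "1|06/01/2023|abc|zxy|123|NULL|zxy|122|abc|zxy|123\n"
    "2|06/01/2023|def|wvu|124|def|wvu|125|def|wvu|126\n"
)
df = pd.read_csv(io.StringIO(sample), sep="|", header=None)

# Read once, then loop: one processed frame per month
outputs = {month: month_logic(block) for month, block in split_months(df)}
```

Each frame in `outputs` could then be written with something like `frame.to_csv(f'Month_{month}.csv', sep='|', header=False, index=False)` if the downstream files are also expected without headers. Note that values padded with spaces (for example `NULL ` with a trailing blank) would not be parsed as missing by default, so the real file may need whitespace handling first.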

Answered 2023-06-20
