Change the date in a dataframe when midnight is reached

一只甜甜圈 2023-12-20 19:55:38
I have a dataframe with a "Date_and_time" column in datetime format. Unfortunately, when midnight is reached (row 15235: 2020-08-02 00:00:00.000000), the date does not change accordingly. At midnight (00:00:00.000000), 2020-08-02 00:00:00.000000 should therefore become 2020-08-03 00:00:00.000000. Midnight is reached at row 15235, but the date does not change until row 16000. How can I change this so that the date is correct as soon as midnight is reached? Thanks.

import pandas as pd

#Load file
df = pd.read_csv(file, sep=";", names=["Date", "Time", "ID1","ID2","ID3","MP","ET"], skiprows = 1 ,float_precision='round_trip')
df1 = df['Time'].str.split(expand=True)

#Use columns 'Date' and 'Time' to create column 'Date_and_time' in datetime format
df['Date_and_time'] =  (pd.to_datetime(df['Date']) +
               pd.to_timedelta(df1[0]) +
               pd.to_timedelta(df1[1].str.replace('ms','').astype(int), unit='us'))

Out[45]:
              Date  ...              Date_and_time
0       2020/08/02  ... 2020-08-02 21:21:46.000000
1       2020/08/02  ... 2020-08-02 21:21:46.082191
2       2020/08/02  ... 2020-08-02 21:21:46.164383
3       2020/08/02  ... 2020-08-02 21:21:46.246575
4       2020/08/02  ... 2020-08-02 21:21:46.328767
           ...  ...                        ...
15235   2020/08/02  ... 2020-08-02 00:00:00.000000
15236   2020/08/02  ... 2020-08-02 00:00:00.082191
15237   2020/08/02  ... 2020-08-02 00:00:00.164383
15238   2020/08/02  ... 2020-08-02 00:00:00.246575
           ...  ...                        ...
16000   2020/08/03  ... 2020-08-03 00:00:16.082191
           ...  ...                        ...
330404  2020/08/03  ... 2020-08-03 03:00:33.000000
330405  2020/08/03  ... 2020-08-03 03:00:33.040513
330406  2020/08/03  ... 2020-08-03 03:00:33.081026
330407  2020/08/03  ... 2020-08-03 03:00:33.121539
330408  2020/08/03  ... 2020-08-03 03:00:33.162052

[330409 rows x 8 columns]

1 Answer

倚天杖


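This is essentially another problem that needs the "diff-cumsum" trick to accumulate the number of negative changes. The code below takes the difference on integer nanosecond timestamps instead of calling .diff() on the datetime column itself, which makes it a little more involved.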
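To make the "diff-cumsum" idea concrete, here is a minimal sketch on a made-up four-row series (the sample values are hypothetical, not taken from the question's file). It assumes a pandas version in which `Series.diff()` on a datetime64 column returns timedeltas, so no integer conversion is used here.

```python
import pandas as pd

# Hypothetical sample: the clock wraps past midnight but the date stays the same.
ts = pd.Series(pd.to_datetime([
    "2020-08-02 23:59:59.0",
    "2020-08-02 23:59:59.5",
    "2020-08-02 00:00:00.0",  # clock wrapped, date did not advance
    "2020-08-02 00:00:00.5",
]))

# diff() is the step from the previous row; the first row is NaT,
# and NaT compared with a Timedelta evaluates to False.
went_backwards = ts.diff() < pd.Timedelta(0)

# cumsum() turns every backwards jump into a running count of days to add.
days_to_add = went_backwards.cumsum()

corrected = ts + pd.to_timedelta(days_to_add, unit="D")
print(corrected)
```

Each backwards jump raises the counter by one, so every row after the jump is shifted by the same number of days.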


Here is a quick and dirty demonstration on df["Date_and_time"]. You should do something similar for any other columns involved.


import numpy as np
import pandas as pd

#df = pd.read_clipboard(sep=r"\s{2,}")
df["Date_and_time"] = pd.to_datetime(df["Date_and_time"])

# get timestamp in nanoseconds
df["ns"] = df["Date_and_time"].values.astype(np.int64)

# detect reversed time change and accumulate days
df["days"] = (df["ns"].diff() < 0).cumsum()

# add the days found (vectorized instead of a row-wise apply)
df["Date_and_time_new"] = df["Date_and_time"] + pd.to_timedelta(df["days"], unit="D")

df

Out[76]:
          Date              Date_and_time  ...  days          Date_and_time_new
0   2020/08/02 2020-08-02 21:21:46.000000  ...     0 2020-08-02 21:21:46.000000
1   2020/08/02 2020-08-02 21:21:46.082191  ...     0 2020-08-02 21:21:46.082191
2   2020/08/02 2020-08-02 21:21:46.164383  ...     0 2020-08-02 21:21:46.164383
3   2020/08/02 2020-08-02 21:21:46.246575  ...     0 2020-08-02 21:21:46.246575
4   2020/08/02 2020-08-02 21:21:46.328767  ...     0 2020-08-02 21:21:46.328767
5   2020/08/02 2020-08-02 00:00:00.000000  ...     1 2020-08-03 00:00:00.000000
6   2020/08/02 2020-08-02 00:00:00.082191  ...     1 2020-08-03 00:00:00.082191
7   2020/08/02 2020-08-02 00:00:00.164383  ...     1 2020-08-03 00:00:00.164383
8   2020/08/02 2020-08-02 00:00:00.246575  ...     1 2020-08-03 00:00:00.246575
9   2020/08/03 2020-08-03 00:00:16.082191  ...     1 2020-08-04 00:00:16.082191
10  2020/08/03 2020-08-03 03:00:33.000000  ...     1 2020-08-04 03:00:33.000000
11  2020/08/03 2020-08-03 03:00:33.040513  ...     1 2020-08-04 03:00:33.040513
12  2020/08/03 2020-08-03 03:00:33.081026  ...     1 2020-08-04 03:00:33.081026
13  2020/08/03 2020-08-03 03:00:33.121539  ...     1 2020-08-04 03:00:33.121539
14  2020/08/03 2020-08-03 03:00:33.162052  ...     1 2020-08-04 03:00:33.162052

[15 rows x 5 columns]
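Note that in the output above, rows 9 to 14 already carried the date 2020/08/03 and end up on 2020-08-04 after the shift. One possible way around that, offered only as a sketch, is to ignore the date part after the first row and rebuild every timestamp from the time of day plus a rollover count. This assumes the first row's date is correct and that consecutive rows are never more than 24 hours apart. The miniature frame and the names `time_of_day`, `rollovers` and `start_date` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the frame from the question.
df = pd.DataFrame({
    "Date_and_time": pd.to_datetime([
        "2020-08-02 21:21:46.000000",
        "2020-08-02 21:21:46.082191",
        "2020-08-02 00:00:00.000000",  # midnight passed, date is stale
        "2020-08-02 00:00:00.082191",
        "2020-08-03 00:00:16.082191",  # date already correct here
        "2020-08-03 03:00:33.000000",
    ])
})

# Time of day as a timedelta, ignoring the unreliable date part.
time_of_day = df["Date_and_time"] - df["Date_and_time"].dt.normalize()

# A drop in time of day marks a midnight rollover; count them cumulatively.
rollovers = (time_of_day.diff() < pd.Timedelta(0)).cumsum()

# Assumption: the first row's date is correct; later dates are derived from it.
start_date = df["Date_and_time"].iloc[0].normalize()
df["Date_and_time_new"] = (
    start_date + pd.to_timedelta(rollovers, unit="D") + time_of_day
)
print(df)
```

Because the date is derived from the rollover count alone, rows whose date had already advanced in the file are not shifted a second time.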


Answered 2023-12-20