Your `structure` looks slightly different from the data frame you posted:
> df
   Subject                   Recipient Length Folder Message                Date Edit
1                                          80    out      NA 1/2/2020 1:00:01 AM TRUE
2                                          80    out      NA 1/2/2020 1:00:05 AM TRUE
3      hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:10 AM TRUE
4      hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:15 AM TRUE
5      hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:30 AM TRUE
6                                          NA             NA                       NA
7                                          NA             NA                       NA
8      hey sarah@mail.com,gee@mail.com     80  draft      NA 1/2/2020 1:02:00 AM TRUE
9      hey sarah@mail.com,gee@mail.com     80  draft      NA 1/2/2020 1:02:05 AM TRUE
10                                         NA             NA                       NA
11                                         NA             NA                       NA
12     hey sarah@mail.com,gee@mail.com    100  draft      NA 1/2/2020 1:03:00 AM TRUE
13     hey sarah@mail.com,gee@mail.com    100  draft      NA 1/2/2020 1:03:20 AM TRUE
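Also, your desired output suggests that you want the groups split by `Folder` as well, but that isn't what your description says, so I did not group by `Folder`. That is easy to change if you want it, though.
You can use run-length encoding (`rle`) to tell apart groups of identical consecutive values in sorted data, but in R, turning the result into a data frame column is a bit tricky. I used this answer to do that.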
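library(lubridate)
library(dplyr)

df %>%
  # Parse the timestamps and build one key per message version
  mutate(Date = mdy_hms(Date),
         Key = paste(Subject, Recipient, Length, sep = "_")) %>%
  arrange(Date) %>%
  # Keep sent messages and edited drafts (& binds tighter than |)
  filter(Folder == "out" | (Folder == "draft" & Edit == TRUE)) %>%
  # Number each run of consecutive identical keys
  mutate(RLE = {
    runs <- rle(Key)
    rep(seq_along(runs$lengths), runs$lengths)
  }) %>%
  group_by(RLE) %>%
  # Duration is in seconds
  summarize(Start = first(Date),
            End = last(Date),
            Duration = as.numeric(End) - as.numeric(Start))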
This creates groups from rows 1:2, 3:5 + 8:9, and 12:13. Those groups give the following durations (in seconds):
# A tibble: 3 x 4
    RLE Start               End                 Duration
  <int> <dttm>              <dttm>                 <dbl>
1     1 2020-01-02 01:00:01 2020-01-02 01:00:05        4
2     2 2020-01-02 01:00:10 2020-01-02 01:02:05      115
3     3 2020-01-02 01:03:00 2020-01-02 01:03:20       20
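To see what the `rle()` step inside `mutate()` is doing on its own, here is a minimal base R sketch. The `key` vector is a made-up toy input, not your data; it only illustrates how consecutive identical values get a shared run number.

```r
# Toy keys, already in processing order (hypothetical values for illustration)
key <- c("a", "a", "b", "b", "b", "a", "c", "c")

# rle() collapses consecutive duplicates into runs (values + lengths)
runs <- rle(key)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

print(run_id)
```

Note that the second run of "a" gets its own number rather than rejoining the first one. That is the behaviour you want here: only rows that are adjacent after sorting by date end up in the same group.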
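If you do want `Folder` included in the grouping, add it to the columns that are pasted together to create `Key`. That makes the groups 1:2, 3:5, 8:9, and 12:13. Doing so gives this result: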
# A tibble: 4 x 4
    RLE Start               End                 Duration
  <int> <dttm>              <dttm>                 <dbl>
1     1 2020-01-02 01:00:01 2020-01-02 01:00:05        4
2     2 2020-01-02 01:00:10 2020-01-02 01:00:30       20
3     3 2020-01-02 01:02:00 2020-01-02 01:02:05        5
4     4 2020-01-02 01:03:00 2020-01-02 01:03:20       20
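For reference, the `Folder`-aware variant might look like the sketch below. To keep it self-contained it rebuilds a trimmed-down `df` with only the nine non-empty rows, assumes the blank `Subject`/`Recipient` cells are empty strings, and leaves out the all-`NA` `Message` column; with your real data you would only change the `paste()` call.

```r
library(dplyr)
library(lubridate)

# Trimmed-down copy of the example data (assumption: blanks are empty strings,
# and the four empty rows are omitted since the filter drops them anyway)
df <- data.frame(
  Subject   = c("", "", rep("hey", 7)),
  Recipient = c("", "", rep("sarah@mail.com,gee@mail.com", 7)),
  Length    = c(rep(80, 7), 100, 100),
  Folder    = c(rep("out", 5), rep("draft", 4)),
  Date      = c("1/2/2020 1:00:01 AM", "1/2/2020 1:00:05 AM",
                "1/2/2020 1:00:10 AM", "1/2/2020 1:00:15 AM",
                "1/2/2020 1:00:30 AM", "1/2/2020 1:02:00 AM",
                "1/2/2020 1:02:05 AM", "1/2/2020 1:03:00 AM",
                "1/2/2020 1:03:20 AM"),
  Edit      = TRUE,
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(Date = mdy_hms(Date),
         # Folder is now part of the key, so a folder change starts a new run
         Key = paste(Subject, Recipient, Length, Folder, sep = "_")) %>%
  arrange(Date) %>%
  filter(Folder == "out" | (Folder == "draft" & Edit == TRUE)) %>%
  mutate(RLE = {
    runs <- rle(Key)
    rep(seq_along(runs$lengths), runs$lengths)
  }) %>%
  group_by(RLE) %>%
  summarize(Start = first(Date),
            End = last(Date),
            Duration = as.numeric(End) - as.numeric(Start))

print(result)
```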