Faster equivalent of group_by %>% expand in R

Author: نسر الصحراء
Posted: 2025-01-02 12:20:14

4 replies

0#
日耀九洲 | 2019-08-31 10:32

<div class="post-text" itemprop="text">
<p>A <code>padr</code>-based solution could be:</p>
<pre><code>df <- data.table::fread("ID Start_year
01 1999
02 2004
03 2015
04 2007")
library(padr)
library(tidyverse)
df %>% pad_int('Start_year', end_val = 2015, group = "ID")
</code></pre>
</div>
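For comparison, here is a minimal base-R sketch of the same expansion that needs neither padr nor grouping. This is a swapped-in technique, not the method of the answer above. It assumes R >= 4.0.0 (for the `from` argument of `sequence()`), a fixed end year of 2015, and that no `Start_year` exceeds that end year; the names `end_year` and `n_years` are illustrative.

```r
# Sample data from the question, entered directly
df <- data.frame(
  ID         = c(1L, 2L, 3L, 4L),
  Start_year = c(1999L, 2004L, 2015L, 2007L)
)

end_year <- 2015L  # fixed end year from the question

# Years per ID, counting both ends; assumes Start_year <= end_year
n_years <- end_year - df$Start_year + 1L

# rep() repeats each ID once per year; sequence() concatenates the runs
# Start_year, Start_year + 1, ..., end_year (needs R >= 4.0.0 for `from`)
out <- data.frame(
  ID   = rep(df$ID, times = n_years),
  Year = sequence(n_years, from = df$Start_year)
)

head(out, 3)
```

Both `rep()` and `sequence()` are vectorized over all IDs at once, so no per-group function call is made.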

1#
撩心 | 2019-08-31 10:32

<div class="post-text" itemprop="text">
<p>You could do</p>
<pre><code>out <- DT[, .(col = seq.int(Start_year, 2015L)), by = ID]
out
#    ID  col
# 1:  1 1999
# 2:  1 2000
# 3:  1 2001
# 4:  1 2002
# 5:  1 2003
# 6:  1 2004
# 7:  1 2005
# 8:  1 2006
# 9:  1 2007
# ...
</code></pre>
<p>With your data frame <code>df</code>, you probably need to convert it to a data.table first:</p>
<pre><code>setDT(df)[, .(col = seq.int(Start_year, 2015L)), by = ID]
</code></pre>
<hr />
<p>A <code>tidyverse</code> way of doing the same thing:</p>
<pre><code>library(readr); library(dplyr); library(tidyr)
tbl <- read_table(text)
tbl %>%
  group_by(ID) %>%
  mutate(Start_year = list(seq.int(Start_year, 2015L))) %>%
  unnest(cols = Start_year)  # optionally followed by rename(new_col = Start_year)
</code></pre>
<p><strong>Data</strong></p>
<pre><code>text <- "ID Start_year
01 1999
02 2004
03 2015
04 2007"
library(data.table)
DT <- fread(text)
</code></pre>
</div>
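To make the result size concrete, the sketch below reruns the `by = ID` approach on the sample data with the end year pulled into a variable and checks the row count. Each ID contributes `end_year - Start_year + 1` rows, so the sample gives 17 + 12 + 1 + 9 = 39 rows. It assumes data.table is installed; `end_year` and `expected_rows` are hypothetical names, not from the thread.

```r
library(data.table)

# Sample data from the thread; fread reads "01" etc. as integers
DT <- fread("ID Start_year
01 1999
02 2004
03 2015
04 2007")

end_year <- 2015L  # hypothetical variable for the fixed end year

# One seq.int() call per ID group, as in the answer
out <- DT[, .(Year = seq.int(Start_year, end_year)), by = ID]

# Expected size: sum over IDs of (end_year - Start_year + 1)
expected_rows <- sum(end_year - DT$Start_year + 1L)
print(expected_rows)
print(nrow(out))
```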

2#
圈圈红 | 2019-08-31 10:32

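<div class="post-text" itemprop="text">
<p>If you have enough memory, you can take the full set of ID × year combinations and filter it with a rolling join:</p>
<pre><code>res <- DT[
  CJ(ID, Start_year = seq.int(min(Start_year), 2015L)),
  on = .(ID, Start_year), roll = TRUE, nomatch = 0
]
setnames(res, "Start_year", "Year")[]
</code></pre>
<p><code>CJ</code> takes the "cross join" of the ID and year vectors. If you are not using a recent version of data.table, you may need to name both arguments (i.e., <code>CJ(ID = ID, Start_year = seq.int(min(Start_year), 2015L))</code>).</p>
<p><em>Comment.</em> The OP said @markus's approach already brought the operation down to a few seconds, so further improvement may not be needed. Also, I am not sure my approach would be faster in any case.</p>
</div>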
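A sketch of what the rolling join does on the sample data, assuming data.table is installed. `CJ()` builds every ID × year pair from the earliest start year to 2015 (4 IDs × 17 years = 68 rows here, which is where the memory cost comes from). Joining that grid back to `DT` with `roll = TRUE` matches each (ID, year) row to the same ID's `Start_year` whenever that start is at or before the year; earlier years find no match and `nomatch = 0` drops them, leaving 39 rows. Both `CJ()` arguments are named explicitly, as suggested for older versions; `grid` is an illustrative name.

```r
library(data.table)

# Sample data from the thread
DT <- fread("ID Start_year
01 1999
02 2004
03 2015
04 2007")

# Full ID x year grid: 4 IDs x (2015 - 1999 + 1) years = 68 rows
grid <- CJ(ID = DT$ID, Start_year = seq.int(min(DT$Start_year), 2015L))

# Rolling join: each grid row takes the last DT row with the same ID
# and Start_year <= the grid year; unmatched (earlier) years are dropped
res <- DT[grid, on = .(ID, Start_year), roll = TRUE, nomatch = 0]
setnames(res, "Start_year", "Year")

print(nrow(grid))
print(nrow(res))
```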

