I have the following df:
df <- tibble(country = c("US","US","US","US","US","US","US","US","US","Mex","Mex"), year = c(1999,2000,2001,2002,2003,2004,2005,2006,2007,2000,…
We can find the row index where the first non-NA `score` occurs, then build a sequence from `1 - index` to `n() - index` for each group.
library(dplyr)

df %>%
  group_by(country) %>%
  # index: position of the first non-NA score within each country
  mutate(index = which.max(!is.na(score)),
         years_from_implementation = (1 - index[1]):(n() - index[1])) %>%
  select(-index)

#    country  year score years_from_implementation
#    <chr>   <dbl> <dbl>                     <int>
#  1 US       1999    NA                        -4
#  2 US       2000    NA                        -3
#  3 US       2001    NA                        -2
#  4 US       2002    NA                        -1
#  5 US       2003   426                         0
#  6 US       2004    NA                         1
#  7 US       2005    NA                         2
#  8 US       2006   430                         3
#  9 US       2007    NA                         4
# 10 Mex      2000   450                         0
# 11 Mex      2001    NA                         1
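To make the index arithmetic concrete, here is a small base R sketch of what happens inside one group. It reuses the US scores from the output above; the variable names `index`, `n` and `offsets` are only illustrative, and `n` stands in for what `n()` would return inside the grouped `mutate()`.

```r
# US scores as shown in the printed output above
score <- c(NA, NA, NA, NA, 426, NA, NA, 430, NA)

# !is.na(score) is FALSE/TRUE; which.max() returns the position of the first TRUE
index <- which.max(!is.na(score))   # 5

# Number of rows in the group (the role n() plays inside mutate())
n <- length(score)                  # 9

# Sequence from 1 - index to n - index, so the first non-NA row gets 0
offsets <- (1 - index):(n - index)
print(offsets)
# -4 -3 -2 -1  0  1  2  3  4
```

One caveat worth keeping in mind: `which.max()` returns 1 when every element is `FALSE`, so a group whose `score` is entirely `NA` would be numbered from 0 as if its first row held a score.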
Here is a `dplyr` option:
library(dplyr)

df %>%
  group_by(country) %>%
  # row number minus the position of the first non-NA score
  mutate(years_from_implementation =
           1:n() - which(score == first(score[!is.na(score)]))) %>%
  ungroup()

# A tibble: 11 x 4
#    country  year score years_from_implementation
#    <chr>   <dbl> <dbl>                     <int>
#  1 US       1999    NA                        -4
#  2 US       2000    NA                        -3
#  3 US       2001    NA                        -2
#  4 US       2002    NA                        -1
#  5 US       2003   426                         0
#  6 US       2004    NA                         1
#  7 US       2005    NA                         2
#  8 US       2006   430                         3
#  9 US       2007    NA                         4
# 10 Mex      2000   450                         0
# 11 Mex      2001    NA                         1
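Note that `which(score == first(score[!is.na(score)]))` matches on the score value, so it yields more than one position if the first non-NA score happens to repeat later in the same group. That does not occur in the data shown here, but as a hedged alternative, the sketch below locates the first non-NA position directly with `match()`. The helper name `years_from_first_score` is hypothetical and not part of `dplyr`.

```r
# Hypothetical helper: offset of each row from the first non-NA element.
# match(TRUE, ...) returns the position of the first TRUE, or NA if there is none.
years_from_first_score <- function(score) {
  seq_along(score) - match(TRUE, !is.na(score))
}

# Made-up vector in which the first non-NA value (426) appears twice
score <- c(NA, 426, NA, 426)

print(which(score == 426))            # two positions: 2 4
print(years_from_first_score(score))  # -1  0  1  2
```

Inside the grouped pipeline this could presumably be used as `mutate(years_from_implementation = years_from_first_score(score))`. For a group with no scores at all, the helper returns `NA` for every row rather than a numbering.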