How to create a lag variable within each group?

You could do this with data.table:

    library(data.table)
    data[, lag.value := c(NA, value[-.N]), by = groups]
    data
    #   time groups       value   lag.value
    #1:    1      a  0.02779005          NA
    #2:    2      a  0.88029938  0.02779005
    #3:    3      a -1.69514201  0.88029938
    #4:    1      b -1.27560288          NA
    #5:    2      b -0.65976434 -1.27560288
    #6:    3      b -1.37804943 -0.65976434
    #7:    4      b  0.12041778 -1.37804943

For multiple columns:

    nm1 <- grep("^value", colnames(data), value = TRUE)
    nm2 <- paste("lag", nm1, sep = ".")
    data[, (nm2) := lapply(.SD, function(x) c(NA, x[-.N])), by = groups, .SDcols = nm1]
    data
    #    time groups      value     value1      value2  lag.value lag.value1
    #1:    1      b -0.6264538  0.7383247  1.12493092         NA         NA
    #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.6264538  0.7383247
    #3:    3      b -0.8356286 -0.3053884 -0.01619026  0.1836433  0.5757814
    #4:    1      a  1.5952808  1.5117812  0.94383621         NA         NA
    #5:    2      a  0.3295078  0.3898432  0.82122120  1.5952808  1.5117812
    #6:    3      a -0.8204684 -0.6212406  0.59390132  0.3295078  0.3898432
    #7:    4      a  0.4874291 -2.2146999  0.91897737 -0.8204684 -0.6212406
    #    lag.value2
    #1:          NA
    #2:  1.12493092
    #3: -0.04493361
    #4:          NA
    #5:  0.94383621
    #6:  0.82122120
    #7:  0.59390132

Update

From data.table versions >= v1.9.5, we can use shift with type as lag or lead. By default, the type is lag.

    data[, (nm2) := shift(.SD), by = groups, .SDcols = nm1]
    #   time groups      value     value1      value2  lag.value lag.value1
    #1:    1      b -0.6264538  0.7383247  1.12493092         NA         NA
    #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.6264538  0.7383247
    #3:    3      b -0.8356286 -0.3053884 -0.01619026  0.1836433  0.5757814
    #4:    1      a  1.5952808  1.5117812  0.94383621         NA         NA
    #5:    2      a  0.3295078  0.3898432  0.82122120  1.5952808  1.5117812
    #6:    3      a -0.8204684 -0.6212406  0.59390132  0.3295078  0.3898432
    #7:    4      a  0.4874291 -2.2146999  0.91897737 -0.8204684 -0.6212406
    #    lag.value2
    #1:          NA
    #2:  1.12493092
    #3: -0.04493361
    #4:          NA
    #5:  0.94383621
    #6:  0.82122120
    #7:  0.59390132
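To see what shift does in isolation, here is a minimal sketch on a plain vector. The vector x and the result names are made up for illustration and are not part of the answer's dataset.

```r
library(data.table)

# Illustrative input, not taken from the answer's data
x <- 1:5

lag1  <- shift(x)                    # default type is "lag":  NA 1 2 3 4
lead1 <- shift(x, type = "lead")     # shift the other way:    2 3 4 5 NA
lag2  <- shift(x, n = 2, fill = 0L)  # lag by two, pad with 0: 0 0 1 2 3
```

The fill argument replaces the default NA padding; it is given as an integer here so it matches the type of x.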

If you need the reverse, use type = "lead":

    nm3 <- paste("lead", nm1, sep = ".")

Using the original dataset

    data[, (nm3) := shift(.SD, type = 'lead'), by = groups, .SDcols = nm1]
    #   time groups      value     value1      value2 lead.value lead.value1
    #1:    1      b -0.6264538  0.7383247  1.12493092  0.1836433   0.5757814
    #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.8356286  -0.3053884
    #3:    3      b -0.8356286 -0.3053884 -0.01619026         NA          NA
    #4:    1      a  1.5952808  1.5117812  0.94383621  0.3295078   0.3898432
    #5:    2      a  0.3295078  0.3898432  0.82122120 -0.8204684  -0.6212406
    #6:    3      a -0.8204684 -0.6212406  0.59390132  0.4874291  -2.2146999
    #7:    4      a  0.4874291 -2.2146999  0.91897737         NA          NA
    #   lead.value2
    #1: -0.04493361
    #2: -0.01619026
    #3:          NA
    #4:  0.82122120
    #5:  0.59390132
    #6:  0.91897737
    #7:          NA
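shift also accepts a vector for n, which may help when several lags of the same column are needed per group. The following is only a sketch on toy data; the table dt and the column names in lag_cols are hypothetical and not from the answer above.

```r
library(data.table)

# Toy data, made up for this sketch
dt <- data.table(groups = c("a", "a", "a", "b", "b"),
                 value  = c(10, 20, 30, 40, 50))

lags     <- 1:2
lag_cols <- paste0("lag", lags, ".value")  # hypothetical names: lag1.value, lag2.value

# With a vector n, shift() returns a list with one element per lag,
# so the result can be assigned to several columns at once
dt[, (lag_cols) := shift(value, n = lags), by = groups]
```

Each group starts again with NA, so lag2.value has two leading NAs in every group.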

data

    set.seed(1)
    data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))),
                value = rnorm(7), value1 = rnorm(7), value2 = rnorm(7))


Using package dplyr:

    library(dplyr)
    data <-
        data %>%
        group_by(groups) %>%
        mutate(lag.value = dplyr::lag(value, n = 1, default = NA))

gives

    > data
    Source: local data table [7 x 4]
    Groups: groups

      time groups       value   lag.value
    1    1      a  0.07614866          NA
    2    2      a -0.02784712  0.07614866
    3    3      a  1.88612245 -0.02784712
    4    1      b  0.26526825          NA
    5    2      b  1.23820506  0.26526825
    6    3      b  0.09276648  1.23820506
    7    4      b -0.09253594  0.09276648

As noted by @BrianD, this implicitly assumes that the rows are already in the desired order within each group. If they are not, either sort the data first or use the order_by argument of lag. Also note that, because of an existing issue with some versions of dplyr, it is safer to give the arguments and the namespace (dplyr::lag) explicitly.
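A minimal sketch of the order_by route, assuming the rows should be lagged in time order. The data frame df is toy data invented for this example and is deliberately unsorted within each group.

```r
library(dplyr)

# Toy data, made up for this sketch; time is not sorted within groups
df <- data.frame(time   = c(2, 1, 3, 2, 1),
                 groups = c("a", "a", "a", "b", "b"),
                 value  = c(20, 10, 30, 50, 40))

res <- df %>%
  group_by(groups) %>%
  # order_by makes lag() follow time order without reordering the rows
  mutate(lag.value = dplyr::lag(value, n = 1, default = NA, order_by = time)) %>%
  ungroup()
```

The rows of res stay in their original order; only the lagged values are computed as if each group were sorted by time.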


In base R, this will do the job:

    data$lag.value <- c(NA, data$value[-nrow(data)])
    data$lag.value[which(!duplicated(data$groups))] <- NA

The first line adds a column of observations lagged by one row. The second line corrects the first entry of each group, which would otherwise hold the last observation of the previous group.

Note that data here is a plain data.frame, so data.table is not needed.
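The two-line approach relies on the rows of each group being stored next to each other. If that is not guaranteed, one alternative (a different technique from the one above, sketched here on made-up data) is to apply the lag per group with ave from base R:

```r
# Toy data, made up for this sketch; the groups are interleaved on purpose
df <- data.frame(time   = c(1, 1, 2, 2, 3),
                 groups = c("a", "b", "a", "b", "a"),
                 value  = c(10, 40, 20, 50, 30))

# ave() applies FUN to the values of each group and puts the results
# back in the original row positions
df$lag.value <- ave(df$value, df$groups,
                    FUN = function(x) c(NA, x[-length(x)]))
```

This still assumes that the rows are in the desired order within each group.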
