
R: Fill empty cell with value of last non-empty cell


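```r
df <- data.frame(a = c(1:5, "", 3, "", "", "", 4), stringsAsFactors = FALSE)
> df
   a
1  1
2  2
3  3
4  4
5  5
6   
7  3
8   
9   
10  
11 4
# Copy each empty cell from the cell directly above it and repeat until no
# empty cells remain (a run of k empty cells needs k passes).
while(length(ind <- which(df$a == "")) > 0){
  df$a[ind] <- df$a[ind - 1]
}
> df
   a
1  1
2  2
3  3
4  4
5  5
6  5
7  3
8  3
9  3
10 3
11 4
```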

EDIT: added time profile

```r
set.seed(1)
N = 1e6
df <- data.frame(a = sample(c("",1,2),size=N,replace=TRUE),
                 stringsAsFactors = FALSE)
# A leading "" has no cell above it, so set it to NA first
if(df$a[1] == "") {df$a[1] <- NA}
system.time(
  while(length(ind <- which(df$a == "")) > 0){
    df$a[ind] <- df$a[ind - 1]
  }, gcFirst = TRUE)
   user  system elapsed 
   0.89    0.00    0.88 
```
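The `while` loop above fills only the first cell of each run of empty cells per pass, so a run of k empty cells takes k passes. A single-pass alternative (a different technique from the loop, shown here only as a sketch) is to record the row index of every non-empty cell, set empty cells to 0, and take a running maximum with `cummax()`: the result is, for each row, the index of the last non-empty row. This assumes only `""` counts as empty; the helper name `fill_last_nonempty` is hypothetical.

```r
# Hypothetical helper: replace "" with the value of the last non-empty
# cell above it, in one vectorized pass instead of a while loop.
# Assumes only "" counts as empty (NA cells are left as they are).
fill_last_nonempty <- function(x) {
  pos <- seq_along(x)           # row index of every cell
  pos[which(x == "")] <- 0L     # empty cells point nowhere
  last <- cummax(pos)           # index of the most recent non-empty cell
  last[last == 0L] <- NA        # leading empty cells have no predecessor
  x[last]
}

df <- data.frame(a = c(1:5, "", 3, "", "", "", 4), stringsAsFactors = FALSE)
df$a <- fill_last_nonempty(df$a)
df$a
# [1] "1" "2" "3" "4" "5" "5" "3" "3" "3" "3" "4"
```

Unlike the loop, this handles a leading empty cell without a separate guard: it simply becomes `NA`.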


Here is a fast solution using na.locf from the zoo package, applied within data.table. I created a new column y in the result to better visualize the effect of replacing missing values (it is easy to replace the x column instead). Since na.locf only replaces NA values, an extra step is needed first to convert all zero-length strings to NA. The solution is fast: for 1e6 rows it takes about 1.8 seconds elapsed on my machine.

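```r
library(data.table)
library(zoo)
N = 1e6  ##  number of rows
DT <- data.table(x=sample(c("",1,2),size=N,replace=TRUE))
## Convert "" to NA, then carry the last non-NA value forward into y.
## na.rm = FALSE keeps a leading NA, so y has the same length as x.
system.time(DT[!nzchar(x),x:=NA][,y:=na.locf(x, na.rm = FALSE)])
## user  system elapsed
## 0.59    0.30    1.78
#           x y
#       1:  2 2
#       2: NA 2
#       3: NA 2
#       4:  1 1
#       5:  1 1
#      ---
#  999996:  1 1
#  999997:  2 2
#  999998:  2 2
#  999999: NA 2
# 1000000: NA 2
```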
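A small-vector illustration of the two details behind this answer: `zoo::na.locf` fills only `NA`, not `""`, so the zero-length strings must be converted first; and with its default `na.rm = TRUE` it trims a leading `NA`, so the result can be one element shorter than the column it is assigned to. The vector `x` below is made-up data for illustration only.

```r
library(zoo)

x <- c("", "1", "", "", "2", "")

# na.locf leaves "" untouched because "" is not NA
unchanged <- na.locf(x)

# Convert zero-length strings to NA first
x[!nzchar(x)] <- NA

# Default na.rm = TRUE trims the leading NA: 5 values for 6 rows
trimmed <- na.locf(x)
length(trimmed)

# na.rm = FALSE keeps the leading NA, so the length matches x
filled <- na.locf(x, na.rm = FALSE)
filled
```

The length mismatch is what makes `DT[, y := na.locf(x)]` fail whenever the first row happens to be empty, which is why passing `na.rm = FALSE` is the safer choice when assigning back into a column.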


Borrowing agstudy's MWE:

```r
library(dplyr)
library(zoo)
N = 1e6
df <- data.frame(x = sample(c(NA,"A","B"), size=N, replace=TRUE))
system.time(test <- df %>% dplyr::do(zoo::na.locf(.)))
##    user  system elapsed
##   0.082   0.000   0.130
```