R: Fill empty cell with value of last non-empty cell
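    df <- data.frame(a = c(1:5, "", 3, "", "", "", 4), stringsAsFactors = FALSE)
    > df
       a
    1  1
    2  2
    3  3
    4  4
    5  5
    6
    7  3
    8
    9
    10
    11 4

    # Each pass copies the previous row's value into every empty cell;
    # repeat until no empty cells remain.
    while(length(ind <- which(df$a == "")) > 0){
      df$a[ind] <- df$a[ind - 1]
    }
    > df
       a
    1  1
    2  2
    3  3
    4  4
    5  5
    6  5
    7  3
    8  3
    9  3
    10 3
    11 4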
EDIT: added time profile
    set.seed(1)
    N = 1e6
    df <- data.frame(a = sample(c("", 1, 2), size = N, replace = TRUE),
                     stringsAsFactors = FALSE)
    # A leading blank has no previous value to copy; mark it NA so it is skipped.
    if(df$a[1] == "") {df$a[1] <- NA}
    system.time(
      while(length(ind <- which(df$a == "")) > 0){
        df$a[ind] <- df$a[ind - 1]
      }, gcFirst = TRUE)
       user  system elapsed
       0.89    0.00    0.88
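For comparison, the same fill can be done in a single vectorized pass instead of repeated passes. This is only a sketch in base R, not part of the answer above: fill_down is a hypothetical helper name, and it assumes the column is a character vector in which "" (or NA) marks an empty cell.

```r
# Hypothetical helper (not from the original answer): carry the last
# non-empty value forward in one pass. Assumes x is a character vector.
fill_down <- function(x) {
  filled <- x != "" & !is.na(x)            # TRUE where a real value is present
  # Position of the most recent non-empty cell at or before each row
  # (0 when nothing has been seen yet).
  idx <- cummax(seq_along(x) * filled)
  out <- rep(NA_character_, length(x))     # leading empties stay NA
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

df <- data.frame(a = c(1:5, "", 3, "", "", "", 4), stringsAsFactors = FALSE)
df$a <- fill_down(df$a)
df$a
# [1] "1" "2" "3" "4" "5" "5" "3" "3" "3" "3" "4"
```

A leading empty cell has nothing to copy from, so it comes back as NA, which mirrors the special case handled before the timing run above.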
Here is a fast solution using na.locf from the zoo package, applied within data.table. I created a new column y in the result to better visualize the effect of replacing the missing values (it is easy to overwrite the x column here instead). Since na.locf only fills NA values, an extra step is needed first to replace all zero-length strings with NA. The solution is fast: it runs in under two seconds on my machine for 1e6 rows.
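    library(data.table)
    library(zoo)
    N = 1e6  ## number of rows
    DT <- data.table(x = sample(c("", 1, 2), size = N, replace = TRUE))
    ## Step 1: turn zero-length strings into NA.
    ## Step 2: carry the last observation forward; na.rm = FALSE keeps a
    ## leading NA so that y has the same length as x.
    system.time(DT[!nzchar(x), x := NA][, y := na.locf(x, na.rm = FALSE)])
    ##    user  system elapsed
    ##    0.59    0.30    1.78
    #           x y
    #       1:  2 2
    #       2: NA 2
    #       3: NA 2
    #       4:  1 1
    #       5:  1 1
    #      ---
    #  999996:  1 1
    #  999997:  2 2
    #  999998:  2 2
    #  999999: NA 2
    # 1000000: NA 2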