Replace NA with previous or next value, by group, using dplyr
```r
library(tidyverse) # fill() is part of tidyr

ps1 %>%
  group_by(userID) %>%
  fill(color, age, gender) %>%                  # default direction is "down"
  fill(color, age, gender, .direction = "up")
```
Which gives you:
```
Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F
```
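Recent tidyr versions also accept `.direction = "downup"` in `fill()`, which performs the downward pass and then the upward pass in a single call. The sketch below runs on a made-up data frame (`toy` and `filled` are hypothetical names, because `ps1` itself is not reproduced here). Note that user 2's leading `NA` in `age` is filled from within user 2, not carried over from user 1:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (ps1 from the question is not shown here)
toy <- data.frame(
  userID = c(1, 1, 1, 2, 2),
  color  = c(NA, "blue", NA, "red", NA),
  age    = c("3yrs", NA, NA, NA, "4yrs"),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  group_by(userID) %>%
  # "downup": fill downwards first, then upwards, separately within each userID
  fill(color, age, .direction = "downup") %>%
  ungroup()

print(filled)
```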
Applying `zoo::na.locf` directly to the whole data frame would fill the NAs regardless of the `userID` groups. Unfortunately, dplyr's grouping has no effect on a plain `na.locf` call, which is why I went with a split:
```r
library(dplyr)
library(zoo)

ps1 %>%
  split(ps1$userID) %>%
  lapply(function(x) {
    # fill downwards, then upwards, within a single userID
    na.locf(na.locf(x), fromLast = TRUE)
  }) %>%
  do.call(rbind, .)

####      userID color  age gender
#### 21.1     21  blue 3yrs      F
#### 21.2     21  blue 2yrs      F
#### 21.3     21   red 2yrs      M
#### 22.4     22  blue 3yrs      F
#### 22.5     22  blue 3yrs      F
#### 22.6     22  blue 3yrs      F
#### 23.7     23   red 4yrs      F
#### 23.8     23   red 4yrs      F
#### 23.9     23  gold 4yrs      F
```
It first splits the data into three data frames (one per `userID`). The anonymous function inside `lapply` then applies a first imputation pass downwards and a second pass upwards. Finally, `do.call(rbind, .)` binds the data frames back together, which gives the expected output.
Using @agenis's method with `na.locf()` combined with purrr (the `slice_rows()` and `by_slice()` helpers now live in the purrrlyr package), you could do:
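```r
library(dplyr)    # for %>%
library(zoo)
library(purrrlyr) # slice_rows() and by_slice() moved here from purrr

ps1 %>%
  slice_rows("userID") %>%
  by_slice(function(x) {
    # fill downwards, then upwards, within each slice (userID)
    na.locf(na.locf(x), fromLast = TRUE)
  }, .collate = "rows")
```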
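If you would rather stay in a plain dplyr pipeline without purrrlyr, `na.locf()` can also be applied per group inside `mutate()`. The sketch below assumes dplyr 1.0 or later for `across()` and uses made-up data (`toy` and `filled` are hypothetical names). `na.rm = FALSE` keeps each column's length unchanged, which `mutate()` requires when a group starts or ends with NA:

```r
library(dplyr)
library(zoo)

# Hypothetical toy data standing in for ps1
toy <- data.frame(
  userID = c(1, 1, 2, 2, 2),
  color  = c(NA, "blue", "red", NA, NA),
  gender = c("F", NA, NA, "M", NA),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  group_by(userID) %>%
  mutate(across(c(color, gender), ~ {
    down <- na.locf(.x, na.rm = FALSE)            # carry last observation forward
    na.locf(down, fromLast = TRUE, na.rm = FALSE) # then backward for leading NAs
  })) %>%
  ungroup()

print(filled)
```

Here the grouping does take effect, because `na.locf()` is called on each group's column vector rather than on the whole data frame.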