How to replace NA values in a table for selected columns
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
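As a minimal sketch of the pre-defined vector variant, assuming a small made-up data frame (the data and the name cols are illustrative only, not from the question):

```r
# Hypothetical toy data; column "c" is deliberately left out of the cleanup.
x <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6), c = c(NA, "u", "v"))

# Pre-defined vector naming the columns to clean.
cols <- c("a", "b")

# Replace NA with 0 in the selected columns only.
x[cols][is.na(x[cols])] <- 0

print(x)
```

Only a and b are touched; the NA in c is left as it was.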
Edit 2020-06-15
Since version 1.12.4 (Oct 2019), data.table has two functions to facilitate this: nafill and setnafill.
nafill operates on columns:
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]
setnafill operates on tables (the replacements happen by-reference/in-place):
setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]
This will also be more efficient than the other options; see ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.
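A rough sketch of the LOCF and NOCB variants, assuming data.table 1.12.4 or later is installed; the table y and its values below are made up for illustration:

```r
library(data.table)

# Hypothetical time series with gaps in both value columns.
y <- data.table(t = 1:5,
                a = c(1, NA, NA, 4, NA),
                b = c(NA, 2, NA, NA, 5))

# LOCF: each NA takes the most recent non-NA value before it.
a_locf <- nafill(y$a, type = "locf")

# NOCB, applied by reference to column b only.
setnafill(y, type = "nocb", cols = "b")

print(a_locf)
print(y)
```

Note that LOCF cannot fill a leading NA and NOCB cannot fill a trailing one, since there is no observation to carry in that direction.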
This will work for your data.table version:
for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]
Alternatively, as David Arenburg points out below, you can use set (side benefit: you can use it on either a data.frame or a data.table):
for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns have their NAs replaced:
library(tidyverse)
# by column type:
x %>% mutate_if(is.numeric, ~replace_na(., 0))
# select columns defined in vars(col1, col2, ...):
x %>% mutate_at(vars(a, b, c), ~replace_na(., 0))
# all columns:
x %>% mutate_all(~replace_na(., 0))
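In dplyr 1.0.0 and later, the scoped verbs mutate_if(), mutate_at() and mutate_all() are superseded by across(). A possible equivalent, sketched on a made-up tibble (the data and the result names are illustrative only):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two numeric columns and one character column.
x <- tibble(a = c(1, NA), b = c(NA, 2), c = c(NA, "z"))

# Named columns only.
by_name <- x %>% mutate(across(c(a, b), ~ replace_na(.x, 0)))

# By column type: every numeric column.
by_type <- x %>% mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

print(by_name)
```

Both calls leave the character column c alone, so its NA survives.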