How to retrieve the most repeated value in a column present in a data frame
Another way with the data.table package, which is faster for large data sets:
library(data.table)  # needed for method 2
set.seed(1)
x <- sample(seq(1, 100), 5000000, replace = TRUE)
method 1 (solution proposed above)
start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 4.883488 secs
method 2 (DATA TABLE)
start.time <- Sys.time()
ds <- data.table( x )
setkey(ds, x)
sorted <- ds[,.N,by=list(x)]
most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.328033 secs
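
Since the question asks about a column in a data frame, here is a minimal sketch of both methods applied to one. The data frame `df` and its column `colour` are made up for illustration and are not from the question. One difference between the two: the `table` approach returns every value tied for the highest count, while the data.table line above keeps only the first one after sorting.

```r
library(data.table)

# Hypothetical example data; `df` and `colour` are illustrative names only.
df <- data.frame(
  colour = c("red", "blue", "red", "green", "blue", "red"),
  stringsAsFactors = FALSE
)

# Method 1 (base R): tabulate the column, then keep every value whose
# count equals the maximum (so ties are all returned).
tt <- table(df$colour)
mode_base <- names(tt)[tt == max(tt)]

# Method 2 (data.table): count rows per value, sort by descending count,
# and take the first value (only one value is returned, even with ties).
dt <- as.data.table(df)
counts <- dt[, .N, by = colour]
mode_dt <- counts[order(-N)]$colour[1]

mode_base
mode_dt
```

Both give "red" here, which appears three times. If you need all tied values from the data.table version, `counts[N == max(N)]$colour` should do it.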