
R - Parallelizing multiple model learning (with dplyr and purrr)


Adding an answer for completeness: to run this you will need to install multidplyr from Hadley's repo. There is more information in the vignette.

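    library(dplyr)
    library(multidplyr)
    library(purrr)
    library(fitdistrplus)  # loaded locally too, so summary() dispatches to the fitdist method

    # Start a 4-worker cluster and make it the default for partition()
    cluster <- create_cluster(4)
    set_default_cluster(cluster)
    # Load fitdistrplus on every worker so fitdist() is available there
    cluster_library(cluster, "fitdistrplus")

    # dt is a data frame; subject_id identifies the observations from each subject
    by_subject <- partition(dt, subject_id)

    # Fit one normal distribution per subject, in parallel
    fits <- by_subject %>%
        do(fit = fitdist(.$observation, "norm"))

    # Bring the fitted models back to the main session and summarise each one
    collected_fits <- collect(fits)$fit
    collected_summaries <- collected_fits %>% map(summary)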


The furrr package is also available now; for example, something like:

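    library(dplyr)
    library(furrr)
    library(fitdistrplus)  # provides fitdist()

    # multiprocess is deprecated in recent releases of future; multisession is its replacement
    plan(multisession)

    # Split dt into one data frame per subject and fit each in a separate worker
    dt %>%
        split(dt$subject_id) %>%
        future_map(~ fitdist(.$observation, "norm"))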
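To try the furrr approach without real data, here is a minimal self-contained sketch. The simulated `dt`, its three subjects, the choice of two workers, and the `estimates` matrix are all assumptions made for illustration; only the `subject_id` and `observation` column names come from the answer above.

```r
library(furrr)         # future_map(); attaches future, which provides plan()
library(fitdistrplus)  # fitdist()

# Hypothetical stand-in for dt: 3 subjects with 50 normal observations each
set.seed(1)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = rnorm(150, mean = rep(c(0, 5, 10), each = 50))
)

# Two background R sessions (assumed worker count; adjust to your machine)
plan(multisession, workers = 2)

# One fit per subject; the namespace prefix makes the dependency explicit for the workers
fits <- future_map(
  split(dt, dt$subject_id),
  ~ fitdistrplus::fitdist(.x$observation, "norm")
)

# One row per subject, with the estimated mean and sd as columns
estimates <- t(sapply(fits, function(f) f$estimate))

# Return to sequential evaluation, which shuts the workers down
plan(sequential)
```

For only a handful of small fits like these, the overhead of starting workers can outweigh the gain; the pattern tends to pay off when there are many subjects or each fit is slow.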