How do I parallelize in R on Windows - example?


Posting this because it took me bloody forever to figure out. Here's a simple example of parallelization in R that lets you test whether things are working for you and gets you on the right path.

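    library(snow)

    z <- 1:4

    # Sequential baseline: four 1-second sleeps, about 4 seconds elapsed
    system.time(lapply(z, function(x) Sys.sleep(1)))

    # Set this to the number of cores on your machine
    n_cores <- 4
    cl <- makeCluster(n_cores, type="SOCK")

    # Same work spread across the cluster; elapsed time should drop
    system.time(clusterApply(cl, z, function(x) Sys.sleep(1)))

    stopCluster(cl)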

You should also load the doSNOW library to register the snow cluster as the backend for foreach; packages that use foreach internally will then run in parallel automatically. The command to register is registerDoSNOW(cl) (with cl being the return value from makeCluster()), and the command that undoes registration is registerDoSEQ(). Don't forget to shut down your cluster with stopCluster(cl) when you're done.
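To make the register/unregister cycle concrete, here is a minimal sketch. It assumes the doSNOW package is installed and uses two workers, which is an arbitrary choice for illustration; the squaring loop is just a stand-in for real work.

```r
library(doSNOW)  # attaches snow and foreach as dependencies

# Assumption: two workers, chosen only for this illustration
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)  # foreach's %dopar% now runs on the snow cluster

# Each iteration runs on a worker; .combine = c collects results into a vector
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

registerDoSEQ()  # back to sequential execution
stopCluster(cl)  # shut the workers down

print(squares)
```

After registerDoSEQ() the same %dopar% loop still runs, just sequentially, so code written this way keeps working with or without a cluster.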


This worked for me. I used the doParallel package, which required only 3 lines of code:

    # process in parallel
    library(doParallel)
    cl <- makeCluster(detectCores(), type='PSOCK')
    registerDoParallel(cl)

    # turn parallel processing off and run sequentially again:
    registerDoSEQ()

Fitting a random forest went from 180 seconds to 120 seconds (on a Windows computer with 4 cores).
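A speed-up smaller than the core count is normal, since starting workers and shipping data to them has a cost. If you want to check that the backend is really being used, something like the following rough sketch may help. It assumes doParallel is installed; the two workers and the half-second sleeps are placeholders for real work, not part of the answer above.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

# Assumption: two workers; use detectCores() to match your own machine
n_workers <- 2
cl <- makeCluster(n_workers, type = "PSOCK")
registerDoParallel(cl)

# How many workers foreach thinks it has
workers_seen <- getDoParWorkers()

# Four tasks of about half a second each, run sequentially and in parallel
seq_time <- unname(system.time(foreach(i = 1:4) %do% Sys.sleep(0.5))["elapsed"])
par_time <- unname(system.time(foreach(i = 1:4) %dopar% Sys.sleep(0.5))["elapsed"])

# Unregister the backend and shut the workers down
registerDoSEQ()
stopCluster(cl)

print(c(workers = workers_seen, sequential = seq_time, parallel = par_time))
```

If the parallel time is not clearly lower than the sequential one, the backend was probably not registered.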


Based on the information here, I was able to convert the following code into a parallelised version that worked under RStudio on Windows 7.

Original code:

    #
    # Basic elbow plot function
    #
    wssplot <- function(data, nc=20, seed=1234){
        wss <- (nrow(data)-1)*sum(apply(data,2,var))
        for (i in 2:nc){
            set.seed(seed)
            wss[i] <- sum(kmeans(data, centers=i, iter.max=30)$withinss)}
        plot(1:nc, wss, type="b", xlab="Number of clusters",
            ylab="Within groups sum of squares")}

Parallelised code:

    library("parallel")

    workerFunc <- function(nc) {
      set.seed(1234)
      return(sum(kmeans(my_data_frame, centers=nc, iter.max=30)$withinss))
    }

    num_cores <- detectCores()
    cl <- makeCluster(num_cores)

    # Workers start with empty sessions, so copy the data to each of them
    clusterExport(cl, varlist=c("my_data_frame"))

    values <- 1:20  # this represents the "nc" variable in the wssplot function

    system.time(
      result <- parLapply(cl, values, workerFunc)
    )  # parallel execution, with time wrapper

    stopCluster(cl)

    plot(values, unlist(result), type="b", xlab="Number of clusters",
         ylab="Within groups sum of squares")

Not suggesting it's perfect or even best, just a beginner demonstrating that parallel does seem to work under Windows. Hope it helps.
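One possible variation, sketched here rather than taken from the answer above: pass the data to the workers as an argument instead of relying on a global variable and clusterExport(). The names wss_for_k and wss_parallel are hypothetical, the built-in iris data stands in for my_data_frame, and the default of two workers is an assumption.

```r
library(parallel)

# Hypothetical worker: total within-cluster sum of squares for k clusters
wss_for_k <- function(k, data, seed) {
  set.seed(seed)
  sum(kmeans(data, centers = k, iter.max = 30)$withinss)
}

# Hypothetical wrapper: a parallel counterpart of wssplot() that returns
# the values instead of plotting them
wss_parallel <- function(data, nc = 20, seed = 1234, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))  # workers are shut down even if an error occurs
  # Extra arguments after the function are forwarded to every call
  unlist(parLapply(cl, 1:nc, wss_for_k, data = data, seed = seed))
}

# Stand-in data: the four numeric columns of iris, standardised
wss <- wss_parallel(scale(iris[, 1:4]), nc = 6)
print(wss)
```

The result can be plotted the same way as before, for example with plot(seq_along(wss), wss, type="b"). Using on.exit() means you cannot forget to stop the cluster.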