Simple working example of ddply() in parallel on Windows


Here's a simple working example:

    > library(plyr)
    > df <- data.frame(val=1:10, ind=c(rep(2, 5), rep(3, 5)))
    > library(doSNOW)
    > registerDoSNOW(makeCluster(2, type = "SOCK"))
    > system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=FALSE)))
      ind V1
    1   2 25
    2   3 55
       user  system elapsed
       0.00    0.00    4.01
    > system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=TRUE)))
      ind V1
    1   2 25
    2   3 55
       user  system elapsed
       0.02    0.00    2.02
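The session above never keeps a handle to the cluster, so the two worker processes stay alive until R exits. A minimal sketch of the same call with explicit cleanup might look like the following; the variable names `cl` and `res` are my own, and the `Sys.sleep()` is dropped because it only served the timing comparison.

```r
library(plyr)
library(doSNOW)   # attaches foreach and snow as dependencies

df <- data.frame(val = 1:10, ind = c(rep(2, 5), rep(3, 5)))

# Keep the cluster object so it can be shut down afterwards.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

res <- tryCatch(
  ddply(df, .(ind), function(x) sum(x), .parallel = TRUE),
  finally = {
    stopCluster(cl)   # terminate the worker processes
    registerDoSEQ()   # fall back to the sequential backend
  }
)
print(res)
```

Wrapping the call in `tryCatch(..., finally = ...)` is one way to make sure the workers are stopped even if the `ddply()` call fails.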


Have you registered a parallel backend for foreach?

You may need to read up on how foreach works before using it with plyr.
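One way to check whether a backend is registered is to ask foreach directly. The sketch below uses `getDoParRegistered()`, `getDoParName()` and `getDoParWorkers()` from the foreach package; the wrapper `backend_status()` is a hypothetical helper name, not part of any package.

```r
library(foreach)

# Hypothetical helper: collect what foreach currently knows about
# its %dopar% backend.
backend_status <- function() {
  list(
    registered = getDoParRegistered(),  # TRUE once a backend is registered
    name       = getDoParName(),        # backend name, e.g. "doSNOW"
    workers    = getDoParWorkers()      # number of workers %dopar% will use
  )
}

status <- backend_status()
str(status)
```

If `registered` is `FALSE`, `%dopar%` should simply run sequentially, so `.parallel=TRUE` will not give any speedup until something like `registerDoSNOW()` has been called.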


A. I've been communicating with Hadley and there are no plans in the immediate future to fix this bug. The fix itself can be attempted by anyone. Here are some tips I received from Hadley:

"It's relatively easy at the simplest level - you just need to pass a .export argument to foreach. Ideally, plyr would figure out what to export automatically, but in the meantime, modifying .parallel to take a list of arguments to foreach (instead of just T/F) would be a big step. Start with llply, and if you can get that working, it's fairly trivial to get all the other functions working too."
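To illustrate the `.export` idea from the quote, here is a rough sketch that calls foreach directly rather than patching plyr. `scale_by_ten()` and `par_apply()` are hypothetical names made up for the example; `par_apply()` is only a stand-in for the foreach call inside plyr, not the actual plyr code.

```r
library(doSNOW)   # attaches foreach and snow as dependencies

# Hypothetical helper defined only in the master session.
scale_by_ten <- function(v) v * 10

# Stand-in for a plyr-style wrapper: forwards an explicit list of
# names for foreach to export to the workers.
par_apply <- function(xs, export = NULL) {
  foreach(x = xs, .combine = c, .export = export) %dopar% scale_by_ten(x)
}

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

out <- tryCatch(
  par_apply(1:3, export = "scale_by_ten"),
  finally = stopCluster(cl)   # always shut the workers down
)
print(out)
```

Naming the helper in `.export` makes sure each worker receives a copy of it, which is the piece the quote says plyr does not yet pass through.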

B. I highly recommend snow and doSNOW to get parallel foreach working on Windows. The other parallel backends either:
1. don't support Windows,
2. don't work on 64-bit Windows, or
3. are supposed to work on Windows but are too buggy.

snow/doSNOW was the only solution that worked "out of the box".

C. Good luck!