
distributed scheduling system for R scripts


If what you want to do is distribute jobs for parallel execution on machines you have physical access to, I HIGHLY recommend the doRedis backend for foreach. You can read the vignette PDF to get more details. The gist is as follows:

Why write a doRedis package? After all, the foreach package already has available many parallel back end packages, including doMC, doSNOW and doMPI. The doRedis package allows for dynamic pools of workers. New workers may be added at any time, even in the middle of running computations. This feature is relevant, for example, to modern cloud computing environments. Users can make an economic decision to "turn on" more computing resources at any time in order to accelerate running computations. Similarly, modern cluster resource allocation systems can dynamically schedule R workers as cluster resources become available.
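
To make that concrete, here is a minimal sketch of how doRedis plugs into foreach. It assumes a Redis server is reachable on localhost; the queue name "jobs" is just an example, and remote machines would attach to the same queue with redisWorker().

    ## Minimal doRedis sketch -- assumes a Redis server on localhost and the
    ## doRedis + foreach packages installed; the queue name "jobs" is arbitrary.
    library(doRedis)
    library(foreach)

    registerDoRedis("jobs", host = "localhost")   # register the work queue
    startLocalWorkers(n = 2, queue = "jobs")      # spin up two local workers

    ## On a remote machine you would instead run:
    ##   redisWorker("jobs", host = "master-hostname")
    ## New workers can join even while the loop below is running.

    results <- foreach(i = 1:10, .combine = c) %dopar% sqrt(i)

    removeQueue("jobs")                           # clean up the queue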

Hadoop works best if the machines running it are dedicated to the cluster rather than borrowed. There's also considerable overhead to setting up Hadoop, which can be worth the effort if you need the map/reduce algorithm and distributed storage that Hadoop provides.

So what, exactly, is your configuration? Do you have an office full of machines you want to distribute R jobs across? Do you have a dedicated cluster? Is this going to be EC2 or some other "cloud"-based setup?

The devil is in the details, so you can get better answers if the details are explicit.

If you want the workers to do jobs and have the results of those jobs collected back on one master node, you'll be much better off using a dedicated R solution rather than a system like TakTuk or dsh, which are more general parallelization tools.
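
For example, here is a sketch of one such dedicated R solution using the base parallel package; the host names "node1" and "node2" are hypothetical placeholders for machines reachable by passwordless SSH. The point is that each job's result flows back to the master automatically.

    ## Sketch using base R's "parallel" package; "node1"/"node2" are
    ## hypothetical host names reachable via passwordless SSH.
    library(parallel)

    cl <- makePSOCKcluster(c("node1", "node2"))   # start R workers over SSH

    ## Each job runs on a worker; the results list comes back to the master.
    results <- parLapply(cl, 1:100, function(i) sqrt(i))

    stopCluster(cl)                               # shut the workers down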


Look into TakTuk and dsh as starting points. You could perhaps roll your own mechanism with pssh or clusterssh, though these may be more effort.