
Submit jobs to a slave node from within an R script?


If you want to submit jobs from within an R script, I suggest that you look at the "BatchJobs" package. Here is a quote from the DESCRIPTION file:

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine.

BatchJobs appears to be more sophisticated than previous, similar packages, such as Rsge and Rlsf. There are functions for registering, submitting, and retrieving the results of jobs. Here's a simple example:

    library(BatchJobs)

    # Create a registry (the on-disk record of the jobs), map sqrt over 1:10,
    # submit the jobs to the queue, then load the results back as a list.
    reg <- makeRegistry(id='test')
    batchMap(reg, sqrt, x=1:10)
    submitJobs(reg)
    y <- loadResults(reg)

You need to configure BatchJobs to use your batch queueing system. The submitJobs "resources" argument can be used to request appropriate resources for the jobs.
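For instance, a Torque/PBS setup might use a .BatchJobs.R configuration file along these lines. This is only a sketch: the template path and the resource names (walltime, memory) are assumptions that must match the brew template you write for your own site:

    # .BatchJobs.R -- read by BatchJobs on startup (from ~ or the working dir).
    # Assumes a Torque/PBS cluster; the template path is a placeholder for
    # your own brew template file.
    cluster.functions = makeClusterFunctionsTorque("~/torque.tmpl")

and the per-submission resource request would then look something like:

    # The names inside resources are defined by your template, not by BatchJobs.
    submitJobs(reg, resources = list(walltime = 3600, memory = 4096))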

This approach is very useful if your cluster doesn't allow very long running jobs, or if it severely restricts the number of long running jobs. BatchJobs allows you to get around those restrictions by breaking up your work into multiple jobs while hiding most of the work associated with doing that manually.
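For example, if I remember the API correctly, submitJobs() also accepts a list of job-id vectors and submits each vector as one chunked cluster job; chunk() from the BBmisc package (which BatchJobs builds on) produces such a list. Treat this as a sketch and check the docs for your version:

    library(BatchJobs)
    library(BBmisc)   # provides chunk()

    reg <- makeRegistry(id = 'chunked')
    batchMap(reg, sqrt, x = 1:1000)

    # Group 1000 small tasks into 50 cluster jobs of 20 tasks each,
    # so each job finishes well inside the queue's run-time limit.
    ids <- chunk(getJobIds(reg), n.chunks = 50)
    submitJobs(reg, ids)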

Documentation and examples are available at the project website.


For most of our work we run multiple R sessions in parallel using qsub instead.

If it's for multiple files, I normally do:

    # Read each input file name (first field of the line) and submit one job per file
    while read infile rest
    do
        qsub -v infile=$infile call_r.sh
    done < list_of_infiles.txt

call_r.sh:

    ...
    R --vanilla -f analyse_file.R $infile
    ...

analyse_file.R:

    args <- commandArgs()
    # args[5] is the file name passed on the command line after -f analyse_file.R
    infile <- args[5]
    outfile <- paste(infile, ".out", sep = "")
    ...

Then I combine all the output afterwards...
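The combine step depends on what analyse_file.R writes; assuming each job wrote a tab-delimited <infile>.out table with identical columns (an assumption, adjust to your format), a sketch of the merge would be:

    # Gather every .out file produced by the jobs and stack the tables.
    # Assumes tab-delimited output with a header row.
    outfiles <- list.files(pattern = "\\.out$")
    combined <- do.call(rbind, lapply(outfiles, read.delim))
    write.table(combined, "all_results.tsv", sep = "\t",
                quote = FALSE, row.names = FALSE)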


The R package Rsge allows job submission to SGE-managed clusters. It basically saves the required environment to disk, builds job submission scripts, executes them via qsub, and then collates the results and returns them to you.

Because it basically wraps calls to qsub, it should work with PBS too (although since I don't know PBS, I can't guarantee it). You can alter the qsub command and the options used by altering Rsge's associated global options (prefixed sge. in the options() output).
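I don't recall the exact option names, but you can list them from a session with Rsge loaded and then override whichever one controls the submission command; the option name on the commented line is purely illustrative:

    library(Rsge)
    # Show every Rsge-registered option (they are all prefixed "sge.")
    grep("^sge\\.", names(options()), value = TRUE)
    # Illustrative only -- pick the real name from the listing above:
    # options(sge.qsub = "/usr/local/bin/qsub")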

It is no longer on CRAN, but it is available from GitHub: https://github.com/bodepd/Rsge, although it doesn't look like it's maintained any more.

To use it, call one of the apply-type functions supplied with the package: sge.apply, sge.parRapply, sge.parCapply, sge.parLapply and sge.parSapply, which are parallel equivalents of apply, apply over rows, apply over columns, lapply and sapply respectively. In addition to the standard parameters passed to the non-parallel functions, a few extra parameters are needed:

    njobs:             Number of parallel jobs to use
    global.savelist:   Character vector giving the names of variables
                       from the global environment that should be imported.
    function.savelist: Character vector giving the variables to save from
                       the local environment.
    packages:          List of library packages to be loaded by each worker
                       process before computation is started.

The two savelist parameters and the packages parameter basically specify which variables, functions and packages should be loaded into the new instances of R running on the cluster machines before your code is executed. The different components of X (either list items or data.frame rows/columns) are divided between njobs different jobs and submitted as a job array to SGE. Each node starts an instance of R, loads the specified variables, functions and packages, executes the code, and saves the results to a temp file. The controlling R instance checks when the jobs are complete, loads the data from the temp files and joins the results back together to produce the final result.

For example, computing a statistic on random samples from a gene list:

    library(Rsge)
    library(some.bioc.library)

    gene.list <- read.delim("gene.list.tsv")

    compute.sample <- function(gene.list) {
        # Draw a random sample of 1000 genes and compute the statistic on it
        gene.list.sample <- sample(gene.list, 1000)
        statistic <- some.slow.bioc.function(gene.list.sample)
        return(statistic)
    }

    results <- sge.parSapply(1:10000, function(x) compute.sample(gene.list),
                             njobs = 100,
                             global.savelist = c("gene.list"),
                             function.savelist = c("compute.sample"),
                             packages = c("some.bioc.library"))