
Upload Spark RDD to REST webservice POST method


Spark has no built-in way to do this. With data of that size it is not worth going through HDFS or another kind of storage. You can collect the data into your driver's memory and send it directly. For a POST call you can just use plain old java.net.URL, which would look something like this:

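    import java.net.{URL, HttpURLConnection}
    import org.apache.spark.rdd.RDD

    // The RDD you want to send (String elements assumed here)
    val rdd: RDD[String] = ???

    // Gather the data on the driver and join it into one newline-separated string
    val body = rdd.collect.mkString("\n")

    // Open a connection
    val url = new URL("http://www.example.com/resource")
    val conn = url.openConnection.asInstanceOf[HttpURLConnection]

    // Configure for a POST request
    conn.setDoOutput(true)
    conn.setRequestMethod("POST")

    // Write the body (the original wrote an undefined `input`), then close the stream
    val os = conn.getOutputStream
    os.write(body.getBytes("UTF-8"))
    os.flush()
    os.close()

    // Reading the status code completes the request
    val status = conn.getResponseCode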

A much more complete discussion of using java.net.URL can be found at this question. You could also use a Scala library to handle the ugly Java stuff for you, like akka-http or Dispatch.
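
For reference, here is a slightly fuller sketch of the same java.net approach, wrapped in a helper that also sets a content type, applies timeouts, and reads the response. The names `RestPost`, `buildBody` and `post`, the `text/plain` content type, and the timeout values are all assumptions made for illustration; adjust them to whatever the web service actually expects.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object RestPost {
  // Joins records into a newline-separated request body,
  // mirroring rdd.collect.mkString("\n") from the answer above.
  def buildBody(records: Seq[String]): String = records.mkString("\n")

  // POSTs `body` to `endpoint` and returns (status code, response text).
  // Content type and timeouts are illustrative assumptions.
  def post(endpoint: String, body: String): (Int, String) = {
    val conn = new URL(endpoint).openConnection.asInstanceOf[HttpURLConnection]
    try {
      conn.setDoOutput(true)
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8")
      conn.setConnectTimeout(5000)
      conn.setReadTimeout(10000)

      // Write the request body and close the stream
      val os = conn.getOutputStream
      try os.write(body.getBytes(StandardCharsets.UTF_8)) finally os.close()

      // Asking for the status code sends the request and waits for the reply
      val status = conn.getResponseCode
      val stream = if (status >= 400) conn.getErrorStream else conn.getInputStream
      val text =
        if (stream == null) ""
        else {
          val src = Source.fromInputStream(stream, "UTF-8")
          try src.mkString finally src.close()
        }
      (status, text)
    } finally conn.disconnect()
  }
}
```

On the driver this could be called as `RestPost.post("http://www.example.com/resource", RestPost.buildBody(rdd.collect.toSeq))`, assuming an RDD of strings.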


Spark itself does not provide this functionality (it is not a general-purpose HTTP client). You might consider using an existing REST client library such as akka-http, spray, or some other Java/Scala client library.

That said, you are by no means obliged to save your data to disk before operating on it. You could for example use collect() or foreach methods on your RDD in combination with your REST client library.
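
As a rough sketch of the foreach-style route: instead of collecting everything to the driver, each partition can send its own records in batches. The helper below is hypothetical (`PartitionSender`, `sendInBatches`, and the `send` callback are names invented here) and deliberately takes a plain `Iterator[String]`, which is what `foreachPartition` hands to your function, so it can be tried without a cluster. It assumes the service accepts newline-separated records in one request.

```scala
object PartitionSender {
  // Sends the records of one partition in batches of `batchSize`,
  // calling `send` once per batch with a newline-separated body.
  // Returns the number of requests made.
  def sendInBatches(records: Iterator[String], batchSize: Int)(send: String => Unit): Int = {
    var requests = 0
    records.grouped(batchSize).foreach { batch =>
      send(batch.mkString("\n"))
      requests += 1
    }
    requests
  }
}

// With Spark on the classpath, usage might look like this, where `post`
// stands for whatever REST client call you use (hypothetical here):
//
//   rdd.foreachPartition { part =>
//     PartitionSender.sendInBatches(part, 500)(body => post(endpointUrl, body))
//   }
```

Note that with `foreach` or `foreachPartition` the requests are issued from the executors rather than the driver, so the web service must be reachable from the worker nodes and whatever `send` captures must be serializable. With `collect()` everything goes out from the driver in one place.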