
Where do I run Spark - Standalone, Hadoop or Mesos?


Depending on the details of your use case, you may see performance go up or down in any given configuration compared to another. However, Hadoop and Mesos offer advantages beyond raw performance. There are many in each case, but for example:

Hadoop

  • HDFS as a resilient, distributed file store.
  • Access data sets through metadata already held in Hadoop, for example via HiveContext (see the sketch after this list)
  • Mix Spark processing with other methods such as MapReduce
  • YARN as a resource manager to assign resources to your tasks
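
For the HiveContext point, here is a minimal sketch of a Spark 1.x application that runs on YARN and queries a table registered in the Hive metastore. The table name "logs", the object name, and the in-code master setting are placeholders I've added for illustration; in practice the master is usually supplied via spark-submit rather than hard-coded.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveOnYarnSketch {
      def main(args: Array[String]): Unit = {
        // Ask YARN to allocate executors for this application (Spark 1.x
        // client mode; HADOOP_CONF_DIR must point at the cluster config).
        val conf = new SparkConf()
          .setAppName("HiveOnYarnSketch")
          .setMaster("yarn-client")
        val sc = new SparkContext(conf)

        // HiveContext picks up table definitions from the existing Hive
        // metastore, so data already catalogued in Hadoop can be queried by name.
        val hive = new HiveContext(sc)
        val sample = hive.sql("SELECT * FROM logs LIMIT 10") // "logs" is a placeholder table
        sample.show()

        sc.stop()
      }
    }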

Mesos - Mesos is more focused on a specific role than Hadoop, namely managing resources across a cluster of machines. However, it does this across a range of workload types: data processing jobs such as Spark, distributed applications built on Akka, distributed databases, and so on. It can also move tasks to other machines if one machine fails.
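
As a rough illustration of how little changes on the Spark side when you target Mesos instead, here is a minimal sketch that points an application at a Mesos master. The host name, port, and memory setting are placeholders, not values from this answer.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkOnMesosSketch {
      def main(args: Array[String]): Unit = {
        // Mesos offers cluster resources to the Spark scheduler; if a machine
        // is lost, replacement tasks can be launched on other machines.
        val conf = new SparkConf()
          .setAppName("SparkOnMesosSketch")
          .setMaster("mesos://mesos-master.example.com:5050") // placeholder master URL
          .set("spark.executor.memory", "2g")                 // placeholder sizing
        val sc = new SparkContext(conf)

        // A trivial job to confirm work is being distributed across the cluster.
        val total = sc.parallelize(1 to 1000000).map(_.toLong).reduce(_ + _)
        println(s"Sum: $total")

        sc.stop()
      }
    }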

I recommend watching this video (I was lucky enough to attend this meetup live): https://www.youtube.com/watch?v=gzx4-6RB7Yw

It demonstrates the use of Spark, HDFS, Mesos and Docker to do distributed computing on a cluster of Amazon cloud machines.