
Where do I run Spark - Standalone, Hadoop or Mesos?


Depending on the details of your use case, you may see performance go up or down in any given configuration compared to another. However, Hadoop and Mesos offer advantages beyond raw performance. There are many in each case, but for example:

Hadoop

  • HDFS as a resilient, distributed file store.
  • Access data sets through metadata already held in Hadoop, for example via HiveContext (see the sketch after this list)
  • Mix Spark processing with other methods such as MapReduce
  • YARN as a resource manager to assign resources to your tasks
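
For the HiveContext point, here is a minimal sketch of a Spark 1.x application that runs on YARN and queries a table registered in the Hive metastore. The table name "logs", the object name, and the in-code master setting are placeholders I've added for illustration; in practice the master is usually supplied via spark-submit rather than hard-coded.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveOnYarnSketch {
      def main(args: Array[String]): Unit = {
        // Ask YARN to allocate executors for this application (Spark 1.x
        // client mode; HADOOP_CONF_DIR must point at the cluster config).
        val conf = new SparkConf()
          .setAppName("HiveOnYarnSketch")
          .setMaster("yarn-client")
        val sc = new SparkContext(conf)

        // HiveContext picks up table definitions from the existing Hive
        // metastore, so data already catalogued in Hadoop can be queried by name.
        val hive = new HiveContext(sc)
        val sample = hive.sql("SELECT * FROM logs LIMIT 10") // "logs" is a placeholder table
        sample.show()

        sc.stop()
      }
    }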

Mesos - Mesos is more focused on a specific role than Hadoop, namely managing resources across a cluster of machines. However, it does this across a range of workload types: data processing jobs such as Spark, distributed applications built on Akka, distributed databases, and so on. It can also move tasks to other machines if one machine fails.
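
As a rough illustration of how little changes on the Spark side when you target Mesos instead, here is a minimal sketch that points an application at a Mesos master. The host name, port, and memory setting are placeholders, not values from this answer.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkOnMesosSketch {
      def main(args: Array[String]): Unit = {
        // Mesos offers cluster resources to the Spark scheduler; if a machine
        // is lost, replacement tasks can be launched on other machines.
        val conf = new SparkConf()
          .setAppName("SparkOnMesosSketch")
          .setMaster("mesos://mesos-master.example.com:5050") // placeholder master URL
          .set("spark.executor.memory", "2g")                 // placeholder sizing
        val sc = new SparkContext(conf)

        // A trivial job to confirm work is being distributed across the cluster.
        val total = sc.parallelize(1 to 1000000).map(_.toLong).reduce(_ + _)
        println(s"Sum: $total")

        sc.stop()
      }
    }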

I recommend watching this video (I was lucky enough to attend this meetup live): https://www.youtube.com/watch?v=gzx4-6RB7Yw

It demonstrates the use of Spark, HDFS, Mesos and Docker to do distributed computing on a cluster of Amazon cloud machines.