EMR vs EC2/Hadoop on AWS

hadoop amazon-web-services amazon-ec2 emr

EMR handles a lot of things for you that you don't get with a standard Hadoop install on EC2. Some particularly important ones include:

Copying Hadoop logs from your machines to S3. This is very useful for debugging errors after the cluster has been shut down.
Running job flows of multiple MapReduce, Pig, or Hive jobs
Setting sensible configuration defaults based on the hardware size you choose
Access to spot instances for cheaper compute
Ability to resize clusters dynamically
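To make the list above a bit more concrete, here is a minimal sketch of how those features map onto a single cluster request with boto3's `run_job_flow`. The bucket name, script paths, release label, instance types, and the helper names `build_cluster_request` and `launch_cluster` are all assumptions for illustration; the two role names are the EMR defaults and may differ in your account.

```python
def build_cluster_request(name, log_bucket, task_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values (release label, instance types, script paths)
    are illustrative assumptions, not recommendations.
    """
    def hive_step(step_name, script_uri):
        # One step in the job flow; command-runner.jar runs the Hive script.
        return {
            "Name": step_name,
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script",
                         "--args", "-f", script_uri],
            },
        }

    return {
        "Name": name,
        # EMR copies the Hadoop logs here, so they outlive the cluster.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",  # assumed release
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
                # Spot capacity for the extra, cheaper compute.
                {"Name": "task", "InstanceRole": "TASK",
                 "Market": "SPOT", "InstanceType": "m5.xlarge",
                 "InstanceCount": task_count},
            ],
            # Shut the cluster down once every step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # A job flow of several jobs, run in order.
        "Steps": [
            hive_step("prepare", f"s3://{log_bucket}/scripts/prepare.q"),
            hive_step("report", f"s3://{log_bucket}/scripts/report.q"),
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request; needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder works without boto3
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `launch_cluster(build_cluster_request("nightly", "my-bucket"))` would start a real, billable cluster, so it's worth inspecting the dictionary from `build_cluster_request` first.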

You'll also find that the EMR S3 filesystem is faster and more reliable than the standard one packaged with Apache Hadoop. It supports multipart upload and streams writes directly to S3 rather than buffering to disk first. For a bit more on this, see Tip #5.

Additionally, if you do decide to use EC2 directly, I'd recommend using instance-storage instead of EBS for your nodes. There's really no reason to pay the extra cost of EBS for Hadoop; you'll notice that EMR clusters all run on instance-storage nodes as well.


You are correct that EMR uses instance-store backed EC2 instances rather than EBS. However, there's nothing stopping you from launching an instance-store based instance, building an AMI from it, and using that for your Hadoop cluster. Using EBS also might not add much cost, depending on your workload and how often you run it. Also, keep in mind that EMR adds its own charge on top of the regular EC2 instance price.
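A rough way to compare the options is to add the per-node extras to the base EC2 rate. The sketch below does just that arithmetic; every rate in it is a made-up placeholder, not real AWS pricing, and `hourly_cluster_cost` is a hypothetical helper, so plug in current numbers for your region and instance type.

```python
def hourly_cluster_cost(nodes, ec2_rate, emr_surcharge=0.0, ebs_rate=0.0):
    """Hourly cost of a cluster, in dollars.

    ec2_rate      -- base EC2 price per node-hour
    emr_surcharge -- extra per node-hour charged when running through EMR
    ebs_rate      -- EBS volume cost per node, spread over one hour
    """
    return nodes * (ec2_rate + emr_surcharge + ebs_rate)


# Placeholder rates (assumptions, not actual AWS prices).
NODES = 10
EC2_RATE = 0.20
EMR_SURCHARGE = 0.05
EBS_RATE = 0.01

emr_cost = hourly_cluster_cost(NODES, EC2_RATE, emr_surcharge=EMR_SURCHARGE)
ec2_instance_store_cost = hourly_cluster_cost(NODES, EC2_RATE)
ec2_ebs_cost = hourly_cluster_cost(NODES, EC2_RATE, ebs_rate=EBS_RATE)

for label, cost in [("EMR", emr_cost),
                    ("EC2 + instance store", ec2_instance_store_cost),
                    ("EC2 + EBS", ec2_ebs_cost)]:
    print(f"{label}: ${cost:.2f}/hour")
```

This ignores the time you spend managing the cluster yourself, which is usually what tips the balance.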

I've been using EMR for two years now and I would highly recommend the service as you don't need to invest time in managing and updating your distribution. If your workload is compatible with EMR (getting data from DynamoDB or S3), I would go for EMR as opposed to EC2/Hadoop.
