
Hadoop or Hadoop Streaming for MapReduce on AWS


You have a few options for running Hadoop on AWS. The simplest is to run your MapReduce jobs on Amazon's Elastic MapReduce (EMR) service: http://aws.amazon.com/elasticmapreduce. You could also run your own Hadoop cluster on EC2, as described at http://archive.cloudera.com/docs/ec2.html.

If you suspect you'll need to write your own input/output formats, partitioners, or combiners, I'd recommend using Java with the latter option (your own Hadoop cluster on EC2). If your job is relatively simple and you don't plan to use your Hadoop cluster for any other purpose, I'd recommend choosing the language you're most comfortable with and using EMR.
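To make that concrete, here is a minimal sketch of the kind of custom partitioner that has to be written in Java, assuming the newer org.apache.hadoop.mapreduce API; the class name FirstLetterPartitioner and the first-character routing rule are purely illustrative, not anything from the original posts:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative sketch only: route each key to a reducer bucket based on its
// first character, so keys starting with the same letter land in the same
// output partition. A hook like this has to be written in Java even if the
// rest of your job is driven through streaming.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Text.charAt() returns a non-negative Unicode code point for a valid
        // position, so the modulo below never yields a negative partition.
        int first = key.getLength() > 0 ? Character.toLowerCase(key.charAt(0)) : 0;
        return first % numPartitions;
    }
}
```

You would wire it in from your job driver with job.setPartitionerClass(FirstLetterPartitioner.class).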

Either way, good luck!

Disclosure: I am a founder of Cloudera.

Regards,
Jeff


I decided the flexibility of Java was more important than the drawbacks of porting my current code from C++ to Java.

Thanks for all your answers.


It depends on your needs. What is your input/output? Is it simple text files? Records with newline delimiters? Do you need a special combiner? A custom partitioner?

What I mean is that if you only need the Hadoop basics, then streaming will be fine. But if you need a little more complexity (from the Hadoop framework, not from your own business logic), hadoop jar will be more flexible.
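As a rough sketch of what I mean, here is the sort of driver you submit with hadoop jar, written against the org.apache.hadoop.mapreduce API; the class names, job name, and paths are made up for illustration, but setCombinerClass and setPartitionerClass are the framework hooks in question:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative word-count job showing the extra knobs a Java job exposes:
// a combiner for map-side pre-aggregation and, commented out, a custom
// partitioner such as the FirstLetterPartitioner sketched earlier.
public class WordCountWithCombiner {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tok = new StringTokenizer(line.toString());
            while (tok.hasMoreTokens()) {
                word.set(tok.nextToken());
                context.write(word, ONE);            // emit (word, 1) per token
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(WordCountWithCombiner.class);

        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);      // map-side pre-aggregation
        job.setReducerClass(SumReducer.class);
        // job.setPartitionerClass(FirstLetterPartitioner.class);  // custom key routing, if needed

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With streaming, by contrast, you supply the mapper and reducer as external commands on the command line; hooks like a custom partitioner or input format would still have to be Java classes.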

Sagie