Hadoop API vs. Hadoop Streaming

hadoop mapreduce cloudera

Usually we have a Map/Reduce pair written in Java: a map, which processes the independent chunks that the input dataset is split into, and a reduce, which combines the results to perform some useful analysis. Hadoop Streaming is a utility that lets us write Map/Reduce applications in any language (Ruby, Python, Bash, etc.) that can read from STDIN (for input) and write to STDOUT (for output).
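To make the STDIN/STDOUT contract concrete, here is a minimal word-count mapper sketch in Python. The names `map_words` and `run_mapper` are made up for illustration, and the tab-separated `word<TAB>1` output assumes the default streaming convention that the key is everything up to the first tab. The `StringIO` objects only stand in for the pipes so the sketch can be run locally.

```python
import io
import sys


def map_words(lines):
    """Yield one "word<TAB>1" record per word found in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Read raw text lines from stdin and write key/value records to stdout."""
    for record in map_words(stdin):
        stdout.write(record + "\n")


# Local demonstration: StringIO objects stand in for the STDIN/STDOUT pipes
# that Hadoop Streaming would attach to the script.
sample_in = io.StringIO("Hadoop streaming\nhadoop API\n")
sample_out = io.StringIO()
run_mapper(sample_in, sample_out)
print(sample_out.getvalue(), end="")
```

As an actual streaming mapper, the script would drop the demonstration lines and simply end with a call to `run_mapper()`, so that it consumes whatever Hadoop pipes into it.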


You're right to say that if you don't use Java you will not have all of the core Hadoop functions available. Things like ChainMapper and ChainReducer, chained jobs, and such are not available via streaming. Also, since Hadoop itself is written in Java, a Java job will usually be faster.

One more thing: in theory, no reducer starts until the mappers are done. What the HTML status page may show as reducers running at the same time as the mappers is just their input being moved around.
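The ordering is easier to see in a toy, in-memory simulation. This is only a sketch of the idea, not how Hadoop is implemented, and the names `map_phase`, `shuffle`, and `reduce_phase` are invented for the example. The point is that the values for one key can come from any mapper, so the reduce function can only run once every map output has been collected and grouped.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def shuffle(map_outputs):
    """Group values by key across the outputs of all mappers, sorted by key."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Sum the grouped values of each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Two input splits, each handled by its own (simulated) mapper.
splits = [["a b", "a"], ["b a"]]
map_outputs = [map_phase(split) for split in splits]

# Moving data around can begin as each mapper finishes, but the grouping is
# only complete, and reduce only meaningful, after the last mapper is done.
grouped = shuffle(map_outputs)
counts = reduce_phase(grouped)
print(counts)
```

Here the key `a` receives values from both splits, which is why reducing after only the first mapper had finished would give a wrong count.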


Hadoop Streaming enables us to write map and reduce functions in any programming or scripting language that supports reading data from standard input and writing to standard output: R, Python, C++, or pretty much any other language. This makes Hadoop Streaming very flexible and usable by a large number of users. Many parameters can be customized, for example the number of mappers, the number of reducers, JVM memory, the input format, and the output format. The default input format for Hadoop Streaming jobs is TextInputFormat, which reads the data one line at a time.
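On the reduce side, the script also sees its input one line at a time, sorted by key. Below is a minimal sketch of a summing reducer, assuming the mapper emitted `word<TAB>1` records; the names `parse_records` and `reduce_counts` are hypothetical. Because identical keys arrive next to each other, `itertools.groupby` is enough to collect each key's values.

```python
import io
from itertools import groupby


def parse_records(lines):
    """Split each "key<TAB>value" line into a (key, int) pair, skipping blanks."""
    for line in lines:
        if not line.strip():
            continue
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, int(value)


def reduce_counts(lines):
    """Sum the values of each run of identical keys; input must be sorted by key."""
    for key, group in groupby(parse_records(lines), key=lambda kv: kv[0]):
        yield f"{key}\t{sum(value for _, value in group)}"


# Local demonstration: a StringIO object stands in for the sorted stream that
# a streaming reducer would normally read from sys.stdin.
sorted_input = io.StringIO("api\t1\nhadoop\t1\nhadoop\t1\nstreaming\t1\n")
results = list(reduce_counts(sorted_input))
for record in results:
    print(record)
```

In a real job the loop would iterate over `reduce_counts(sys.stdin)` instead. Note that `groupby` only merges adjacent keys, so the sketch relies on the input already being sorted.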

The Hadoop API pretty much binds you to Java, but configuration and development are more straightforward, since everything can be configured from the Java code itself. In my experience Java seems to be slightly faster, but streaming can get pretty close when properly configured and used with the right language.
