
How do WordCount MapReduce jobs run on a Hadoop YARN cluster with Apache Tez?


To answer your first question on converting MapReduce jobs to Tez DAGs:

Any MapReduce job can be thought of as a single DAG with two vertices (stages). The first vertex is the Map phase, and it is connected to the downstream Reduce vertex via a Shuffle edge.

There are two ways in which MR jobs can be run on Tez:

  1. One approach is to write a native two-stage DAG using the Tez APIs directly. This is what is currently present in tez-examples (see the first sketch after this list).
  2. The second is to use the MapReduce APIs themselves in yarn-tez mode. In this scenario, a layer intercepts the MR job submission and, instead of running it as MR, translates the job into a two-stage Tez DAG and executes that DAG on the Tez runtime (see the second sketch after this list).
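
To make the first approach concrete, here is a minimal sketch of a two-vertex word-count DAG, loosely following the WordCount example in tez-examples. `TokenProcessor` and `SumProcessor` stand in for your map-side and reduce-side logic (not shown); treat the exact builder calls as illustrative rather than definitive for any particular Tez release:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
import org.apache.tez.runtime.library.partitioner.HashPartitioner;

public class WordCountDag {
  public static DAG createDag(TezConfiguration tezConf) {
    // Vertex 1: the "Map" stage; TokenProcessor holds the user's tokenizing logic.
    Vertex tokenizer = Vertex.create("Tokenizer",
        ProcessorDescriptor.create(TokenProcessor.class.getName()));

    // Vertex 2: the "Reduce" stage, here with a single task.
    Vertex summation = Vertex.create("Summation",
        ProcessorDescriptor.create(SumProcessor.class.getName()), 1);

    // The Shuffle edge: sorted, partitioned key/value movement between the stages.
    OrderedPartitionedKVEdgeConfig shuffle = OrderedPartitionedKVEdgeConfig
        .newBuilder(Text.class.getName(), IntWritable.class.getName(),
            HashPartitioner.class.getName())
        .setFromConfiguration(tezConf)
        .build();

    return DAG.create("WordCount")
        .addVertex(tokenizer)
        .addVertex(summation)
        .addEdge(Edge.create(tokenizer, summation,
            shuffle.createDefaultEdgeProperty()));
  }
}
```

For the second approach no Tez code is needed at all: an unmodified MR driver is submitted with `mapreduce.framework.name` set to `yarn-tez` (the Tez client libraries must be on the classpath). A minimal sketch, assuming the standard WordCount `TokenizerMapper`/`IntSumReducer` classes from the Hadoop tutorial:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountOnTez {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The interception point: "yarn-tez" instead of "yarn" makes the
    // submission layer translate this MR job into a two-stage Tez DAG.
    conf.set("mapreduce.framework.name", "yarn-tez");

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountOnTez.class);
    job.setMapperClass(TokenizerMapper.class);  // the usual MR mapper (not shown)
    job.setReducerClass(IntSumReducer.class);   // the usual MR reducer (not shown)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```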

For your data-handling questions:

The user provides the logic for understanding the data to be read and how to split it. Tez then takes each split of data and takes over the responsibility of assigning a split, or a set of splits, to a given task (see the sketch below).
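
As a rough illustration of that boundary in the native API: the user names an InputFormat (the split logic), and Tez decides how the resulting splits are grouped and handed out to tasks. A sketch, assuming the `tokenizer` vertex from the DAG above; the builder methods come from MRInput's config builder, but treat the exact combination as illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.tez.dag.api.DataSourceDescriptor;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.mapreduce.input.MRInput;

public class SplitWiring {
  // Wires a user-chosen split source (TextInputFormat here) into the Map vertex.
  static Vertex addInput(Vertex tokenizer, TezConfiguration tezConf, String inputPath) {
    DataSourceDescriptor dataSource = MRInput
        .createConfigBuilder(new Configuration(tezConf), TextInputFormat.class, inputPath)
        .groupSplits(true)         // Tez may group small splits into task-sized units
        .generateSplitsInAM(true)  // compute splits in the ApplicationMaster
        .build();
    // From here on, split-to-task assignment is Tez's responsibility.
    return tokenizer.addDataSource("Input", dataSource);
  }
}
```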

The Tez framework then controls the generation and movement of data, i.e. where intermediate data is generated and how it moves between two vertices/stages. However, it does not control the underlying data contents/structure, partitioning, or serialization logic; those are provided by user plugins (see the sketch below).
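
Partitioning is a good example of such a plugin. Here is a hypothetical sketch of a custom partitioner that buckets words by their first letter; the `Partitioner` interface is the one from Tez's runtime library, and the class would be named in the edge config builder above in place of `HashPartitioner`:

```java
import org.apache.tez.runtime.library.api.Partitioner;

// Hypothetical user plugin: Tez moves the bytes between the two vertices,
// but this class decides which reduce partition each (word, count) pair
// lands in.
public class FirstLetterPartitioner implements Partitioner {
  @Override
  public int getPartition(Object key, Object value, int numPartitions) {
    String word = key.toString(); // key is a Text in the word-count DAG
    if (word.isEmpty()) {
      return 0;
    }
    int c = Character.toLowerCase(word.charAt(0));
    return (c % numPartitions + numPartitions) % numPartitions; // always non-negative
  }
}
```

Serialization works the same way: in the sketches above, the key/value class names passed to the edge config builder determine how the data is (de)serialized, and the framework itself never looks inside the records.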

The above is just a high-level view; there are additional intricacies. You will get more detailed answers by posting specific questions to the development list ( http://tez.apache.org/mail-lists.html ).