Hadoop - Analyze log file (Java)

I recommend computing the raw sums, as you are doing, as the output of a first Hadoop job, so that at the end of that job you have a result like this:

```
User1234     Prdsum: 58
User45687    Prdsum: 0
```

and then have a second Hadoop job (or standalone job) that compares the various values and produces another report.

Do you need "state" as part of the first Hadoop job? If so, you will need to keep a HashMap or Hashtable in your mapper or reducer that stores the values for all the keys (users, in this case) to compare - but that is not a good setup, IMHO. You are better off just doing an aggregate in one Hadoop job and doing the comparison in another.
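
For illustration, here is a minimal sketch of what the first job's reducer could look like, assuming the mapper emits (userid, amount) pairs; the class and field names are hypothetical:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer for the first job: sums the per-user amounts
// emitted by the mapper, producing one "userid -> total" record per user.
public class ProductSumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text userId, Iterable<LongWritable> amounts, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable amount : amounts) {
            sum += amount.get();
        }
        // One aggregate record per user; the second job compares these.
        context.write(userId, new LongWritable(sum));
    }
}
```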


One way to achieve this is by using a composite key. The mapper output key is a combination of userid and event id (reminder -> 0, order -> 1). Partition the data by userid, and write your own comparator. Here is the gist.

Mapper

```
for every event, check the event type
    if event type is "reminder"
        emit : <User1234, 0> <reminder id>
    if event type is "order"
        split if you have multiple orders
        for every order
            emit : <User1234, 1> <prd, count * amount, other fields of interest>
```
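
As a sketch, such a composite key could be implemented as a custom WritableComparable; the class name `UserEventKey` and its natural order (userid first, then event id) are illustrative, not a fixed API:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Composite key: userid plus event id (0 = reminder, 1 = order).
// Sorting on both fields makes reminders arrive before orders.
public class UserEventKey implements WritableComparable<UserEventKey> {
    private String userId;
    private int eventId;

    public UserEventKey() {}                      // no-arg constructor required by Hadoop

    public UserEventKey(String userId, int eventId) {
        this.userId = userId;
        this.eventId = eventId;
    }

    public String getUserId() { return userId; }
    public int getEventId()   { return eventId; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(userId);
        out.writeInt(eventId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readUTF();
        eventId = in.readInt();
    }

    @Override
    public int compareTo(UserEventKey other) {
        int cmp = userId.compareTo(other.userId);
        return cmp != 0 ? cmp : Integer.compare(eventId, other.eventId);
    }
}
```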

Partition using userid so all entries with the same userid go to the same reducer.
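
A partitioner that hashes on userid alone might look like this (again a sketch, reusing the hypothetical `UserEventKey` and assuming Text values):

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Partition on userid only, so reminders and orders for the same user
// land on the same reducer regardless of event id.
public class UserIdPartitioner extends Partitioner<UserEventKey, Text> {
    @Override
    public int getPartition(UserEventKey key, Text value, int numPartitions) {
        return (key.getUserId().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```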

Reducer

At the reducer, all entries will be grouped by userid and sorted by event id (i.e. first you will get all reminders for a given userid, followed by its orders).
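
That grouping does not happen by itself: it needs a grouping comparator that compares only the userid, registered with `job.setGroupingComparatorClass(...)`. A sketch, still using the hypothetical `UserEventKey`:

```java
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Group reducer input by userid only, so a single reduce() call sees
// every event for a user, with reminders (event id 0) sorted first.
public class UserIdGroupingComparator extends WritableComparator {
    protected UserIdGroupingComparator() {
        super(UserEventKey.class, true);          // true -> create key instances
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        UserEventKey left  = (UserEventKey) a;
        UserEventKey right = (UserEventKey) b;
        return left.getUserId().compareTo(right.getUserId());
    }
}
```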

```
if eventid is 0
    add the reminder id to a set (reminderSet)
if eventid is 1 && prd is in reminderSet
    emit : <userid> <prdsum>
else
    emit : <userid> <0>
```
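
Translated into Java, the reducer might look like the sketch below, assuming each mapper value is a Text holding either a reminder's product id (event id 0) or a "prd,amount" pair (event id 1); all names are illustrative:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Thanks to the grouping comparator, one reduce() call receives all
// events for a user: reminders (event id 0) first, then orders.
public class ReminderOrderReducer extends Reducer<UserEventKey, Text, Text, LongWritable> {

    @Override
    protected void reduce(UserEventKey key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Set<String> reminderSet = new HashSet<>();
        long prdSum = 0;

        for (Text value : values) {
            // key.getEventId() reflects the current value's event id,
            // because Hadoop reuses the key object as it iterates.
            if (key.getEventId() == 0) {
                reminderSet.add(value.toString());            // reminder product id
            } else {
                String[] parts = value.toString().split(","); // "prd,amount"
                if (reminderSet.contains(parts[0])) {
                    prdSum += Long.parseLong(parts[1]);
                }
            }
        }
        // prdSum stays 0 if no ordered product was previously reminded.
        context.write(new Text(key.getUserId()), new LongWritable(prdSum));
    }
}
```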

More details on composite keys can be found in 'Hadoop: The Definitive Guide'.