Pig: Get top n values per group

One approach is:

records = LOAD '/user/nubes/ncdc/micro-tab/top.txt' AS (user:chararray, value:chararray, counter:int);
grpd = GROUP records BY user;
top3 = foreach grpd {
        sorted = order records by counter desc;
        top    = limit sorted 2;
        generate group, flatten(top);
};

Input is:

Alice   third   5
Alice   first   11
Alice   second  10
Alice   fourth  2
Bob     second  20
Bob     third   18
Bob     first   21
Bob     fourth  8

Output is:

(Alice,Alice,first,11)
(Alice,Alice,second,10)
(Bob,Bob,first,21)
(Bob,Bob,second,20)

hadoop hdfs apache-pig

One thing I noticed in your script is the line

top    = limit sorted 2;

TOP is a built-in Pig function, so using top as an alias may throw an error. I renamed that relation, and I also dropped group from the generate statement. The duplicated user name in your output comes from group, which repeats the grouping key that is already present in each flattened tuple. So instead of

generate group, flatten(top);

which was giving the output

(Alice,Alice,first,11)
(Alice,Alice,second,10)
(Bob,Bob,first,21)
(Bob,Bob,second,20)

I amended the script as shown below:

records = load 'test1.txt' using PigStorage(',') as (user:chararray, value:chararray, count:int);
grpd = GROUP records BY user;
top2 = foreach grpd {
        sorted = order records by count desc;
        top1   = limit sorted 2;
        generate flatten(top1);
};

which gave the desired output:

(Alice,first,11)
(Alice,second,10)
(Bob,first,21)
(Bob,second,20)
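
Since TOP is a built-in, it can also be used directly instead of the order/limit pair. The following is only a sketch, assuming the same three-column schema as in the question (so the counter is column index 2, counting from 0); the alias names top2 and best are my own. I would not rely on the order of the tuples that TOP returns within each group, so add an ORDER afterwards if the order matters.

```pig
-- Sketch only: same input file and schema as in the question.
records = LOAD '/user/nubes/ncdc/micro-tab/top.txt' AS (user:chararray, value:chararray, counter:int);
grpd = GROUP records BY user;
top2 = FOREACH grpd {
        -- TOP(n, column_index, bag): keep the 2 tuples with the largest
        -- value in column 2 (counter); column indexes start at 0.
        best = TOP(2, 2, records);
        GENERATE FLATTEN(best);
};
DUMP top2;
```

As with the amended script, group is left out of the GENERATE so the user name is not repeated in each output tuple.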

Hope this helps.

