Pig: how to count the number of rows in an alias


COUNT is built into Pig; see the manual.

LOGS = LOAD 'log';
LOGS_GROUP = GROUP LOGS ALL;
LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);


Arnon Rotem-Gal-Oz already answered this question a while ago, but I thought some may like this slightly more concise version.

LOGS = LOAD 'log';
LOG_COUNT = FOREACH (GROUP LOGS ALL) GENERATE COUNT(LOGS);


Be careful: COUNT skips any tuple in the bag whose first field is null, so it can undercount. To count every row regardless of nulls, use COUNT_STAR instead.
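
To see the difference between the two functions, something like the following should work. It is only a sketch: the two-field schema (user, url) and the output names non_null_first and all_rows are made up for illustration, and 'log' is assumed to be a tab-delimited file in which some rows have an empty first field.

```pig
-- Hypothetical schema: the field names below are assumptions, not from the original question.
LOGS = LOAD 'log' AS (user:chararray, url:chararray);

-- Put every row into a single bag so it can be aggregated.
LOGS_GROUP = GROUP LOGS ALL;

-- COUNT ignores tuples whose first field (user) is null;
-- COUNT_STAR counts every tuple in the bag.
COUNTS = FOREACH LOGS_GROUP GENERATE
    COUNT(LOGS) AS non_null_first,
    COUNT_STAR(LOGS) AS all_rows;

DUMP COUNTS;
```

If the two numbers differ, the gap is the number of rows whose first field is null.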