
Hive QL - Limiting number of rows per each item


It sounds like you want the top N rows per a_id. You can do this with a window function (windowing and analytics functions were introduced in Hive 0.11). Something like:

    SELECT a_id, b, c, count(*) AS sumrequests
    FROM (
        SELECT a_id, b, c,
               row_number() OVER (PARTITION BY a_id) AS rn
        FROM table_name
    ) rs
    WHERE rn <= 10000
      AND a_id IN (1, 2, 3)
    GROUP BY a_id, b, c;
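As a rough illustration of what the query computes, here is a pure-Python sketch (not Hive code) that walks a list of hypothetical `(a_id, b, c)` tuples. The function name `top_n_then_count` and the sample data are made up for this example; list order stands in for the arbitrary order in which Hive numbers rows when the window has no ORDER BY.

```python
from collections import Counter, defaultdict


def top_n_then_count(rows, n, wanted_ids):
    """Mimic the query above on a list of (a_id, b, c) tuples.

    Hypothetical helper for illustration only. row_number() without an
    ORDER BY numbers rows in whatever order the engine sees them; here
    that is simply the order of the input list.
    """
    row_number = defaultdict(int)  # running row_number per a_id partition
    kept = []
    for a_id, b, c in rows:
        row_number[a_id] += 1
        # WHERE rn <= n AND a_id IN (...)
        if row_number[a_id] <= n and a_id in wanted_ids:
            kept.append((a_id, b, c))
    # GROUP BY a_id, b, c with count(*) AS sumrequests
    return Counter(kept)


# Hypothetical sample data.
rows = [(1, "x", "p"), (1, "x", "p"), (1, "y", "q"), (2, "x", "p"), (4, "z", "r")]
result = top_n_then_count(rows, 2, {1, 2, 3})
print(dict(result))  # {(1, 'x', 'p'): 2, (2, 'x', 'p'): 1}
```

The third row for a_id 1 is dropped because its row number is 3, and a_id 4 is excluded by the IN filter, so the counts only reflect the rows that survived the per-partition limit.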

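This keeps up to 10,000 rows per a_id before the GROUP BY aggregates them. Because the window has no ORDER BY, which rows are kept is arbitrary (not truly random, and not guaranteed to be the same between runs). You can partition further, e.g. PARTITION BY a_id, b, if you want the limit applied per combination of columns rather than per a_id alone. You can also add ORDER BY inside the OVER clause to control which rows are kept; there are plenty of examples of the additional options in the Hive windowing documentation.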
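Adding an ORDER BY to the window, e.g. `row_number() OVER (PARTITION BY a_id ORDER BY event_time DESC)` where `event_time` is a hypothetical timestamp column, keeps the top N rows per partition by that column instead of an arbitrary N. A minimal Python sketch of the same ranking, assuming rows are tuples and using a hypothetical helper `top_n_per_key`:

```python
from itertools import groupby, islice
from operator import itemgetter


def top_n_per_key(rows, n, key, order_by, descending=True):
    """Keep at most n rows per key, ranked by order_by.

    Hypothetical helper mirroring:
      row_number() OVER (PARTITION BY key ORDER BY order_by DESC) <= n
    """
    # Rank by the ORDER BY column first; Python's sort is stable (also
    # with reverse=True), so the second sort by key keeps that ranking
    # inside each partition.
    ranked = sorted(rows, key=order_by, reverse=descending)
    ranked.sort(key=key)
    result = []
    for _, partition in groupby(ranked, key=key):
        result.extend(islice(partition, n))  # row_number() <= n
    return result


# Hypothetical (a_id, event_time) rows; ISO dates sort correctly as strings.
events = [(1, "2024-01-01"), (1, "2024-03-01"), (1, "2024-02-01"), (2, "2024-01-15")]
latest = top_n_per_key(events, 2, key=itemgetter(0), order_by=itemgetter(1))
print(latest)  # [(1, '2024-03-01'), (1, '2024-02-01'), (2, '2024-01-15')]
```

As with row_number() in Hive, ties on the ORDER BY column are broken arbitrarily; here they fall back to input order.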