HiveQL - Limiting the number of rows per item
It sounds like you want the top N rows per a_id. You can do this with a windowing function such as row_number(), introduced in Hive 0.11. Something like:
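SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    -- number the rows inside each a_id partition: 1, 2, 3, ...
    -- (aliased rn because ROW is a reserved keyword in newer Hive versions)
    SELECT a_id, b, c,
           row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
) rs
WHERE rn <= 10000          -- keep at most 10,000 rows per a_id
  AND a_id IN (1, 2, 3)
GROUP BY a_id, b, c;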
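This keeps at most 10,000 rows per a_id before the GROUP BY counts them. Without an ORDER BY in the window, which rows get kept is arbitrary; it is not guaranteed to be a random sample. You can partition further (for example PARTITION BY a_id, b) if you need the limit per combination of columns rather than per a_id alone. You can also use ORDER BY inside OVER (...) to control which rows are kept, such as the most recent ones; the Hive windowing documentation has many more examples.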
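To see the effect of ORDER BY inside the window, here is a small sketch that runs the same query shape locally with Python's built-in sqlite3 module rather than Hive (SQLite 3.25+ supports row_number() OVER). The ts column, the sample rows, and the cutoff of 2 instead of 10,000 are hypothetical, chosen only for illustration, and Hive-specific behavior is not exercised here.

```python
import sqlite3

# Hypothetical sample rows: (a_id, b, c, ts). The ts column is an assumption
# used only to give the window a deterministic ORDER BY.
ROWS = [
    (1, "x", "p", 5),
    (1, "x", "p", 3),
    (1, "y", "q", 9),
    (2, "x", "p", 1),
    (2, "z", "r", 7),
    (4, "x", "p", 2),  # filtered out by the a_id IN (1, 2, 3) condition
]

# Same shape as the Hive query, but ORDER BY ts DESC makes "top N" mean
# "the N most recent rows per a_id" instead of an arbitrary N rows.
TOP_N_PER_A_ID = """
SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    SELECT a_id, b, c,
           row_number() OVER (PARTITION BY a_id ORDER BY ts DESC) AS rn
    FROM table_name
) rs
WHERE rn <= ?
  AND a_id IN (1, 2, 3)
GROUP BY a_id, b, c
ORDER BY a_id, b, c
"""


def top_n_per_a_id(rows, n):
    """Return (a_id, b, c, sumrequests) counted over the top n rows per a_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table_name (a_id INTEGER, b TEXT, c TEXT, ts INTEGER)"
        )
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", rows)
        return conn.execute(TOP_N_PER_A_ID, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_per_a_id(ROWS, 2):
        print(row)
```

With n = 2, a_id 1 keeps only its two most recent rows (ts 9 and 5), so the older ("x", "p") row with ts 3 is not counted; raising n to 3 brings it back and its count becomes 2.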