Hive: Sum over a specified group (HiveQL)


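Similar to @VB_'s answer, but use a ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame, so that the sum covers every row in the partition rather than only the rows up to the current one.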

The HiveQL query is therefore:

SELECT key, product_code, SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM test;


You could use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to get a running sum within each partition without a self join.

Code as below:

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM T;
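To make the difference between the two frames concrete, here is a minimal sketch that computes both in one query. It assumes the table from the question is named test and has columns key, product_code and cost; the aliases running_cost and total_cost are made up for illustration.

```sql
-- Sketch only: assumes a table test(key, product_code, cost).
-- running_cost and total_cost are illustrative aliases.
SELECT key,
       product_code,
       cost,
       -- Frame ends at the current row: a running sum within each key.
       SUM(cost) OVER (PARTITION BY key ORDER BY cost
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_cost,
       -- Frame spans the whole partition: the same group total on every row.
       SUM(cost) OVER (PARTITION BY key ORDER BY cost
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_cost
FROM test;
```

If a key has costs 5, 10 and 20, running_cost is 5, 15 and 35 for those rows, while total_cost is 35 on all three.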


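The analytic function sum gives cumulative sums when its window frame ends at the current row. For example, if you did:

select key, product_code, cost, sum(cost) over (partition by key order by key rows between unbounded preceding and current row) as total_costs from test;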

then you would get:

key    product_code    cost     total_costs
1      UK              20       20
1      US              10       30
1      EU              5        35
2      UK              3        3
2      EU              6        9

which, it seems, is not what you want.

Instead, you can use the aggregate function sum with GROUP BY, combined with a self join, to put the per-key total on every row:

select test.key, test.product_code, test.cost, agg.total_cost
from (
  select key, sum(cost) as total_cost
  from test
  group by key
) agg
join test
on agg.key = test.key;