Hive: Sum over a specified group (HiveQL)
Similar to @VB_'s answer, but use the ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame, so that each row's sum covers its whole partition (every row with the same key).
The HiveQL query is therefore:
SELECT key, product_code,
       SUM(costs) OVER (PARTITION BY key ORDER BY key
                        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
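To see what the full-partition frame does, here is a small sketch that runs the same query through Python's built-in sqlite3 module. SQLite (3.25 or later, which added window functions) is only a stand-in for Hive here, the sample rows are the ones from the example table in this thread, and `make_conn` and `totals_per_key` are hypothetical helper names.

```python
import sqlite3

# Sample rows copied from the example table in this thread.
ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]


def make_conn():
    """Build an in-memory SQLite table standing in for the Hive table `test`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, costs INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    return conn


def totals_per_key():
    conn = make_conn()
    # The frame spans the whole partition, so every row gets its key's total.
    query = """
        SELECT key, product_code,
               SUM(costs) OVER (PARTITION BY key ORDER BY key
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS total_costs
        FROM test
        ORDER BY key, product_code
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in totals_per_key():
        print(row)
```

Every row carries its key's total (35 for key 1, 9 for key 2), which is the per-group sum being asked for.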
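You could use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to achieve that without a self join. Note that this frame gives a running (cumulative) sum up to the current row, not the total for the whole partition.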
The code is as follows:
SELECT a,
       SUM(b) OVER (PARTITION BY c ORDER BY d
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
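A sketch of the running-sum frame, again executed with Python's sqlite3 as a stand-in for Hive (SQLite 3.25+ assumed). The generic names map onto the thread's table as c to key, d to product_code and b to cost. Ordering by product_code is an assumption, since the thread does not say which column defines row order, and `running_costs` is a hypothetical helper name.

```python
import sqlite3

# Sample rows copied from the example table in this thread.
ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]


def make_conn():
    """Build an in-memory SQLite table standing in for the Hive table `test`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    return conn


def running_costs():
    conn = make_conn()
    # The frame ends at CURRENT ROW, so each row only sees itself and the
    # earlier rows of the same key (ordered by product_code): a running total.
    query = """
        SELECT key, product_code, cost,
               SUM(cost) OVER (PARTITION BY key ORDER BY product_code
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND CURRENT ROW) AS running_cost
        FROM test
        ORDER BY key, product_code
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_costs():
        print(row)
```

Only the last row of each key (35 for key 1, 9 for key 2) reaches the full total, which is why the frame has to extend to UNBOUNDED FOLLOWING when every row should see the group total.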
The analytic form of sum can give cumulative (running) sums rather than a per-key total. For example, if you did:
select key, product_code, cost,
       sum(cost) over (partition by key) as total_costs
from test
then you would get:
key  product_code  cost  total_costs
1    UK            20    20
1    US            10    30
1    EU            5     35
2    UK            3     3
2    EU            6     9
which, it seems, is not what you want.
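Whether a windowed sum is cumulative depends on the window specification rather than on sum itself. In the SQL windowing model, a window with no ORDER BY covers the whole partition, while ORDER BY without an explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Hive's windowing documentation describes the same defaults, though it is worth checking them against the Hive version in use. The sketch below compares the two forms using SQLite through Python's sqlite3 (3.25+ assumed), with product_code as an assumed ordering column and `window_sums` as a hypothetical helper name.

```python
import sqlite3

# Sample rows copied from the example table in this thread.
ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]


def make_conn():
    """Build an in-memory SQLite table standing in for the Hive table `test`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    return conn


def window_sums():
    conn = make_conn()
    # total_cost: no ORDER BY, so the frame is the whole partition.
    # running_cost: ORDER BY with no frame clause, so the default frame is
    # RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (a running sum).
    query = """
        SELECT key, product_code,
               SUM(cost) OVER (PARTITION BY key) AS total_cost,
               SUM(cost) OVER (PARTITION BY key ORDER BY product_code) AS running_cost
        FROM test
        ORDER BY key, product_code
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_sums():
        print(row)
```

On engines that follow these defaults, only the ordered window produces running sums; the unordered one already yields the per-key total.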
Instead, you should use the aggregate function sum with GROUP BY, combined with a self join, to accomplish this:
select test.key, test.product_code, test.cost, agg.total_cost
from (
  select key, sum(cost) as total_cost
  from test
  group by key
) agg
join test
on agg.key = test.key;
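A sketch of the self-join approach, executed with Python's sqlite3 as a stand-in for Hive (SQLite 3.25+ assumed for the window-function comparison). The helpers `self_join_totals` and `window_totals` are hypothetical names; the second computes the same totals with a full-partition window so the two approaches can be compared on the thread's sample rows.

```python
import sqlite3

# Sample rows copied from the example table in this thread.
ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]


def make_conn():
    """Build an in-memory SQLite table standing in for the Hive table `test`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    return conn


def run(query):
    conn = make_conn()
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def self_join_totals():
    # Aggregate once per key, then join the totals back onto every row.
    return run("""
        SELECT test.key, test.product_code, test.cost, agg.total_cost
        FROM (
            SELECT key, SUM(cost) AS total_cost
            FROM test
            GROUP BY key
        ) agg
        JOIN test ON agg.key = test.key
        ORDER BY test.key, test.product_code
    """)


def window_totals():
    # Same result with a window covering the whole partition, no join needed.
    return run("""
        SELECT key, product_code, cost,
               SUM(cost) OVER (PARTITION BY key) AS total_cost
        FROM test
        ORDER BY key, product_code
    """)


if __name__ == "__main__":
    print(self_join_totals())
    print(self_join_totals() == window_totals())
```

Both functions return the same rows. The window version typically reads the table once, whereas the self join reads it twice (once for the GROUP BY subquery and once for the join), which is the usual argument for preferring the window frame.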