Hive: Sum over a specified group (HiveQL)

hadoop hive hiveql hortonworks-data-platform

Similar to @VB_'s answer, use a window frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, so that every row sees the whole partition.

The HiveQL query is therefore:

SELECT key, product_code,
       SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
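To see what that frame returns, here is a minimal sketch that runs the same query shape against SQLite through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Hive here, the sample rows are the ones from the table quoted elsewhere in this thread, and the names SAMPLE_ROWS, group_totals and total_costs are assumptions made for illustration.

```python
import sqlite3

# Sample rows from this thread: (key, product_code, costs).
SAMPLE_ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]


def group_totals(rows):
    """Return (key, product_code, total_costs), with the per-key total on every row."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, costs INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    query = """
        SELECT key, product_code,
               SUM(costs) OVER (
                   PARTITION BY key ORDER BY key
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS total_costs
        FROM test
        ORDER BY key, product_code
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Because the frame spans the whole partition, every row of key 1 should carry 35 and every row of key 2 should carry 9, regardless of row order inside the partition.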


You could use BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to achieve that without a self join.

Code as below:

SELECT a,
       SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
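A frame that ends at CURRENT ROW accumulates: each row receives the sum of the rows up to and including itself, so only the last row of each partition carries the full group total. The sketch below shows this on made-up data, assuming SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for Hive. The rows in T_ROWS, the column meanings, and the names running_sums and running_total are hypothetical.

```python
import sqlite3

# Hypothetical rows for the generic table T(a, b, c, d):
# a = label, b = value to sum, c = partition column, d = ordering column.
T_ROWS = [
    ("x", 10, "g1", 1),
    ("y", 20, "g1", 2),
    ("z", 5, "g1", 3),
    ("p", 7, "g2", 1),
    ("q", 3, "g2", 2),
]


def running_sums(rows):
    """Return (a, running_total) ordered by partition c, then by d."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
    conn.execute("CREATE TABLE T (a TEXT, b INTEGER, c TEXT, d INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT a,
               SUM(b) OVER (
                   PARTITION BY c ORDER BY d
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_total
        FROM T
        ORDER BY c, d
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

With these rows, partition g1 accumulates 10, 30, 35 and partition g2 accumulates 7, 10.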


The analytic function sum can produce cumulative (running) sums, depending on how the window is ordered and framed. For example, if you did:

select key, product_code, cost, sum(cost) over (partition by key) as total_costs from test

then you would get:

key    product_code    cost     total_costs
1      UK              20       20
1      US              10       30
1      EU              5        35
2      UK              3        3
2      EU              6        9

which, it seems, is not what you want.

Instead, you should use the aggregate function sum, combined with a self join, to accomplish this:

select test.key, test.product_code, test.cost, agg.total_cost
from (
  select key, sum(cost) as total_cost
  from test
  group by key
) agg
join test
on agg.key = test.key;
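As a rough check, the self join can be compared with the explicit whole-partition frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) suggested in another answer. This is only a sketch, assuming SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for Hive. The rows are the ones from the table above, and the names JOIN_QUERY, WINDOW_QUERY and compare_join_and_window are made up for illustration.

```python
import sqlite3

# Sample rows from the table above: (key, product_code, cost).
SAMPLE_ROWS = [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)]

# Aggregate per key, then join the totals back onto the detail rows.
JOIN_QUERY = """
    SELECT test.key, test.product_code, test.cost, agg.total_cost
    FROM (SELECT key, SUM(cost) AS total_cost FROM test GROUP BY key) agg
    JOIN test ON agg.key = test.key
    ORDER BY test.key, test.product_code
"""

# Same totals computed in one pass with a frame covering the whole partition.
WINDOW_QUERY = """
    SELECT key, product_code, cost,
           SUM(cost) OVER (
               PARTITION BY key
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS total_cost
    FROM test
    ORDER BY key, product_code
"""


def compare_join_and_window(rows):
    """Run both queries on the same data and return (join_rows, window_rows)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    join_rows = conn.execute(JOIN_QUERY).fetchall()
    window_rows = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return join_rows, window_rows
```

On this data both queries should return the same rows, with a total of 35 for key 1 and 9 for key 2. The join makes the grouping step explicit, while the window form avoids scanning the table twice.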
