Join Tables on Date Range in Hive

Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.

sql hadoop hive left-join

If your situation allows, do it in two queries.

First with the full join, which can have the range; Then with an outer join, matching on all the columns, but include a where clause for where one of the fields is null.

Ex:

create table tableC asselect a.*, b.skill_group     from tableA a     ,    tableB b     where a.employee_id= b.employee_id       and a.cal_date >= b.date_start       and a.cal_date <= b.date_end;with c as (select * from TableC)insert into tableCselect a.*, cast(null as string) as skill_groupfrom tableA a   left join c    on (a.employee_id= c.employee_id     and a.cal_date  = c.cal_date)where c.employee_id is null ;

sql hadoop hive left-join

MarkWusinich had a great solution but with one major issue. If table a has an employee ID twice within the date range table c will also have that employee_ID twice (if b was unique if not more) creating 4 records after the join. As such if A is not unique on employee_ID a group by will be necessary. Corrected below:

with C as(select a.employee_id, b.skill_group     from tableA a     ,    tableB b     where a.employee_id= b.employee_id       and a.cal_date >= b.date_start       and a.cal_date <= b.date_endgroup by a.employee_id, b.skill_group) Cselect a.*, c.skill_groupfrom tableA a left join c  on a.employee_id = c.employee_id     and a.cal_date  = c.cal_date;

Please note: If B was somehow intentionally not distinct on (employee_id, skill_group), then my query above would also have to be modified to appropriately reflect that.

CodeHunter

Join Tables on Date Range in Hive

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last