What is semi-join in database?

What is semi-join in database?


Simple example. Let's select students with grades using left outer join:

    SELECT DISTINCT s.id
    FROM   students s
           LEFT JOIN grades g ON g.student_id = s.id
    WHERE  g.student_id IS NOT NULL

Now the same with left semi-join:

    SELECT s.id
    FROM   students s
    WHERE  EXISTS (SELECT 1 FROM grades g
                   WHERE g.student_id = s.id)

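The latter is usually more efficient: the database can stop looking through grades at the first match for each student, and no DISTINCT is needed to remove duplicate rows. Some optimizers may end up with the same plan for both queries.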
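To check that the two queries above return the same students, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration (student 1 has two grades, student 2 has none, student 3 has one), and an ORDER BY is added only to make the output deterministic. It says nothing about performance, only about results.

```python
import sqlite3

# Hypothetical sample data, not from the original question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
    INSERT INTO students (id) VALUES (1), (2), (3);
    INSERT INTO grades (student_id, grade) VALUES (1, 90), (1, 75), (3, 60);
""")

# Left outer join, filtered to matched rows and de-duplicated.
outer_join_sql = """
    SELECT DISTINCT s.id
    FROM students s
    LEFT JOIN grades g ON g.student_id = s.id
    WHERE g.student_id IS NOT NULL
    ORDER BY s.id
"""

# Semi-join written with EXISTS: each student is tested once.
semi_join_sql = """
    SELECT s.id
    FROM students s
    WHERE EXISTS (SELECT 1 FROM grades g WHERE g.student_id = s.id)
    ORDER BY s.id
"""

outer_join_ids = [row[0] for row in conn.execute(outer_join_sql)]
semi_join_ids = [row[0] for row in conn.execute(semi_join_sql)]

# Without DISTINCT the join repeats student 1 once per grade.
no_distinct_ids = [
    row[0] for row in conn.execute(outer_join_sql.replace("DISTINCT ", ""))
]

print(outer_join_ids)   # [1, 3]
print(semi_join_ids)    # [1, 3]
print(no_distinct_ids)  # [1, 1, 3]
```

The last query shows why the join version needs DISTINCT at all: the join produces one row per matching grade, while EXISTS only asks whether at least one match exists.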


As far as I know, the SQL dialects that support explicit SEMIJOIN/ANTISEMIJOIN syntax are U-SQL and Cloudera Impala.

SEMIJOIN:

Semijoins are U-SQL’s way to filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.

More info: "Semi Join and Anti Join Should Have Their Own Syntax in SQL":

“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.

    -- IN
    SELECT *
    FROM Employee
    WHERE DeptName IN (
      SELECT DeptName
      FROM Dept
    )

    -- EXISTS
    SELECT *
    FROM Employee
    WHERE EXISTS (
      SELECT 1
      FROM Dept
      WHERE Employee.DeptName = Dept.DeptName
    )
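As a rough check that the IN and EXISTS forms select the same rows, the sketch below runs both against an in-memory SQLite database from Python. The Name column and the sample rows are assumptions added for illustration. A NOT EXISTS variant is included as the anti join counterpart, since it keeps exactly the rows the semi join drops.

```python
import sqlite3

# Hypothetical sample data; 'Legal' has no row in Dept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptName TEXT);
    CREATE TABLE Employee (Name TEXT, DeptName TEXT);
    INSERT INTO Dept (DeptName) VALUES ('Sales'), ('IT');
    INSERT INTO Employee (Name, DeptName) VALUES
        ('Ann', 'Sales'), ('Bob', 'IT'), ('Cid', 'Legal');
""")

# Semi join, IN form.
in_sql = """
    SELECT Name FROM Employee
    WHERE DeptName IN (SELECT DeptName FROM Dept)
    ORDER BY Name
"""

# Semi join, EXISTS form.
exists_sql = """
    SELECT Name FROM Employee
    WHERE EXISTS (SELECT 1 FROM Dept
                  WHERE Employee.DeptName = Dept.DeptName)
    ORDER BY Name
"""

# Anti join: employees whose department has no match in Dept.
not_exists_sql = """
    SELECT Name FROM Employee
    WHERE NOT EXISTS (SELECT 1 FROM Dept
                      WHERE Employee.DeptName = Dept.DeptName)
    ORDER BY Name
"""

in_names = [row[0] for row in conn.execute(in_sql)]
exists_names = [row[0] for row in conn.execute(exists_sql)]
anti_names = [row[0] for row in conn.execute(not_exists_sql)]

print(in_names)      # ['Ann', 'Bob']
print(exists_names)  # ['Ann', 'Bob']
print(anti_names)    # ['Cid']
```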

EDIT:

Another dialect that supports SEMI/ANTISEMI join is KQL:

kind=leftsemi (or kind=rightsemi)

Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.

    let t1 = datatable(key:long, value:string)
    [
        1, "a",
        2, "b",
        3, "c"
    ];
    let t2 = datatable(key:long)
    [
        1, 3
    ];
    t1 | join kind=leftsemi (t2) on key


Output:

    key  value
    1    a
    3    c
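The leftsemi behaviour can also be sketched in a few lines of plain Python. This is only an illustration of the semantics, not how KQL implements it; the helper name left_semi_join is made up, and rows are modelled as dictionaries.

```python
def left_semi_join(left_rows, right_rows, key):
    """Return the rows of left_rows whose key value appears in right_rows.

    Only left-side rows (and therefore only left-side columns) are returned,
    and each left row appears at most once.
    """
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


# Same data as the KQL example above.
t1 = [
    {"key": 1, "value": "a"},
    {"key": 2, "value": "b"},
    {"key": 3, "value": "c"},
]
t2 = [{"key": 1}, {"key": 3}]

result = left_semi_join(t1, t2, "key")
print(result)  # [{'key': 1, 'value': 'a'}, {'key': 3, 'value': 'c'}]
```

Building a set of right-side keys first means each left row needs only one membership test, which is roughly the idea behind a hash-based semi join.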


As I understand it, a semi join is a restricted form of a left join or right join:

What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

So the difference between a left (semi) join and a "conventional" join is that you only retrieve the data of the left table (for rows that have a match on your join condition), and each left row is returned at most once. Whereas with an inner join (I think that's what you mean by conventional join), you retrieve the data of both tables for every pair of rows where your condition matches.
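To make the difference in result shape concrete, here is a small sketch with Python's sqlite3 module that compares an inner join with a semi join written as EXISTS. The customers and orders tables and their contents are hypothetical, chosen so that one left row matches two right rows.

```python
import sqlite3

# Hypothetical data: customer 1 has two orders, customer 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'x'), (2, 'y');
    INSERT INTO orders VALUES (1, 'p'), (1, 'q');
""")

# Inner join: columns of both tables, one row per matching pair.
inner_cur = conn.execute("""
    SELECT * FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY o.item
""")
inner_columns = [d[0] for d in inner_cur.description]
inner_rows = inner_cur.fetchall()

# Semi join: columns of the left table only, each left row at most once.
semi_cur = conn.execute("""
    SELECT * FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""")
semi_columns = [d[0] for d in semi_cur.description]
semi_rows = semi_cur.fetchall()

print(inner_columns, inner_rows)
# ['id', 'name', 'customer_id', 'item'] [(1, 'x', 1, 'p'), (1, 'x', 1, 'q')]
print(semi_columns, semi_rows)
# ['id', 'name'] [(1, 'x')]
```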