What is semi-join in database?
Simple example. Let's select students with grades using left outer join:
SELECT DISTINCT s.idFROM students s LEFT JOIN grades g ON g.student_id = s.idWHERE g.student_id IS NOT NULL
Now the same with left semi-join:
SELECT s.idFROM students sWHERE EXISTS (SELECT 1 FROM grades g WHERE g.student_id = s.id)
The latter is much more efficient.
As far as I know SQL dialects that support SEMIJOIN/ANTISEMI
are U-SQL/Cloudera Impala.
Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.
More info Semi Join and Anti Join Should Have Their Own Syntax in SQL:
“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.
-- INSELECT *FROM EmployeeWHERE DeptName IN ( SELECT DeptName FROM Dept)-- EXISTSSELECT *FROM EmployeeWHERE EXISTS ( SELECT 1 FROM Dept WHERE Employee.DeptName = Dept.DeptName)
EDIT:
Another dialect that supports SEMI/ANTISEMI join is KQL:
kind=leftsemi (or kind=rightsemi)
Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.
let t1 = datatable(key:long, value:string) [1, "a", 2, "b",3, "c"];let t2 = datatable(key:long)[1,3];t1 | join kind=leftsemi (t2) on key
Output:
key value1 a3 c
As I understand, a semi join is a left join or right join:
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
So the difference between a left (semi) join and a "conventional" join is that you only retrieve the data of the left table (where you have a match on your join condition). Whereas with a full (outer) join (I think thats what you mean by conventional join), you retrieve the data of both tables where your condition matches.