SQL Left Join first match only
distinct
is not a function. It always operates on all columns of the select list.
Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:
select ...from ( select IDNo, FirstName, LastName, ...., row_number() over (partition by lower(idno) order by firstname) as rn from people ) twhere rn = 1;
Using the order by
clause you can select which of the duplicates you want to pick.
The above can be used in a left join, see below:
select ...from x left join ( select IDNo, FirstName, LastName, ...., row_number() over (partition by lower(idno) order by firstname) as rn from people ) p on p.idno = x.idno and p.rn = 1where ...
Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.
SELECT *FROM People pWHERE PeopleID = ( SELECT MIN(PeopleID) FROM People WHERE IDNo = p.IDNo)
Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.
SELECT DISTINCT P2.*FROM ( SELECT IDNo , FirstName , LastName FROM people P) P2
Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3
CREATE TABLE people( [entry] int , [IDNo] varchar(3) , [FirstName] varchar(5) , [LastName] varchar(7));INSERT INTO people (entry,[IDNo], [FirstName], [LastName])VALUES (1,'uqx', 'bob', 'smith'), (2,'abc', 'john', 'willis'), (3,'ABC', 'john', 'willis'), (4,'aBc', 'john', 'willis'), (5,'WTF', 'jeff', 'bridges'), (6,'Sss', 'bill', 'doe'), (7,'sSs', 'bill', 'doe'), (8,'ssS', 'bill', 'doe'), (9,'ere', 'sally', 'abby'), (10,'wtf', 'jeff', 'bridges');