Eliminating duplicate values based on only one column of the table

sql sql-server distinct inner-join duplicate-removal

This is where the window function row_number() comes in handy:

SELECT s.siteName, s.siteIP, h.dateFROM sites s INNER JOIN     (select h.*, row_number() over (partition by siteName order by date desc) as seqnum      from history h     ) h    ON s.siteName = h.siteName and seqnum = 1ORDER BY s.siteName, h.date

sql sql-server distinct inner-join duplicate-removal

From your example it seems reasonable to assume that the siteIP column is determined by the siteName column (that is, each site has only one siteIP). If this is indeed the case, then there is a simple solution using group by:

select  sites.siteName,  sites.siteIP,  max(history.date)from sitesinner join history on  sites.siteName=history.siteNamegroup by  sites.siteName,  sites.siteIPorder by  sites.siteName;

However, if my assumption is not correct (that is, it is possible for a site to have multiple siteIP), then it is not clear from you question which siteIP you want the query to return in the second column. If just any siteIP, then the following query will do:

select  sites.siteName,  min(sites.siteIP),  max(history.date)from sitesinner join history on  sites.siteName=history.siteNamegroup by  sites.siteNameorder by  sites.siteName;

sql sql-server distinct inner-join duplicate-removal

I solve such queries using this pattern:

SELECT *FROM tWHERE t.field=(  SELECT MAX(t.field)   FROM t AS t0   WHERE t.group_column1=t0.group_column1    AND t.group_column2=t0.group_column2 ...)

That is it will select records where the value of a field is at its max value. To apply it to your query I used the common table expression so that I don't have to repeat the JOIN twice:

WITH site_history AS (  SELECT sites.siteName, sites.siteIP, history.date  FROM sites  JOIN history USING (siteName))SELECT *FROM site_history hWHERE date=(  SELECT MAX(date)   FROM site_history h0   WHERE h.siteName=h0.siteName)ORDER BY siteName

It's important to note that it works only if the field we're calculating the maximum for is unique. In your example the date field should be unique for each siteName, that is if the IP can't be changed multiple times per millisecond. In my experience this is commonly the case otherwise you don't know which record is the newest anyway. If the history table has an unique index for (site, date), this query is also very fast, index range scan on the history table scanning just the first item can be used.

CodeHunter

Eliminating duplicate values based on only one column of the table

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last