What's the best way to dedupe a table? What's the best way to dedupe a table? sql sql

What's the best way to dedupe a table?


Using analytic function row_number:

WITH CTE (col1, col2, dupcnt)AS(SELECT col1, col2,ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS dupcntFROM Youtable)DELETEFROM CTEWHERE dupcnt > 1GO                                                                 


SELECT DISTINCT <insert all columns but the PK here> FROM foo. Create a temp table using that query (the syntax varies by RDBMS but there's typically a SELECT … INTO or CREATE TABLE AS pattern available), then blow away the old table and pump the data from the temp table back into it.


Adding the actual code here for future reference

So, there are 3 steps, and therefore 3 SQL statements:

Step 1: Move the non duplicates (unique tuples) into a temporary table

CREATE TABLE new_table asSELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];

Step 2: delete the old table (or rename it)We no longer need the table with all the duplicate entries, so drop it!

DROP TABLE old_table;

Step 3: rename the new_table to the name of the old_table

RENAME TABLE new_table TO old_table;

And of course, don't forget to fix your buggy code to stop inserting duplicates!