Remove duplicate rows in MySQL Remove duplicate rows in MySQL sql sql

Remove duplicate rows in MySQL


A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

ALTER IGNORE TABLE jobsADD UNIQUE INDEX idx_name (site_id, title, company);

This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...


If you don't want to alter the column properties, then you can use the query below.

Since you have a column which has unique IDs (e.g., auto_increment columns), you can use it to remove the duplicates:

DELETE `a`FROM    `jobs` AS `a`,    `jobs` AS `b`WHERE    -- IMPORTANT: Ensures one version remains    -- Change "ID" to your unique column's name    `a`.`ID` < `b`.`ID`    -- Any duplicates you want to check for    AND (`a`.`title` = `b`.`title` OR `a`.`title` IS NULL AND `b`.`title` IS NULL)    AND (`a`.`company` = `b`.`company` OR `a`.`company` IS NULL AND `b`.`company` IS NULL)    AND (`a`.`site_id` = `b`.`site_id` OR `a`.`site_id` IS NULL AND `b`.`site_id` IS NULL);

In MySQL, you can simplify it even more with the NULL-safe equal operator (aka "spaceship operator"):

DELETE `a`FROM    `jobs` AS `a`,    `jobs` AS `b`WHERE    -- IMPORTANT: Ensures one version remains    -- Change "ID" to your unique column's name    `a`.`ID` < `b`.`ID`    -- Any duplicates you want to check for    AND `a`.`title` <=> `b`.`title`    AND `a`.`company` <=> `b`.`company`    AND `a`.`site_id` <=> `b`.`site_id`;


MySQL has restrictions about referring to the table you are deleting from. You can work around that with a temporary table, like:

create temporary table tmpTable (id int);insert  into tmpTable        (id)select  idfrom    YourTable ytwhere   exists        (        select  *        from    YourTabe yt2        where   yt2.title = yt.title                and yt2.company = yt.company                and yt2.site_id = yt.site_id                and yt2.id > yt.id        );delete  from    YourTablewhere   ID in (select id from tmpTable);

From Kostanos' suggestion in the comments:
The only slow query above is DELETE, for cases where you have a very large database. This query could be faster:

DELETE FROM YourTable USING YourTable, tmpTable WHERE YourTable.id=tmpTable.id