MySQL remove duplicates from big database quick MySQL remove duplicates from big database quick sql sql

MySQL remove duplicates from big database quick


I believe this will do it, using on duplicate key + ifnull():

create table tmp like yourtable;alter table tmp add unique (text1, text2);insert into tmp select * from yourtable     on duplicate key update text3=ifnull(text3, values(text3));rename table yourtable to deleteme, tmp to yourtable;drop table deleteme;

Should be much faster than anything that requires group by or distinct or a subquery, or even order by. This doesn't even require a filesort, which is going to kill performance on a large temporary table. Will still require a full scan over the original table, but there's no avoiding that.


Found this simple 1-line code to do exactly what I needed:

ALTER IGNORE TABLE dupTest ADD UNIQUE INDEX(a,b);

Taken from:http://mediakey.dk/~cc/mysql-remove-duplicate-entries/


DELETE FROM dupsWHERE id NOT IN(    SELECT id FROM (        SELECT DISTINCT id, text1, text2            FROM dups        GROUP BY text1, text2        ORDER BY text3 DESC    ) as tmp)

This queries all records, groups by the distinction fields and orders by ID (means we pick the first not null text3 record). Then we select the id's from that result (these are good ids...they wont be deleted) and delete all IDs that AREN'T those.

Any query like this affecting the entire table will be slow. You just need to run it and let it roll out so you can prevent it in the future.

After you have done this "fix" I would apply UNIQUE INDEX (text1, text2) to that table. To prevent the posibility of duplicates in the future.

If you want to go the "create a new table and replace the old one" route. You could use the very inner select statement to create your insert statement.

MySQL specific (assumes new table is named my_tbl2 and has exactly the same structure):

INSERT INTO my_tbl2    SELECT DISTINCT id, text1, text2, text3            FROM dups        GROUP BY text1, text2        ORDER BY text3 DESC

See MySQL INSERT ... SELECT for more information.