
Good MySQL query to find similar values in a single column


I think this can be solved by measuring the distance between strings with a string metric and treating pairs whose distance falls below some threshold as similar.

Levenshtein distance seems to be the best-known such metric, and I have used an implementation of it in Oracle. Implementations exist for MySQL as well (as user-defined functions, since it is not built in). You might find another metric that works better for your data.
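To make the distance-based idea concrete, here is a minimal sketch rather than a MySQL solution. It assumes Python's built-in sqlite3 module as a stand-in database, registers a hand-written `levenshtein` function with it, and uses a hypothetical table `YourTable(Title)` with made-up titles. In MySQL you would install a Levenshtein user-defined function and keep the same self-join shape.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop ca
                               current[j - 1] + 1,       # add cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similar_pairs(titles, max_distance=2):
    """Return (title, other_title, distance) for pairs within max_distance."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name levenshtein (assumption:
    # a MySQL UDF would play this role in the real database).
    conn.create_function("levenshtein", 2, levenshtein)
    conn.execute("CREATE TABLE YourTable (Title TEXT)")
    conn.executemany("INSERT INTO YourTable (Title) VALUES (?)",
                     [(t,) for t in titles])
    rows = conn.execute(
        """
        SELECT T.Title, T2.Title, levenshtein(T.Title, T2.Title) AS distance
        FROM YourTable T
        JOIN YourTable T2 ON T.Title < T2.Title   -- each unordered pair once
        WHERE levenshtein(T.Title, T2.Title) <= ?
        ORDER BY distance, T.Title, T2.Title
        """,
        (max_distance,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in similar_pairs(["Star Wars", "Star Warz", "Stra Wars", "Alien"]):
        print(row)
```

The self-join compares every pair of rows, so the cost grows quadratically with the table size. The threshold of 2 is an arbitrary choice for illustration and would need tuning for real data.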


I'm not sure this is the best or most efficient way, and it definitely depends on what "similar" means. If it means that one row's title contains the entire text of another row's title, then something like this should work:

    SELECT DISTINCT T.Title
    FROM YourTable T
       LEFT JOIN YourTable T2 ON T.Title != T2.Title
    WHERE T.Title LIKE CONCAT('%', T2.Title, '%')
    UNION
    SELECT DISTINCT T2.Title
    FROM YourTable T
       LEFT JOIN YourTable T2 ON T.Title != T2.Title
    WHERE T.Title LIKE CONCAT('%', T2.Title, '%')
    ORDER BY Title

And here is the SQL Fiddle.
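For a quick local check of the containment query, the following sketch runs the same logic against an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions for illustration: the sample titles are invented, SQLite stands in for MySQL, `||` replaces `CONCAT`, and a plain `JOIN` replaces the `LEFT JOIN` (the `WHERE` clause discards the unmatched rows either way).

```python
import sqlite3


def titles_with_overlap(titles):
    """Return titles that contain, or are contained in, another row's title."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Title TEXT)")
    conn.executemany("INSERT INTO YourTable (Title) VALUES (?)",
                     [(t,) for t in titles])
    rows = conn.execute(
        """
        SELECT T.Title AS Title            -- the longer, containing title
        FROM YourTable T
        JOIN YourTable T2 ON T.Title != T2.Title
        WHERE T.Title LIKE '%' || T2.Title || '%'
        UNION                              -- UNION also removes duplicates
        SELECT T2.Title                    -- the shorter, contained title
        FROM YourTable T
        JOIN YourTable T2 ON T.Title != T2.Title
        WHERE T.Title LIKE '%' || T2.Title || '%'
        ORDER BY Title
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = ["Star Wars", "Star Wars: A New Hope", "Alien", "Aliens",
              "Blade Runner"]
    print(titles_with_overlap(sample))
```

Note that `%` and `_` inside a title act as `LIKE` wildcards, so titles containing those characters may match more rows than intended.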