Find the similarity metric between two strings


There is a built-in: SequenceMatcher, from the standard library's difflib module.

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

Using it:

>>> similar("Apple","Appel")
0.8
>>> similar("Apple","Mango")
0.0
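To see where the 0.8 comes from, here is a minimal sketch that recomputes the ratio by hand. It relies on the documented definition of ratio() as 2.0 * M / T, where M is the number of matched characters and T is the combined length of both strings. The helper name ratio_by_hand is made up for this illustration.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    # Each matching block is (a_index, b_index, size); the last one is a
    # zero-size sentinel, so it does not affect the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Two empty strings are treated as identical, as ratio() does.
    return 2.0 * matched / total if total else 1.0


print(ratio_by_hand("Apple", "Appel"))  # 4 matched characters out of 10 -> 0.8
```

For "Apple" and "Appel" the matched blocks are "App" and "l", so M is 4 and T is 10.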


You may be looking for an algorithm that measures the distance between strings. Here are some to consider:

  1. Hamming distance
  2. Levenshtein distance
  3. Damerau–Levenshtein distance
  4. Jaro–Winkler distance
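As a rough illustration of the first two entries in the list above, here is a pure-Python sketch of Hamming and Levenshtein distance. The function names are chosen for this example only, and the code is meant to show the idea rather than to replace an optimized library.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the prefix of a processed so far
    # and b[:j]; only two rows of the full table are kept in memory.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


print(hamming_distance("karolin", "kathrin"))    # 3
print(levenshtein_distance("kitten", "sitting"))  # 3
```

Hamming distance only applies to strings of equal length, while Levenshtein distance handles any pair. Damerau–Levenshtein additionally counts a swap of two adjacent characters as a single edit.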


Solution #1: Python builtin

Use SequenceMatcher from difflib.

Pros: part of the Python standard library, so no extra package is needed.
Cons: limited. It offers a single similarity measure, while many other good string-similarity algorithms exist.

Example:
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
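A common use of this built-in approach is picking the closest candidate from a list. The sketch below uses difflib.get_close_matches, which ranks candidates by the same SequenceMatcher ratio. The word list is the one from the standard library documentation, and the n and cutoff values shown are simply the function's defaults written out.

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Keep at most n candidates whose ratio against "appel" is at least cutoff,
# ordered from most to least similar.
matches = get_close_matches("appel", candidates, n=3, cutoff=0.6)
print(matches)  # ['apple', 'ape']
```

Raising cutoff makes the match stricter, and lowering it lets more distant candidates through.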

Solution #2: jellyfish library

It's a very good library with good coverage and few issues. It supports:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance

Pros: easy to use, wide range of supported algorithms, well tested.
Cons: not part of the standard library, so it has to be installed separately.

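Example:

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1

Note: newer jellyfish releases name the Jaro function jaro_similarity; jaro_distance is the older name used when this answer was written.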