Find the similarity metric between two strings
There is a built-in for this: SequenceMatcher, from the standard library's difflib module.
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
Using it:
>>> similar("Apple","Appel")
0.8
>>> similar("Apple","Mango")
0.0
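One thing worth knowing: the comparison is case-sensitive, which is why "Apple" and "Mango" score 0.0 even though both contain the letter a. If case should not matter, a minimal sketch is to normalise both strings first. The helper name similar_ignore_case is my own, not part of difflib.

```python
from difflib import SequenceMatcher

def similar_ignore_case(a, b):
    # Hypothetical helper: casefold() both inputs so that "A" and "a" compare equal.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

print(similar_ignore_case("Apple", "APPLE"))  # 1.0
print(similar_ignore_case("Apple", "Mango"))  # 0.2 (only the "a" matches)
```

Whether to normalise case depends on your data; for identifiers or passwords you probably want the case-sensitive version.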
I think you are looking for an algorithm that measures the distance between two strings. Here are some options:
Solution #1: Python builtin
Use SequenceMatcher from difflib.
pros: part of the Python standard library, no extra package needed.
cons: limited; it offers a single matching algorithm, and there are many other good string similarity algorithms out there.
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
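If you wonder where 0.75 comes from: ratio() is documented as 2.0*M/T, where M is the number of matched characters and T is the total length of both strings. Here "bcd" matches (M = 3) and T = 4 + 4 = 8. A rough sketch that recomputes it from the matching blocks follows; manual_ratio is a name I made up for illustration.

```python
from difflib import SequenceMatcher

def manual_ratio(a, b):
    """Recompute SequenceMatcher.ratio() as 2*M/T (illustrative helper)."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() yields (a, b, size) triples; the last one is a
    # zero-size sentinel, so summing the sizes gives the match count M.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # ratio() returns 1.0 when both sequences are empty.
    return 2.0 * matched / total if total else 1.0

print(manual_ratio("abcd", "bcde"))  # 0.75
```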
Solution #2: jellyfish library
It's a very good library with good coverage and few issues. It supports:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance
pros: easy to use, a wide range of supported algorithms, well tested.
cons: not part of the standard library; it has to be installed separately.
example:
>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> # note: newer jellyfish releases rename jaro_distance to jaro_similarity
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1
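To give an idea of what the Levenshtein distance above actually counts (the minimum number of single-character insertions, deletions and substitutions), here is a small pure-Python sketch using the usual dynamic-programming approach. It is not how jellyfish implements it, just an illustration, and the function name levenshtein is my own.

```python
def levenshtein(a, b):
    """Edit distance between a and b (illustrative, not jellyfish's code)."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("jellyfish", "smellyfish"))  # 2
```

For real workloads the library version is likely the better choice; the sketch only keeps two rows in memory but still runs in pure Python.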