Good Python modules for fuzzy string comparison?
The standard library's difflib module can do it.
Example from the docs:
>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']
Check it out: the module also has lower-level tools, such as the SequenceMatcher class, that you can use to build something custom.
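As an example of building something custom, here is a minimal sketch of a get_close_matches-like helper that compares case-insensitively and also returns the scores. The function name `scored_matches`, the lowercasing, and the tie-breaking rule are my own choices, not part of difflib; only `SequenceMatcher`, `set_seq1`, `set_seq2` and `ratio` come from the library.

```python
from difflib import SequenceMatcher


def scored_matches(word, possibilities, cutoff=0.6):
    """Return (score, candidate) pairs whose similarity to `word` is at least
    `cutoff`, best first. Case-insensitive comparison is our own choice."""
    matcher = SequenceMatcher()
    # SequenceMatcher caches details about the second sequence, so set the
    # query word there once and vary the first sequence per candidate.
    matcher.set_seq2(word.lower())
    results = []
    for candidate in possibilities:
        matcher.set_seq1(candidate.lower())
        score = matcher.ratio()
        if score >= cutoff:
            results.append((score, candidate))
    # Highest score first; ties broken alphabetically by candidate.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results


print(scored_matches('APPEL', ['ape', 'apple', 'peach', 'puppy']))
# [(0.8, 'apple'), (0.75, 'ape')]
```

Returning the scores makes it easy to tune `cutoff` for your own data instead of relying on the default of 0.6.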
Levenshtein Python extension and C library.
https://github.com/ztane/python-Levenshtein/
The Levenshtein Python C extension module contains functions for fast computation of:
- Levenshtein (edit) distance, and edit operations
- string similarity
- approximate median strings, and generally string averaging
- string sequence and set similarity
It supports both normal and Unicode strings.
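If you cannot install a C extension, the edit distance itself is straightforward to compute in pure Python. Below is a minimal sketch of the classic two-row dynamic-programming algorithm; it is much slower than the C module, and the function name `levenshtein` is just an illustrative choice, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (classic two-row DP)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    # previous[j] = distance between the current prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein('kitten', 'sitting'))  # 3
```

Only two rows of the table are kept, so memory is proportional to the shorter string while time is proportional to the product of the two lengths.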
$ pip install python-levenshtein
...
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.
    ratio(string1, string2)
    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), because it's
    based on real minimal edit distance.
    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0
>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.
    distance(string1, string2)
    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0
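If your code has to run on machines where the extension may not be installed, one option is to prefer `Levenshtein.ratio` and fall back to difflib. This is a minimal sketch; the wrapper name `similarity` is hypothetical. As the help text above notes, the two backends can give different scores for the same pair, so a threshold tuned on one may need adjusting for the other.

```python
from difflib import SequenceMatcher

try:
    # Use the python-Levenshtein C extension when it is available.
    from Levenshtein import ratio as similarity
except ImportError:
    # Fall back to the standard library. Scores may differ from
    # Levenshtein.ratio, which is based on edit distance rather than
    # on SequenceMatcher's matching blocks.
    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()


print(similarity('Brian', 'Jesus'))  # 0.0 with either backend
```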
As nosklo said, use the difflib module from the Python standard library.
The difflib module can return a measure of the sequences' similarity using the ratio() method of a SequenceMatcher object. The similarity is returned as a float in the range 0.0 to 1.0.
>>> import difflib
>>> difflib.SequenceMatcher(None, 'abcde', 'abcde').ratio()
1.0
>>> difflib.SequenceMatcher(None, 'abcde', 'zbcde').ratio()
0.80000000000000004
>>> difflib.SequenceMatcher(None, 'abcde', 'zyzzy').ratio()
0.0
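To see where those numbers come from: the difflib docs define ratio() as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. For 'abcde' vs 'zbcde', M = 4 ('bcde') and T = 10, giving 0.8. The helper below (`ratio_by_hand` is a made-up name) is a sketch that recomputes the score from `get_matching_blocks()` so you can check it against `ratio()`.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a dummy (len(a), len(b), 0) entry,
    # which adds nothing to the sum.
    m = sum(block.size for block in matcher.get_matching_blocks())
    t = len(a) + len(b)
    return 2.0 * m / t if t else 1.0  # two empty strings count as identical


for a, b in [('abcde', 'abcde'), ('abcde', 'zbcde'), ('abcde', 'zyzzy')]:
    # The last two columns should agree.
    print(a, b, ratio_by_hand(a, b), SequenceMatcher(None, a, b).ratio())
```

SequenceMatcher also provides quick_ratio() and real_quick_ratio(), which return cheaper upper bounds on ratio() and can be used to discard obviously poor candidates before computing the full score.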