Good Python modules for fuzzy string comparison? [closed]

python string string-comparison fuzzy-comparison

difflib can do it.

Example from the docs:

>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']

Check it out. The module also has lower-level tools, such as the SequenceMatcher class, that can help you build something custom.
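get_close_matches() also takes two optional parameters worth knowing about: n (the maximum number of matches returned, default 3) and cutoff (the minimum similarity score in the range 0 to 1, default 0.6). A small sketch of how they change the result for the docs example above:

```python
from difflib import get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# Defaults: at most n=3 results, each scoring at least cutoff=0.6.
default = get_close_matches('appel', words)

# n=1 keeps only the single best-scoring candidate.
best_only = get_close_matches('appel', words, n=1)

# A lower cutoff lets weaker candidates through; n=4 allows all of them.
loose = get_close_matches('appel', words, n=4, cutoff=0.3)

print(default)    # ['apple', 'ape']
print(best_only)  # ['apple']
print(loose)      # ['apple', 'ape', 'puppy', 'peach']
```

Results come back best score first. In CPython's implementation, candidates with equal scores ('peach' and 'puppy' both score 0.4 here) are ordered by comparing the strings themselves, so don't rely on tie order for anything important.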


Levenshtein Python extension and C library.

https://github.com/ztane/python-Levenshtein/

The Levenshtein Python C extension module contains functions for fast computation of:
- Levenshtein (edit) distance, and edit operations
- string similarity
- approximate median strings, and generally string averaging
- string sequence and set similarity

It supports both normal and Unicode strings.

$ pip install python-levenshtein
...
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.

    ratio(string1, string2)

    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), because it's
    based on real minimal edit distance.

    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0

>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.

    distance(string1, string2)

    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0
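If installing the C extension isn't an option, the same edit distance can be computed in pure Python with the classic dynamic-programming (Wagner-Fischer) recurrence. This is a minimal sketch, not the extension's implementation: the function name levenshtein is our own, and it will be far slower than the C code on long strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (hypothetical helper)."""
    # Distance is symmetric, so put the shorter string in b to keep rows short.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)  # free if chars match
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


print(levenshtein('Levenshtein', 'Levensthein'))
print(levenshtein('kitten', 'sitting'))
```

Only two rows of the DP table are kept in memory, so space is proportional to the shorter string while time is proportional to the product of the two lengths.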


As nosklo said, use the difflib module from the Python standard library.

The difflib module can return a measure of the sequences' similarity using the ratio() method of a SequenceMatcher() object. The similarity is returned as a float in the range 0.0 to 1.0.

>>> import difflib
>>> difflib.SequenceMatcher(None, 'abcde', 'abcde').ratio()
1.0
>>> difflib.SequenceMatcher(None, 'abcde', 'zbcde').ratio()
0.80000000000000004
>>> difflib.SequenceMatcher(None, 'abcde', 'zyzzy').ratio()
0.0
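The difflib docs define ratio() as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matched elements. To make that concrete, here is a small sketch that recomputes it from get_matching_blocks(); the helper name manual_ratio is hypothetical and only for illustration.

```python
from difflib import SequenceMatcher


def manual_ratio(a: str, b: str) -> float:
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = SequenceMatcher(None, a, b)
    # Each block is a Match(a, b, size); the last one is a size-0 sentinel.
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # difflib treats two empty sequences as identical (ratio 1.0).
    return 2.0 * matches / total if total else 1.0


for other in ('abcde', 'zbcde', 'zyzzy'):
    builtin = SequenceMatcher(None, 'abcde', other).ratio()
    print(other, manual_ratio('abcde', other), builtin)
```

For 'abcde' vs 'zbcde' the only matching block is 'bcde', so M = 4, T = 10 and the ratio is 0.8. One caution from the docs: the result of ratio() can depend on the order of the two arguments.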
