How to use SequenceMatcher to find similarity between two strings?

python difflib

You forgot the first parameter to SequenceMatcher.

>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher(None, a, b)
>>> d = seq.ratio() * 100
>>> print(d)
44.44444444444444

http://docs.python.org/library/difflib.html


From the docs:

The SequenceMatcher class has this constructor:
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

The problem in your code is that by doing

seq=difflib.SequenceMatcher(a,b)

you are passing a as value for isjunk and b as value for a, leaving the default '' value for b. This results in a ratio of 0.0.

One way to fix this (already mentioned by Lennart) is to explicitly pass None as the first argument, so that a and b are bound to the correct parameters.
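To make the argument binding concrete, here is a small sketch comparing the two-argument call with the call that passes None first. The keyword-argument form is a further option that is not in the answers above; it relies only on the parameter names a and b from the constructor signature. The variable names wrong, positional and keyword are just for illustration.

```python
import difflib

a = 'abcd'
b = 'ab123'

# Two positional arguments: 'abcd' is bound to isjunk, 'ab123' to a,
# and b keeps its default '' value, so nothing can match.
wrong = difflib.SequenceMatcher(a, b)

# None fills the isjunk slot, so a and b land where they belong.
positional = difflib.SequenceMatcher(None, a, b)

# Keyword arguments skip isjunk entirely and leave it at its default.
keyword = difflib.SequenceMatcher(a=a, b=b)

print(wrong.ratio())       # 0.0
print(positional.ratio())  # about 0.444
print(keyword.ratio())     # same value as the positional form
```

The keyword form has the side benefit that the call cannot silently shift arguments into the wrong slots.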

However, I just found another solution that I wanted to mention: it doesn't touch the isjunk argument, and instead uses the set_seqs() method to specify the two sequences.

>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio() * 100
>>> print(d)
44.44444444444444
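Setting the sequences after construction is also handy when one string has to be compared against many. The sketch below is only an illustration under that assumption: best_match is a hypothetical helper, not part of difflib. It uses set_seq2() once for the fixed string and set_seq1() for each candidate, which is the usage pattern the difflib documentation suggests, because SequenceMatcher caches information about the second sequence.

```python
import difflib

def best_match(target, candidates):
    """Return (candidate, percent) for the candidate most similar to target.

    Hypothetical helper for illustration; returns (None, 0.0) when
    candidates is empty. The comparison is case-insensitive.
    """
    matcher = difflib.SequenceMatcher()
    # The second sequence is analysed once and then reused.
    matcher.set_seq2(target.lower())

    best, best_percent = None, 0.0
    for candidate in candidates:
        matcher.set_seq1(candidate.lower())
        percent = matcher.ratio() * 100
        if best is None or percent > best_percent:
            best, best_percent = candidate, percent
    return best, best_percent

result = best_match('ab123', ['abcd', 'ab12', 'xyz'])
print(result)
```

With these inputs 'ab12' should come out on top, since it shares four characters with 'ab123' while 'abcd' shares only two.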
