How can I optimize this Python code to generate all words with word-distance 1?

python optimization python-2.x levenshtein-distance word-diff

If your wordlist is very long, might it be more efficient to generate all possible 1-letter-differences from 'word', then check which ones are in the list? I don't know any Python but there should be a suitable data structure for the wordlist allowing for log-time lookups.

I suggest this because if your words are reasonable lengths (~10 letters), then you'll only be looking for 250 potential words, which is probably faster if your wordlist is larger than a few hundred words.

python optimization python-2.x levenshtein-distance word-diff

Your function distance is calculating the total distance, when you really only care about distance=1. The majority of cases you'll know it's >1 within a few characters, so you could return early and save a lot of time.

Beyond that, there might be a better algorithm, but I can't think of it.

Edit: Another idea.

You can make 2 cases, depending on whether the first character matches. If it doesn't match, the rest of the word has to match exactly, and you can test for that in one shot. Otherwise, do it similarly to what you were doing. You could even do it recursively, but I don't think that would be faster.

def DifferentByOne(word1, word2):    if word1[0] != word2[0]:        return word1[1:] == word2[1:]    same = True    for i in range(1, len(word1)):        if word1[i] != word2[i]:            if same:                same = False            else:                return False    return not same

Edit 2: I've deleted the check to see if the strings are the same length, since you say it's redundant. Running Ryan's tests on my own code and on the is_neighbors function provided by MizardX, I get the following:

Original distance(): 3.7 seconds
My DifferentByOne(): 1.1 seconds
MizardX's is_neighbors(): 3.7 seconds

Edit 3: (Probably getting into community wiki territory here, but...)

Trying your final definition of is_neighbors() with izip instead of zip: 2.9 seconds.

Here's my latest version, which still times at 1.1 seconds:

def DifferentByOne(word1, word2):    if word1[0] != word2[0]:        return word1[1:] == word2[1:]    different = False    for i in range(1, len(word1)):        if word1[i] != word2[i]:            if different:                return False            different = True    return different

python optimization python-2.x levenshtein-distance word-diff

from itertools import izipdef is_neighbors(word1,word2):    different = False    for c1,c2 in izip(word1,word2):        if c1 != c2:            if different:                return False            different = True    return different

Or maybe in-lining the izip code:

def is_neighbors(word1,word2):    different = False    next1 = iter(word1).next    next2 = iter(word2).next    try:        while 1:            if next1() != next2():                if different:                    return False                different = True    except StopIteration:        pass    return different

And a rewritten getchildren:

def iterchildren(word, wordlist):    return ( w for w in wordlist if is_neighbors(word, w) )

izip(a,b) returns an iterator over pairs of values from a and b.
zip(a,b) returns a list of pairs from a and b.

CodeHunter

How can I optimize this Python code to generate all words with word-distance 1?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last