How can I optimize this Python code to generate all words with word-distance 1? How can I optimize this Python code to generate all words with word-distance 1? python python

How can I optimize this Python code to generate all words with word-distance 1?


If your wordlist is very long, might it be more efficient to generate all possible 1-letter-differences from 'word', then check which ones are in the list? I don't know any Python but there should be a suitable data structure for the wordlist allowing for log-time lookups.

I suggest this because if your words are reasonable lengths (~10 letters), then you'll only be looking for 250 potential words, which is probably faster if your wordlist is larger than a few hundred words.


Your function distance is calculating the total distance, when you really only care about distance=1. The majority of cases you'll know it's >1 within a few characters, so you could return early and save a lot of time.

Beyond that, there might be a better algorithm, but I can't think of it.

Edit: Another idea.

You can make 2 cases, depending on whether the first character matches. If it doesn't match, the rest of the word has to match exactly, and you can test for that in one shot. Otherwise, do it similarly to what you were doing. You could even do it recursively, but I don't think that would be faster.

def DifferentByOne(word1, word2):    if word1[0] != word2[0]:        return word1[1:] == word2[1:]    same = True    for i in range(1, len(word1)):        if word1[i] != word2[i]:            if same:                same = False            else:                return False    return not same

Edit 2: I've deleted the check to see if the strings are the same length, since you say it's redundant. Running Ryan's tests on my own code and on the is_neighbors function provided by MizardX, I get the following:

  • Original distance(): 3.7 seconds
  • My DifferentByOne(): 1.1 seconds
  • MizardX's is_neighbors(): 3.7 seconds

Edit 3: (Probably getting into community wiki territory here, but...)

Trying your final definition of is_neighbors() with izip instead of zip: 2.9 seconds.

Here's my latest version, which still times at 1.1 seconds:

def DifferentByOne(word1, word2):    if word1[0] != word2[0]:        return word1[1:] == word2[1:]    different = False    for i in range(1, len(word1)):        if word1[i] != word2[i]:            if different:                return False            different = True    return different


from itertools import izipdef is_neighbors(word1,word2):    different = False    for c1,c2 in izip(word1,word2):        if c1 != c2:            if different:                return False            different = True    return different

Or maybe in-lining the izip code:

def is_neighbors(word1,word2):    different = False    next1 = iter(word1).next    next2 = iter(word2).next    try:        while 1:            if next1() != next2():                if different:                    return False                different = True    except StopIteration:        pass    return different

And a rewritten getchildren:

def iterchildren(word, wordlist):    return ( w for w in wordlist if is_neighbors(word, w) )
  • izip(a,b) returns an iterator over pairs of values from a and b.
  • zip(a,b) returns a list of pairs from a and b.