How can I optimize this Python code to generate all words with word-distance 1?
If your wordlist is very long, might it be more efficient to generate all possible 1-letter-differences from 'word', then check which ones are in the list? I don't know any Python but there should be a suitable data structure for the wordlist allowing for log-time lookups.
I suggest this because if your words are reasonable lengths (~10 letters), then you'll only be looking for 250 potential words, which is probably faster if your wordlist is larger than a few hundred words.
Your function distance
is calculating the total distance, when you really only care about distance=1. The majority of cases you'll know it's >1 within a few characters, so you could return early and save a lot of time.
Beyond that, there might be a better algorithm, but I can't think of it.
Edit: Another idea.
You can make 2 cases, depending on whether the first character matches. If it doesn't match, the rest of the word has to match exactly, and you can test for that in one shot. Otherwise, do it similarly to what you were doing. You could even do it recursively, but I don't think that would be faster.
def DifferentByOne(word1, word2): if word1[0] != word2[0]: return word1[1:] == word2[1:] same = True for i in range(1, len(word1)): if word1[i] != word2[i]: if same: same = False else: return False return not same
Edit 2: I've deleted the check to see if the strings are the same length, since you say it's redundant. Running Ryan's tests on my own code and on the is_neighbors function provided by MizardX, I get the following:
- Original distance(): 3.7 seconds
- My DifferentByOne(): 1.1 seconds
- MizardX's is_neighbors(): 3.7 seconds
Edit 3: (Probably getting into community wiki territory here, but...)
Trying your final definition of is_neighbors() with izip instead of zip: 2.9 seconds.
Here's my latest version, which still times at 1.1 seconds:
def DifferentByOne(word1, word2): if word1[0] != word2[0]: return word1[1:] == word2[1:] different = False for i in range(1, len(word1)): if word1[i] != word2[i]: if different: return False different = True return different
from itertools import izipdef is_neighbors(word1,word2): different = False for c1,c2 in izip(word1,word2): if c1 != c2: if different: return False different = True return different
Or maybe in-lining the izip
code:
def is_neighbors(word1,word2): different = False next1 = iter(word1).next next2 = iter(word2).next try: while 1: if next1() != next2(): if different: return False different = True except StopIteration: pass return different
And a rewritten getchildren
:
def iterchildren(word, wordlist): return ( w for w in wordlist if is_neighbors(word, w) )