Best way to determine if a sequence is in another sequence?
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s3>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s(nothing)
Here's a brute-force approach O(n*m)
(similar to @mcella's answer). It might be faster than the Knuth-Morris-Pratt algorithm implementation in pure Python O(n+m)
(see @Gregg Lind answer) for small input sequences.
#!/usr/bin/env pythondef index(subseq, seq): """Return an index of `subseq`uence in the `seq`uence. Or `-1` if `subseq` is not a subsequence of the `seq`. The time complexity of the algorithm is O(n*m), where n, m = len(seq), len(subseq) >>> index([1,2], range(5)) 1 >>> index(range(1, 6), range(5)) -1 >>> index(range(5), range(5)) 0 >>> index([1,2], [0, 1, 0, 1, 2]) 3 """ i, n, m = -1, len(seq), len(subseq) try: while True: i = seq.index(subseq[0], i + 1, n - m + 1) if subseq == seq[i:i + m]: return i except ValueError: return -1if __name__ == '__main__': import doctest; doctest.testmod()
I wonder how large is the small in this case?
A simple approach: Convert to strings and rely on string matching.
Example using lists of strings:
>>> f = ["foo", "bar", "baz"] >>> g = ["foo", "bar"] >>> ff = str(f).strip("[]") >>> gg = str(g).strip("[]") >>> gg in ff True
Example using tuples of strings:
>>> x = ("foo", "bar", "baz")>>> y = ("bar", "baz")>>> xx = str(x).strip("()")>>> yy = str(y).strip("()")>>> yy in xxTrue
Example using lists of numbers:
>>> f = [1 , 2, 3, 4, 5, 6, 7]>>> g = [4, 5, 6]>>> ff = str(f).strip("[]")>>> gg = str(g).strip("[]")>>> gg in ffTrue