elegant find sub-list in list elegant find sub-list in list python python

elegant find sub-list in list


I know this question is 5 months old and already "accepted", but googling a very similar problem brought me to this question and all the answers seem to have a couple of rather significant problems, plus I'm bored and want to try my hand at a SO answer, so I'm just going to rattle off what I've found.

The first part of the question, as I understand it, is pretty trivial: just return the original list with all the elements not in the "pattern" filtered out. Following that thinking, the first code I thought of used the filter() function:

def subfinder(mylist, pattern):    return list(filter(lambda x: x in pattern, mylist))

I would say that this solution is definitely more succinct than the original solution, but it's not any faster, or at least not appreciably, and I try to avoid lambda expressions if there's not a very good reason for using them. In fact, the best solution I could come up with involved a simple list comprehension:

def subfinder(mylist, pattern):    pattern = set(pattern)    return [x for x in mylist if x in pattern]

This solution is both more elegant and significantly faster than the original: the comprehension is about 120% faster than the original, while casting the pattern into a set first bumps that up to a whopping 320% faster in my tests.

Now for the bonus: I'll just jump right into it, my solution is as follows:

def subfinder(mylist, pattern):    matches = []    for i in range(len(mylist)):        if mylist[i] == pattern[0] and mylist[i:i+len(pattern)] == pattern:            matches.append(pattern)    return matches

This is a variation of Steven Rumbalski's "inefficient one liner", that, with the addition of the "mylist[i] == pattern[0]" check and thanks to python's short-circuit evaluation, is significantly faster than both the original statement and the itertools version (and every other offered solution as far as I can tell) and it even supports overlapping patterns. So there you go.


This will get the "bonus" part of your question:

pattern = [1, 2, 3, 4]search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]cursor = 0found = []for i in search_list:    if i == pattern[cursor]:        cursor += 1        if cursor == len(pattern):            found.append(pattern)            cursor = 0    else:        cursor = 0

For non-bonus:

pattern = [1, 2, 3, 4]search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]cursor = 0found = []for i in search_list:    if i != pattern[cursor]:        if cursor > 0:            found.append(pattern[:cursor])        cursor = 0    else:        cursor += 1

Finally, this one handles overlaps:

def find_matches(pattern_list, search_list):    cursor_list = []    found = []    for element in search_list:        cursors_to_kill = []        for cursor_index in range(len(cursor_list)):            if element == pattern_list[cursor_list[cursor_index]]:                cursor_list[cursor_index] += 1                if cursor_list[cursor_index] == len(pattern_list):                    found.append(pattern_list)                    cursors_to_kill.append(cursor_index)            else:                cursors_to_kill.append(cursor_index)        cursors_to_kill.reverse()        for cursor_index in cursors_to_kill:            cursor_list.pop(cursor_index)        if element == pattern_list[0]:            cursor_list.append(1)    return found


An iterator-based approach that is still based on the naive algorithm, but tries to do as much implicit looping as possible making use of .index():

def find_pivot(seq, subseq):    n = len(seq)    m = len(subseq)    stop = n - m + 1    if n > 0:        item = subseq[0]        i = 0        try:            while i < stop:                i = seq.index(item, i)                if seq[i:i + m] == subseq:                    yield i                i += 1        except ValueError:            return

compared to a couple of others approaches with various degrees of explicit looping:

def find_loop(seq, subseq):    n = len(seq)    m = len(subseq)    for i in range(n - m + 1):        if all(seq[i + j] == subseq[j] for j in (range(m))):            yield i
def find_slice(seq, subseq):    n = len(seq)    m = len(subseq)    for i in range(n - m + 1):        if seq[i:i + m] == subseq:            yield i
def find_mix(seq, subseq):    n = len(seq)    m = len(subseq)    for i in range(n - m + 1):        if seq[i] == subseq[0] and seq[i:i + m] == subseq:            yield i

one would get:

a = list(range(10))print(a)# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]b = list(range(5, 10))print(b)# [5, 6, 7, 8, 9]funcs = find_pivot, find_loop, find_slice, find_mix, for func in funcs:    print()    print(func.__name__)    print(list(func(a * 10, b)))    aa = a * 100    %timeit list(func(aa, b))    random.shuffle(aa)    %timeit list(func(aa, b))# find_pivot# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]# 10000 loops, best of 3: 49.6 µs per loop# 10000 loops, best of 3: 50.1 µs per loop# find_loop# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]# 1000 loops, best of 3: 712 µs per loop# 1000 loops, best of 3: 680 µs per loop# find_slice# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]# 10000 loops, best of 3: 162 µs per loop# 10000 loops, best of 3: 162 µs per loop# find_mix# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]# 10000 loops, best of 3: 82.2 µs per loop# 10000 loops, best of 3: 83.9 µs per loop

Note that this is ~30% faster than the currently accepted answer with the tested input.