Pythonic way to merge two overlapping lists, preserving order

You can try the following:

>>> a = [1, 3, 9, 8, 3, 4, 5]>>> b = [3, 4, 5, 7, 8]>>> matches = (i for i in xrange(len(b), 0, -1) if b[:i] == a[-i:])>>> i = next(matches, 0)>>> a + b[i:][1, 3, 9, 8, 3, 4, 5, 7, 8]

The idea is we check the first i elements of b (b[:i]) with the last i elements of a (a[-i:]). We take i in decreasing order, starting from the length of b until 1 (xrange(len(b), 0, -1)) because we want to match as much as possible. We take the first such i by using next and if we don't find it we use the zero value (next(..., 0)). From the moment we found the i, we add to a the elements of b from index i.

python list python-3.x merge

There are a couple of easy optimizations that are possible.

You don't need to start at master[1], since the longest overlap starts at master[-len(addition)]
If you add a call to list.index you can avoid creating sub-lists and comparing lists for each index:

This approach keeps the code pretty understandable too (and easier to optimize by using cython or pypy):

master = [1,3,9,8,3,4,5]addition = [3,4,5,7,8]def merge(master, addition):    first = addition[0]    n = max(len(master) - len(addition), 1)  # (1)    while 1:        try:            n = master.index(first, n)       # (2)        except ValueError:            return master + addition        if master[-n:] == addition[:n]:            return master + addition[n:]        n += 1

python list python-3.x merge

One trivial optimization is not iterating over the whole master list. I.e., replace while n < len(master) with for n in range(min(len(addition), len(master))) (and don't increment n in the loop). If there is no match, your current code will iterate over the entire master list, even if the slices being compared aren't even of the same length.

Another concern is that you're taking slices of master and addition in order to compare them, which creates two new lists every time, and isn't really necessary. This solution (inspired by Boyer-Moore) doesn't use slicing:

def merge(master, addition):    overlap_lens = (i + 1 for i, e in enumerate(addition) if e == master[-1])    for overlap_len in overlap_lens:        for i in range(overlap_len):            if master[-overlap_len + i] != addition[i]:                break        else:            return master + addition[overlap_len:]    return master + addition

The idea here is to generate all the indices of the last element of master in addition, and add 1 to each. Since a valid overlap must end with the last element of master, only those values are lengths of possible overlaps. Then we can check for each of them if the elements before it also line up.

The function currently assumes that master is longer than addition (you'll probably get an IndexError at master[-overlap_len + i] if it isn't). Add a condition to the overlap_lens generator if you can't guarantee it.

It's also non-greedy, i.e. it looks for the smallest non-empty overlap (merge([1, 2, 2], [2, 2, 3]) will return [1, 2, 2, 2, 3]). I think that's what you meant by "to merge at the last possible valid position". If you want a greedy version, reverse the overlap_lens generator.

CodeHunter

Pythonic way to merge two overlapping lists, preserving order

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last