How to efficiently get the mean of the elements in two lists of lists in Python


You can do it in O(n) (a single pass over each list) by converting the first list to a dict and then, for each item in the second list, looking it up in that dict (O(1) on average), like this:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

l1_as_dict = dict(mylist1)
myoutput = []
for item, price2 in mylist2:
    if item in l1_as_dict:
        price1 = l1_as_dict[item]
        myoutput.append([item, (price1 + price2) / 2])
print(myoutput)

Output:

[['chocolate', 0.5], ['egg', 0.45]]
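
One caveat worth keeping in mind with the dict conversion: dict() keeps only the last value for a repeated key, so the approach above assumes each name appears at most once per list. A minimal sketch, where the duplicated "egg" entry is made up for illustration and is not part of the question's data:

```python
# Hypothetical input with a repeated name; not part of the original data.
with_duplicates = [["egg", 0.1], ["muffin", 0.3], ["egg", 0.3]]

# dict() silently overwrites the earlier "egg" value with the later one.
as_dict = dict(with_duplicates)
print(as_dict)  # {'egg': 0.3, 'muffin': 0.3}
```

If duplicates within a single list are possible, grouping the values into lists first (rather than calling dict() directly) avoids losing data.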


Here is an O(n) solution that averages all items, not only the common ones.
Build a dictionary that maps each name to a list of its values, then average each list afterwards:

In []:
d = {}
for lst in (mylist1, mylist2):
    for i, v in lst:
        d.setdefault(i, []).append(v)   # alternatively, use collections.defaultdict

[(k, sum(v)/len(v)) for k, v in d.items()]

Out[]:
[('lemon', 0.1), ('egg', 0.45), ('muffin', 0.3), ('chocolate', 0.5), ('milk', 0.2), ('carrot', 0.8)]

Then if you just want the common ones you can add a guard:

In []:
[(k, sum(v)/len(v)) for k, v in d.items() if len(v) > 1]

Out[]:
[('egg', 0.45), ('chocolate', 0.5)]

This extends to any number of lists and makes no assumptions about how many elements they have in common.
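
As a rough sketch of how the grouping approach might extend to more than two lists, the setdefault loop can be wrapped in a function that accepts any number of lists. The function name average_by_key and the third list mylist3 are assumptions made up for this illustration; they do not come from the question.

```python
def average_by_key(*lists):
    """Group values by name across all given lists, then average each group."""
    grouped = {}
    for lst in lists:
        for name, value in lst:
            grouped.setdefault(name, []).append(value)
    return {name: sum(values) / len(values) for name, values in grouped.items()}


mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]
# mylist3 is made-up data, added only to show a third input list.
mylist3 = [["egg", 0.3], ["milk", 0.4]]

averages = average_by_key(mylist1, mylist2, mylist3)
print(averages["egg"])   # mean of 0.1, 0.8 and 0.3
print(averages["milk"])  # mean of 0.2 and 0.4
```

Note that with more than two lists, the `len(v) > 1` guard selects names that occur at least twice overall, not necessarily names present in every list.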


Here is one solution that uses collections.defaultdict to group the items and calculates the averages with statistics.mean:

from collections import defaultdict
from statistics import mean

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d = defaultdict(list)
for lst in (mylist1, mylist2):
    for k, v in lst:
        d[k].append(v)

result = [[k, mean(v)] for k, v in d.items()]
print(result)
# [['lemon', 0.1], ['egg', 0.45], ['muffin', 0.3], ['chocolate', 0.5], ['milk', 0.2], ['carrot', 0.8]]

If we only want the common keys, we can keep just the keys that collected more than one value:

result = [[k, mean(v)] for k, v in d.items() if len(v) > 1]
print(result)
# [['egg', 0.45], ['chocolate', 0.5]]

We could also just build the result from set intersection:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d1, d2 = dict(mylist1), dict(mylist2)
result = [[k, (d1[k] + d2[k]) / 2] for k in d1.keys() & d2.keys()]
print(result)
# [['egg', 0.45], ['chocolate', 0.5]]
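
Because `d1.keys() & d2.keys()` produces a set, the order of the result is not guaranteed and can differ between runs. If a stable order matters, one option (a small variation on the code above, not part of the original answer) is to sort the common keys first:

```python
mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d1, d2 = dict(mylist1), dict(mylist2)

# Sorting the intersection gives a deterministic, alphabetical order.
common = sorted(d1.keys() & d2.keys())
result = [[k, (d1[k] + d2[k]) / 2] for k in common]
print(result)
# [['chocolate', 0.5], ['egg', 0.45]]
```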