How to efficiently get the mean of the elements in two lists of lists in Python
You can do it in O(n) (a single pass over each list) by converting the first list to a dict, then looking up each item of the second list in that dict (O(1) on average), like this:
mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]
l1_as_dict = dict(mylist1)
myoutput = []
for item, price2 in mylist2:
    if item in l1_as_dict:
        price1 = l1_as_dict[item]
        myoutput.append([item, (price1 + price2) / 2])
print(myoutput)
Output:
[['chocolate', 0.5], ['egg', 0.45]]
An O(n) solution that will average all items.
Construct a dictionary with a list of the values and then average that dictionary afterwards:
In []:
d = {}
for lst in (mylist1, mylist2):
    for i, v in lst:
        d.setdefault(i, []).append(v)  # alternative: use collections.defaultdict
[(k, sum(v)/len(v)) for k, v in d.items()]
Out[]:
[('lemon', 0.1), ('egg', 0.45), ('muffin', 0.3), ('chocolate', 0.5), ('milk', 0.2), ('carrot', 0.8)]
Then if you just want the common ones you can add a guard:
In []:
[(k, sum(v)/len(v)) for k, v in d.items() if len(v) > 1]
Out[]:
[('egg', 0.45), ('chocolate', 0.5)]
This extends to any number of lists and makes no assumption around the number of common elements.
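As a rough sketch of how this might extend to more than two lists, the grouping can be wrapped in a function. The name `average_by_key`, the `common_only` flag, and the third list `mylist3` are made up for illustration and are not part of the answer above. The "common" check here assumes each key appears at most once per list.

```python
from collections import defaultdict


def average_by_key(*lists, common_only=False):
    """Average the values per key across any number of [key, value] lists."""
    grouped = defaultdict(list)
    for lst in lists:
        for key, value in lst:
            grouped[key].append(value)
    # Assumption: a key is "common" if it collected one value per input list,
    # which only holds when no list repeats a key.
    return [
        (key, sum(values) / len(values))
        for key, values in grouped.items()
        if not common_only or len(values) >= len(lists)
    ]


mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]
mylist3 = [["egg", 0.3], ["milk", 0.4]]  # hypothetical third list

# Only "egg" appears in all three lists.
print(average_by_key(mylist1, mylist2, mylist3, common_only=True))
```

With two lists and `common_only=True` this should behave like the `len(v) > 1` guard above.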
Here is one solution that uses collections.defaultdict to group the items and calculates the averages with statistics.mean:
from collections import defaultdict
from statistics import mean
mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]
d = defaultdict(list)
for lst in (mylist1, mylist2):
    for k, v in lst:
        d[k].append(v)
result = [[k, mean(v)] for k, v in d.items()]
print(result)
# [['lemon', 0.1], ['egg', 0.45], ['muffin', 0.3], ['chocolate', 0.5], ['milk', 0.2], ['carrot', 0.8]]
If we only want common keys, just check whether a key has collected more than one value:
result = [[k, mean(v)] for k, v in d.items() if len(v) > 1]
print(result)
# [['egg', 0.45], ['chocolate', 0.5]]
We could also just build the result from set intersection:
mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]
d1, d2 = dict(mylist1), dict(mylist2)
result = [[k, (d1[k] + d2[k]) / 2] for k in d1.keys() & d2.keys()]
print(result)
# e.g. [['egg', 0.45], ['chocolate', 0.5]] (the order of a set intersection is not guaranteed)
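Since a set intersection carries no ordering guarantee, here is a minimal variant for when the order of the first list matters. It assumes Python 3.7+ (where dicts preserve insertion order); the name `ordered` is only illustrative.

```python
mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d1, d2 = dict(mylist1), dict(mylist2)

# Walk d1 in insertion order and keep only the keys that also exist in d2.
ordered = [[k, (d1[k] + d2[k]) / 2] for k in d1 if k in d2]
print(ordered)
# [['egg', 0.45], ['chocolate', 0.5]]
```

This is still O(n), because each `k in d2` check is an average-case O(1) dict lookup.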