
Percentage Overlap of Two Lists


In principle, I'd say there are three sensible questions you might be asking:

  1. What percentage is the overlap relative to the first list? I.e. how big is the common part compared to the first list?
  2. The same question for the second list.
  3. What percentage is the overlap relative to the "universe" (i.e. the union of both lists)?

Other definitions are surely possible, and there are many of them. In the end, you need to know which problem you're trying to solve.

From a programming point of view, the solution is easy:

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
setA = set(listA)
setB = set(listB)
overlap = setA & setB
universe = setA | setB
result1 = float(len(overlap)) / len(setA) * 100
result2 = float(len(overlap)) / len(setB) * 100
result3 = float(len(overlap)) / len(universe) * 100
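As a rough sketch, the three percentages could be wrapped in one function. The name `overlap_percentages` and the choice to return 0.0 for an empty denominator (instead of raising `ZeroDivisionError`) are assumptions made for illustration, not part of the original answer.

```python
def overlap_percentages(list_a, list_b):
    """Return (pct_of_a, pct_of_b, pct_of_union) for the distinct items shared by two lists."""
    set_a, set_b = set(list_a), set(list_b)
    common = len(set_a & set_b)

    def pct(part, whole):
        # Assumption: an empty denominator yields 0.0 rather than raising ZeroDivisionError.
        return 100.0 * part / whole if whole else 0.0

    return (
        pct(common, len(set_a)),          # question 1: relative to the first list
        pct(common, len(set_b)),          # question 2: relative to the second list
        pct(common, len(set_a | set_b)),  # question 3: relative to the union
    )


listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
print(overlap_percentages(listA, listB))  # (100.0, 75.0, 75.0)
```

Because the lists are converted to sets first, duplicate entries in either list do not affect any of the three results.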


>>> len(set(listA)&set(listB)) / float(len(set(listA) | set(listB))) * 100
75.0

I would calculate the common items out of the total distinct items.

len(set(listA)&set(listB)) returns the number of common items (3 in your example).

len(set(listA) | set(listB)) returns the total number of distinct items (4).

Divide the first by the second, multiply by 100, and you get the percentage.


The maximum difference occurs when the two lists have completely different elements. In that case there are at most n + m distinct elements, where n is the size of the first list and m is the size of the second. One measure can be:

2 * c / (n + m)

where c is the number of common elements. As a percentage, this can be calculated like this:

200.0 * len(set(listA) & set(listB)) / (len(listA) + len(listB))
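One thing to watch with this one-liner: c is counted over sets, but n and m are the raw list lengths, so duplicates in either list enlarge the denominator and lower the score. The sketch below makes that choice explicit. The function name `dice_percentage` and the `distinct` flag are hypothetical, chosen here because 2 * c / (n + m) has the same form as the Sørensen-Dice coefficient; returning 0.0 for two empty lists is also an assumption.

```python
def dice_percentage(list_a, list_b, distinct=True):
    """Return 2 * c / (n + m) as a percentage, where c is the number of common distinct items."""
    set_a, set_b = set(list_a), set(list_b)
    common = len(set_a & set_b)
    if distinct:
        # n and m count distinct items, so duplicates are ignored.
        total = len(set_a) + len(set_b)
    else:
        # n and m are the raw list lengths, as in the one-liner above.
        total = len(list_a) + len(list_b)
    # Assumption: two empty lists score 0.0 instead of raising ZeroDivisionError.
    return 200.0 * common / total if total else 0.0


print(dice_percentage(["a", "a", "b"], ["a", "b"]))                  # 100.0
print(dice_percentage(["a", "a", "b"], ["a", "b"], distinct=False))  # 80.0
```

When neither list contains duplicates, both settings give the same result.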