How can I calculate the Jaccard Similarity of two lists containing strings in Python?

How can I calculate the Jaccard Similarity of two lists containing strings in Python?


I ended up writing my own solution after all:

    def jaccard_similarity(list1, list2):
        intersection = len(set(list1).intersection(list2))
        union = (len(set(list1)) + len(set(list2))) - intersection
        return float(intersection) / union
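The union size in that function comes from inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|. A quick sanity check that this matches building the union directly, using sample lists chosen purely for illustration:

```python
# Sample data chosen for illustration only.
list1 = ['dog', 'cat', 'cat', 'rat']
list2 = ['dog', 'cat', 'mouse']

s1, s2 = set(list1), set(list2)
intersection = len(s1 & s2)

# Inclusion-exclusion: count both sets, then subtract the shared items once.
union_by_counting = len(s1) + len(s2) - intersection
# Direct construction of the union set.
union_direct = len(s1 | s2)

print(union_by_counting, union_direct)
```

Both numbers should agree, so either way of computing the denominator gives the same similarity.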


For Python 3:

    def jaccard_similarity(list1, list2):
        s1 = set(list1)
        s2 = set(list2)
        # In Python 3, / is true division, so no float() conversion is needed.
        return len(s1.intersection(s2)) / len(s1.union(s2))

    list1 = ['dog', 'cat', 'cat', 'rat']
    list2 = ['dog', 'cat', 'mouse']
    jaccard_similarity(list1, list2)
    >>> 0.5

For Python 2, where / between two integers performs integer division, convert one operand to float:

    return len(s1.intersection(s2)) / float(len(s1.union(s2)))
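One edge case worth noting: if both lists are empty, the union is empty and the division raises ZeroDivisionError. Below is a minimal sketch of a guarded variant for Python 3. The name jaccard_similarity_safe, the empty_value parameter, and the choice of 1.0 for two empty lists are all assumptions made for illustration (returning 0.0 or raising an error are equally defensible conventions).

```python
def jaccard_similarity_safe(list1, list2, empty_value=1.0):
    """Jaccard similarity of two lists, treated as sets.

    empty_value is returned when both inputs are empty. This is an
    assumed convention: the ratio 0/0 is otherwise undefined.
    """
    s1, s2 = set(list1), set(list2)
    union = s1 | s2
    if not union:
        # Both inputs were empty, so there is nothing to divide by.
        return empty_value
    return len(s1 & s2) / len(union)


print(jaccard_similarity_safe(['dog', 'cat', 'cat', 'rat'], ['dog', 'cat', 'mouse']))
print(jaccard_similarity_safe([], []))
```

If an empty comparison should count as an error in your application, raising a ValueError in the guarded branch may be the better choice.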


@aventinus I don't have enough reputation to comment on your answer, but to make things clearer: your solution computes the Jaccard similarity, yet the function is named jaccard_distance. The Jaccard distance is actually 1 - jaccard_similarity.
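To make the distinction concrete, here is a small sketch that defines both quantities side by side. It assumes Python 3 and at least one non-empty input; the function bodies are one possible implementation, not necessarily the code from the answer being discussed.

```python
def jaccard_similarity(list1, list2):
    # Size of the intersection divided by size of the union.
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1 | s2)


def jaccard_distance(list1, list2):
    # Distance is the complement of similarity.
    return 1 - jaccard_similarity(list1, list2)


a = ['dog', 'cat', 'cat', 'rat']
b = ['dog', 'cat', 'mouse']
print(jaccard_similarity(a, b))
print(jaccard_distance(a, b))
```

Identical lists give a similarity of 1 and a distance of 0; lists with nothing in common give a similarity of 0 and a distance of 1.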