Interpreting negative Word2Vec similarity from gensim


Cosine similarity is the cosine of the angle between the two vectors, so it ranges from -1 to 1, the same as a regular cosine wave.

[Image: cosine wave]

As for the source:

https://github.com/RaRe-Technologies/gensim/blob/ba1ce894a5192fc493a865c535202695bb3c0424/gensim/models/word2vec.py#L1511

    def similarity(self, w1, w2):
        """
        Compute cosine similarity between two words.

        Example::

          >>> trained_model.similarity('woman', 'man')
          0.73723527

          >>> trained_model.similarity('woman', 'woman')
          1.0

        """
        return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))


As others have said, the cosine similarity can range from -1 to 1 based on the angle between the two vectors being compared. The exact implementation in gensim is a simple dot product of the normalized vectors.

https://github.com/RaRe-Technologies/gensim/blob/4f0e2ae0531d67cee8d3e06636e82298cb554b04/gensim/models/keyedvectors.py#L581

    def similarity(self, w1, w2):
        """
        Compute cosine similarity between two words.

        Example::

          >>> trained_model.similarity('woman', 'man')
          0.73723527

          >>> trained_model.similarity('woman', 'woman')
          1.0

        """
        return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))
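To see how a dot product of normalized vectors can come out negative, here is a minimal sketch in plain Python. It is not gensim's code: `unit_vector` and `cosine_similarity` are hypothetical helpers that only mirror the idea of `matutils.unitvec` followed by `dot`, and the two 3-dimensional vectors are made up for illustration.

```python
import math

def unit_vector(v):
    """Scale v to length 1 (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity as the dot product of the two unit vectors."""
    ua, ub = unit_vector(a), unit_vector(b)
    return sum(x * y for x, y in zip(ua, ub))

# Made-up "word vectors"; real Word2Vec vectors typically have 100+ dimensions.
vec_a = [1.0, 2.0, 3.0]
vec_b = [-2.0, 0.5, -1.0]

sim = cosine_similarity(vec_a, vec_b)
print(sim)  # negative, because the vectors are more than 90 degrees apart
```

Nothing in the calculation forces the result to be non-negative: as soon as the angle between two vectors exceeds 90 degrees, the dot product of their unit vectors drops below zero.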

In terms of interpretation, you can think of these values much as you would think of correlation coefficients. A value of 1 means the two word vectors point in the same direction (e.g., "woman" compared with "woman"), a value of 0 means the vectors are orthogonal, i.e. there is no relationship between the words, and a value of -1 means the vectors point in exactly opposite directions, a perfect opposite relationship between the words.
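The three reference values can be checked with toy 2-D vectors. This is only an illustrative sketch: the vectors are hand-picked rather than taken from a trained model, and `cosine_similarity` is a hypothetical helper, not a gensim function.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

reference = (1.0, 0.0)
same_direction = (2.0, 0.0)   # longer, but pointing the same way
orthogonal = (0.0, 3.0)       # at a right angle to the reference
opposite = (-4.0, 0.0)        # pointing the opposite way

sim_same = cosine_similarity(reference, same_direction)
sim_orth = cosine_similarity(reference, orthogonal)
sim_opp = cosine_similarity(reference, opposite)

print(sim_same)  # 1.0
print(sim_orth)  # 0.0
print(sim_opp)   # -1.0
```

Vector length has no effect on the result; only direction matters. Similarities from a trained model will usually fall somewhere between these extremes.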