How to find the closest word to a vector using word2vec

python text-mining data-analysis word2vec

The gensim implementation of word2vec has a most_similar() function that lets you find words semantically close to a given word:

>>> model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]

or to its vector representation:

>>> your_word_vector = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
>>> model.most_similar(positive=[your_word_vector], topn=1)

where topn defines the desired number of returned results.

However, my gut feeling is that this function does exactly what you proposed, i.e. it calculates the cosine similarity between the given vector and every other vector in the vocabulary (which is quite inefficient...)
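To make the brute-force approach described above concrete, here is a minimal sketch in plain Python. It is not gensim's code: the names cosine_similarity, closest_words and toy_vectors are made up for illustration, and the three-dimensional vectors are invented toy data rather than output from a trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def closest_words(query, vocab_vectors, topn=1):
    """Score the query against every word vector and keep the best topn."""
    scored = [(word, cosine_similarity(query, vec))
              for word, vec in vocab_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical toy vocabulary (not real word2vec vectors).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.2],
    "apple": [0.1, 0.0, 0.9],
}

# The query points in the same direction as "king", only with a larger norm;
# cosine similarity ignores vector length, so "king" should rank first.
result = closest_words([1.8, 1.6, 0.2], toy_vectors, topn=2)
print(result)
```

Every lookup touches every vector once, so the cost grows linearly with the vocabulary size, which is the inefficiency mentioned above.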


Don't forget to pass an empty list for the negative words when calling most_similar:

import numpy as np
model_word_vector = np.array(my_vector, dtype='f')
topn = 20
most_similar_words = model.most_similar([model_word_vector], [], topn)


Alternatively, model.wv.similar_by_vector(vector, topn=10, restrict_vocab=None) is also available in the gensim package.

Find the top-N most similar words by vector.
Parameters:
vector (numpy.array) – Vector from which similarities are to be computed.
topn ({int, False}, optional) – Number of top-N similar words to return. If topn is False, similar_by_vector returns the vector of similarity scores.
restrict_vocab (int, optional) – Optional integer which limits the range of vectors which are searched for most-similar values. For example, restrict_vocab=10000 would only check the first 10000 word vectors in the vocabulary order. (This may be meaningful if you’ve sorted the vocabulary by descending frequency.)
Returns: Sequence of (word, similarity).
Return type: list of (str, float)
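The effect of the topn and restrict_vocab parameters described above can be illustrated with a toy stand-in written in plain Python. This is only a sketch of the documented behaviour, not gensim's implementation: similar_by_vector_sketch, ordered_vocab and the two-dimensional vectors are hypothetical names and data chosen for the example.

```python
import math

def _cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def similar_by_vector_sketch(ordered_vocab, vector, topn=10, restrict_vocab=None):
    """Toy imitation of the documented parameters.

    ordered_vocab is a list of (word, vector) pairs in vocabulary order.
    """
    # restrict_vocab=N searches only the first N entries; None searches all.
    candidates = ordered_vocab[:restrict_vocab]
    scores = [_cosine(vector, vec) for _, vec in candidates]
    if topn is False:
        return scores  # raw similarity scores, in vocabulary order
    ranked = sorted(zip((w for w, _ in candidates), scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:topn]

# Hypothetical vocabulary, imagined to be sorted by descending frequency.
ordered_vocab = [
    ("the", [1.0, 0.0]),
    ("cat", [0.0, 1.0]),
    ("kitten", [0.1, 1.0]),
]
query = [0.1, 1.0]

full = similar_by_vector_sketch(ordered_vocab, query, topn=1)
restricted = similar_by_vector_sketch(ordered_vocab, query, topn=1, restrict_vocab=2)
raw = similar_by_vector_sketch(ordered_vocab, query, topn=False)
print(full, restricted, raw)
```

With the full vocabulary the best match is "kitten"; with restrict_vocab=2 the rarer word is never examined, so "cat" is returned instead. That trade of recall for speed is why the restriction is mainly useful when the vocabulary is sorted by frequency.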
