Doc2Vec Get most similar documents

Use infer_vector to get a document vector for the new text. This call does not alter the underlying model.

Here is how you do it:

    tokens = "a new sentence to match".split()
    new_vector = model.infer_vector(tokens)
    # Top 10 document tags and their cosine similarity.
    # (In gensim 4.0 and later, model.docvecs is named model.dv.)
    sims = model.docvecs.most_similar([new_vector])
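To make it clearer what the "top 10 document tags and their cosine similarity" result means, here is a rough pure-Python sketch of ranking stored vectors against a query vector. It is not gensim's implementation; the names cosine, rank_by_cosine and doc_vectors, and the toy 3-dimensional vectors, are made up for illustration, and it assumes no vector is all zeros.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(query, doc_vectors, topn=10):
    """Return up to topn (tag, similarity) pairs, most similar first."""
    scored = [(tag, cosine(query, vec)) for tag, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical document vectors keyed by tag (illustration only).
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

# Stands in for the vector returned by infer_vector.
query = [1.0, 0.1, 0.0]

results = rank_by_cosine(query, doc_vectors, topn=2)
print(results)
```

The result has the same shape as what most_similar gives back: a list of (tag, similarity) pairs sorted from most to least similar.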

Edit:

Here is an example showing that the underlying model does not change after infer_vector is called.

    import numpy as np

    words = "king queen man".split()
    len_before = len(model.docvecs)  # number of docs

    # Word vectors for king, queen, man. Copy them so the comparison below
    # is against a snapshot and not a view into the model's own array.
    w_vec0 = model[words[0]].copy()
    w_vec1 = model[words[1]].copy()
    w_vec2 = model[words[2]].copy()

    new_vec = model.infer_vector(words)
    len_after = len(model.docvecs)

    print(np.array_equal(model[words[0]], w_vec0))  # True
    print(np.array_equal(model[words[1]], w_vec1))  # True
    print(np.array_equal(model[words[2]], w_vec2))  # True
    print(len_before == len_after)  # True
