Gensim: KeyError: "word not in vocabulary"


The first parameter passed to gensim.models.Word2Vec is an iterable of sentences. Each sentence is itself a list of words. From the docs:

Initialize the model from an iterable of sentences. Each sentence is a list of words (unicode strings) that will be used for training.

As written, each word in your list b is treated as a sentence, so Word2Vec learns a vector for each character in each word rather than for each word in b. That is why a lookup by character currently works:

import gensim

# gensim 4.x renamed size to vector_size and requires model.wv['a'] for lookups
model = gensim.models.Word2Vec(b, min_count=1, size=32)
print(model['a'])
array([  7.42487283e-03,  -5.65282721e-03,   1.28707094e-02, ... ])

To get it to work for words, simply wrap b in another list so that it is interpreted correctly:

model = gensim.models.Word2Vec([b], min_count=1, size=32)
print(model['buy'])
array([-0.01331611,  0.00496594, -0.00165093, -0.01444992,  0.01393849, ... ])
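
To see why the extra list matters without training anything, here is a minimal sketch that imitates the vocabulary-counting step in plain Python. It is not gensim's code: build_vocab is a hypothetical helper, and the list b below is a made-up stand-in for the one in the question.

```python
from collections import Counter

def build_vocab(sentences):
    """Count tokens the way a trainer that iterates over sentences sees them."""
    counts = Counter()
    for sentence in sentences:   # each element is treated as one sentence
        for token in sentence:   # each element of a sentence is one token
            counts[token] += 1
    return counts

b = ["buy", "cheap", "apples"]   # hypothetical stand-in for the asker's list

char_vocab = build_vocab(b)      # each string is iterated character by character
word_vocab = build_vocab([b])    # one sentence made of three words

print("buy" in char_vocab, "a" in char_vocab)   # False True
print("buy" in word_vocab, "a" in word_vocab)   # True False
```

With the flat list, the "vocabulary" contains only single characters, so looking up 'buy' fails. Wrapping b in a list makes each word a token.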


According to the docs, Word2Vec expects an iterable of sentences, and it iterates over whatever you pass to it. Here you are passing a flat list of words, so each word is treated as a sentence and a vector is computed for each character in the corpus rather than for each word.

To avoid that problem, pass the list of words inside another list:

word2vec_model = gensim.models.Word2Vec([b],min_count=1,size=32)
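
If the input may arrive either as a flat list of words or as a list of sentences, a small guard can normalize it before training. This is only a sketch: ensure_sentences is a hypothetical helper, not part of gensim, and it assumes that a flat list of strings always represents a single tokenized sentence.

```python
def ensure_sentences(corpus):
    """Return corpus as a list of token lists.

    Assumption: a flat list of strings is one tokenized sentence, so it is
    wrapped in an outer list. Otherwise each sentence is converted to a list.
    """
    corpus = list(corpus)
    if corpus and all(isinstance(item, str) for item in corpus):
        return [corpus]
    return [list(sentence) for sentence in corpus]

print(ensure_sentences(["buy", "cheap", "apples"]))  # [['buy', 'cheap', 'apples']]
print(ensure_sentences([["buy", "now"], ["cheap"]]))  # [['buy', 'now'], ['cheap']]
```

The result can then be passed as the first argument to Word2Vec in place of [b].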