Python: gensim: RuntimeError: you must first build vocabulary before training the model


The default min_count in gensim's Word2Vec is 5, which means words occurring fewer than 5 times are discarded. If no word in your corpus reaches that frequency, the vocabulary ends up empty, hence the error. Try lowering it:

voc_vec = word2vec.Word2Vec(vocab, min_count=1)
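To see why the default threshold can leave nothing to train on, here is a rough stand-in for the frequency filter written with collections.Counter. It is not gensim's actual code; the helper name surviving_vocab and the toy corpus are made up for illustration.

```python
from collections import Counter


def surviving_vocab(sentences, min_count):
    """Return the words whose total count is at least min_count."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


# Hypothetical toy corpus: every sentence is a list of tokens.
corpus = [
    ["i", "love", "ice-cream"],
    ["he", "loves", "ice-cream"],
    ["you", "love", "ice-cream"],
]

vocab_default = surviving_vocab(corpus, min_count=5)  # same threshold as gensim's default
vocab_relaxed = surviving_vocab(corpus, min_count=1)  # keep every word

print(len(vocab_default))     # no word occurs 5 times, so nothing survives
print(sorted(vocab_relaxed))  # all distinct words are kept
```

On a small corpus like this one the most frequent word appears only three times, so only a lower min_count leaves any vocabulary.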


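gensim's Word2Vec expects an iterable of sentences in which each sentence is a list of word tokens, i.e. a list of lists of words. It will also run on a list of sentence strings, a flat list of words, or a list of single-string lists, but it then treats characters (first two cases) or whole sentences (third case) as the "words", which is rarely what you want.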

E.g.

1. sentences = ['I love ice-cream', 'he loves ice-cream', 'you love ice cream']
2. words = ['i', 'love', 'ice - cream', 'like', 'ice-cream']
3. sentences = [['i love ice-cream'], ['he loves ice-cream'], ['you love ice cream']]
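Whichever shape you start from, the model iterates over each sentence to get its tokens, so a sentence passed as a plain string is walked character by character. A minimal sketch, assuming simple whitespace splitting is good enough for your text (the variable names are illustrative only):

```python
raw_sentences = ['I love ice-cream', 'he loves ice-cream', 'you love ice cream']

# Lowercase and split on whitespace; real corpora usually need a proper tokenizer.
tokenized = [sentence.lower().split() for sentence in raw_sentences]

# What a consumer sees when it iterates a bare string instead of a token list.
chars_as_words = list(raw_sentences[0])

print(tokenized[0])        # word tokens of the first sentence
print(chars_as_words[:6])  # single characters, including the space
```

The tokenized list of lists is the shape to hand to Word2Vec or build_vocab.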

Build the vocabulary before training:

model.build_vocab(sentences, update=False)

Check out the link for more detailed info.