Load PreComputed Vectors Gensim
You can download pre-trained word vectors from here (get the file 'GoogleNews-vectors-negative300.bin'):word2vec
Extract the file and then you can load it in python like:
model = gensim.models.word2vec.Word2Vec.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)model.most_similar('dog')
EDIT (May 2017):As the above code is now deprecated, this is how you'd load the vectors now:
model = gensim.models.KeyedVectors.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)
As far as I know, Gensim can load two binary formats, word2vec and fastText, and a generic plain text format which can be created by most word embedding tools. The generic plain text format looks like this (in this example 20000 is the size of the vocabulary and 100 is the length of vector)
20000 100the 0.476841 -0.620207 -0.002157 0.359706 -0.591816 [98 more numbers...]and 0.223408 0.231993 -0.231131 -0.900311 -0.225111 [98 more numbers..][19998 more lines...]
Chaitanya Shivade has explained in his answer here, how to use a script provided by Gensim to convert the Glove format (each line: word + vector) into the generic format.
Loading the different formats is easy, but it is also easy to get them mixed up:
import gensimmodel_file = path/to/model/file
1) Loading binary word2vec
model = gensim.models.word2vec.Word2Vec.load_word2vec_format(model_file)
2) Loading binary fastText
model = gensim.models.fasttext.FastText.load_fasttext_format(model_file)
3) Loading the generic plain text format (which has been introduced by word2vec)
model = gensim.models.keyedvectors.Word2VecKeyedVectors.load_word2vec_format(model_file)
If you only plan to use the word embeddings and not to continue to train them in Gensim, you may want to use the KeyedVector class. This will reduce the amount of memory you need to load the vectors considerably (detailed explanation).
The following will load the binary word2vec format as keyedvectors:
model = gensim.models.keyedvectors.Word2VecKeyedVectors.load_word2vec_format(model_file, binary=True)