PyTorch / Gensim - How to load pre-trained word embeddings


I just wanted to report my findings about loading a gensim embedding with PyTorch.


  • Solution for PyTorch 0.4.0 and newer:

From v0.4.0 there is a new function from_pretrained() which makes loading an embedding very straightforward. Here is an example from the documentation:

import torch
import torch.nn as nn

# FloatTensor containing pretrained weights
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# Get embeddings for index 1
input = torch.LongTensor([1])
embedding(input)
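Note that from_pretrained() freezes the weights by default. If the embeddings should be fine-tuned during training, pass freeze=False; a minimal sketch building on the weight tensor above:

# freeze=True (the default) sets requires_grad=False on the weights
trainable = nn.Embedding.from_pretrained(weight, freeze=False)
print(trainable.weight.requires_grad)  # True, so the embeddings are updated during training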

The weights from gensim can easily be obtained by:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors)  # formerly syn0, which is soon deprecated

As noted by @Guglie: in newer gensim versions a trained model exposes its vectors through model.wv, so the weights can be obtained by:

weights = torch.FloatTensor(model.wv.vectors)
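Keep in mind that nn.Embedding maps integer indices to vectors, so gensim's vocabulary is still needed to translate words into row indices. A small sketch, assuming gensim 4.x where the mapping is exposed as key_to_index, with placeholder words:

embedding = nn.Embedding.from_pretrained(weights)
# gensim 4.x: word -> row index of the weight matrix
# (in gensim 3.x this was model.wv.vocab['word'].index)
word2idx = model.wv.key_to_index
indices = torch.LongTensor([word2idx[w] for w in ['king', 'queen']])
vectors = embedding(indices)  # one row per word: shape (2, embedding_dim)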

  • Solution for PyTorch version 0.3.1 and older:

I'm using version 0.3.1 and from_pretrained() isn't available in this version.

Therefore I created my own from_pretrained so I can also use it with 0.3.1.

Code for from_pretrained for PyTorch versions 0.3.1 or lower:

def from_pretrained(embeddings, freeze=True):
    assert embeddings.dim() == 2, \
        'Embeddings parameter is expected to be 2-dimensional'
    rows, cols = embeddings.shape
    embedding = torch.nn.Embedding(num_embeddings=rows, embedding_dim=cols)
    embedding.weight = torch.nn.Parameter(embeddings)
    embedding.weight.requires_grad = not freeze
    return embedding

The embedding can then be loaded like this:

embedding = from_pretrained(weights)
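One caveat with freezing via requires_grad: optimizers in these older PyTorch versions raise an error when given parameters that don't require gradients, so the frozen weights have to be filtered out. A sketch, assuming net is the hypothetical module containing the frozen embedding:

# Old optimizers reject parameters with requires_grad=False,
# so only pass the trainable ones.
trainable_params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.1)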

I hope this is helpful for someone.


I think it is easy. Just copy the embedding weights from gensim to the corresponding weights in the PyTorch embedding layer.

You need to make sure of two things: first, the weight shape has to be correct; second, the weights have to be converted to the PyTorch FloatTensor type.


from gensim.models import Word2Vec
import torch
import torch.nn as nn

# reviews: your tokenized training corpus, e.g. a list of token lists
model = Word2Vec(reviews, size=100, window=5, min_count=5, workers=4)  # gensim model created
# (in gensim 4.x the parameter is called vector_size instead of size)
weights = torch.FloatTensor(model.wv.vectors)
embedding = nn.Embedding.from_pretrained(weights)
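As a quick check of the two requirements above (a sketch; the exact vocabulary size depends on your corpus):

# Requirement 1: shape is (vocabulary size, embedding dimension)
print(weights.shape)   # torch.Size([len(model.wv.vocab), 100]) in gensim 3.x
# Requirement 2: the weights are a FloatTensor
print(weights.dtype)   # torch.float32
# from_pretrained infers both dimensions from the weight matrix
print(embedding.num_embeddings, embedding.embedding_dim)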