Using a pre-trained word embedding (word2vec or GloVe) in TensorFlow

python numpy tensorflow deep-learning

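There are a few ways that you can use a pre-trained embedding in TensorFlow. Let's say that you have the embedding in a NumPy array called embedding, with vocab_size rows and embedding_dim columns, and you want to create a tensor W that can be used in a call to tf.nn.embedding_lookup(). Note that the snippets in this answer use the TensorFlow 1.x graph API (tf.placeholder, tf.Session, tf.train.Saver); in TensorFlow 2.x these names live under tf.compat.v1.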
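For completeness, here is a minimal sketch of how the embedding array itself might be built, assuming the vectors come in the plain-text layout that GloVe distributes (one word per line, followed by its space-separated components). The helper name load_text_embedding and the two-word sample are made up for illustration; a real file would be opened with open() instead of io.StringIO.

```python
import io

import numpy as np


def load_text_embedding(handle):
    """Parse lines of the form '<word> <v1> <v2> ...'.

    Returns (word_to_id, matrix), where matrix has one row per word.
    Hypothetical helper, not part of TensorFlow or GloVe.
    """
    word_to_id = {}
    rows = []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word_to_id[parts[0]] = len(rows)
        rows.append([float(x) for x in parts[1:]])
    return word_to_id, np.asarray(rows, dtype=np.float32)


# Tiny in-memory stand-in for a real embedding file (assumed format).
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
word_to_id, embedding = load_text_embedding(sample)
vocab_size, embedding_dim = embedding.shape
print(vocab_size, embedding_dim, embedding.nbytes)
```

embedding.nbytes reports the in-memory size of the array. As a point of scale, a 3,000,000 x 300 float32 matrix takes about 3.6 GB, which is why the number of copies each option keeps in memory matters.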

1. Simply create W as a tf.constant() that takes embedding as its value:

   ```
   W = tf.constant(embedding, name="W")
   ```

   This is the easiest approach, but it is not memory efficient because the value of a tf.constant() is stored multiple times in memory. Since embedding can be very large, you should only use this approach for toy examples.

2. Create W as a tf.Variable and initialize it from the NumPy array via a tf.placeholder():

   ```
   # Non-trainable variable with the same shape as the embedding matrix.
   W = tf.Variable(tf.constant(0.0, shape=[vocab_size, embedding_dim]),
                   trainable=False, name="W")

   # Feed the NumPy array in at run time instead of baking it into the graph.
   embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embedding_dim])
   embedding_init = W.assign(embedding_placeholder)

   # ...
   sess = tf.Session()

   sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})
   ```

   This avoids storing a copy of embedding in the graph, but it does require enough memory to keep two copies of the matrix in memory at once (one for the NumPy array, and one for the tf.Variable). Note that I've assumed that you want to hold the embedding matrix constant during training, so W is created with trainable=False.

3. If the embedding was trained as part of another TensorFlow model, you can use a tf.train.Saver to load the value from the other model's checkpoint file. This means that the embedding matrix can bypass Python altogether. Create W as in option 2, then do the following:

   ```
   W = tf.Variable(...)

   # Map the variable's name in the other model's checkpoint to W.
   embedding_saver = tf.train.Saver({"name_of_variable_in_other_model": W})

   # ...
   sess = tf.Session()
   embedding_saver.restore(sess, "checkpoint_filename.ckpt")
   ```
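Whichever way W is created, the lookup step itself is a row gather: for a single, unpartitioned W, tf.nn.embedding_lookup(W, ids) returns the rows of W selected by ids, so the result has the shape of ids with embedding_dim appended. The NumPy sketch below mimics that behavior so it runs without TensorFlow; the toy matrix and the ids are made up for illustration.

```python
import numpy as np

# Hypothetical 4-word vocabulary with 3-dimensional vectors.
# Row i is the vector for word id i.
W = np.arange(12, dtype=np.float32).reshape(4, 3)

# A batch of 2 sequences, each holding 2 word ids.
ids = np.array([[2, 0], [3, 3]])

# NumPy integer-array indexing gathers rows, as the lookup does.
looked_up = W[ids]

print(looked_up.shape)  # (2, 2, 3)
print(looked_up[0, 0])  # the vector for word id 2
```

In a real pipeline the ids would come from mapping each token to its row index in the embedding matrix, using the same vocabulary order the embedding was built with.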


I use this method to load and share the embedding:

```
W = tf.get_variable(name="W", shape=embedding.shape,
                    initializer=tf.constant_initializer(embedding),
                    trainable=False)
```


TensorFlow 2.0 compatible answer: There are many open-source pre-trained embeddings, several of them developed by Google.

Examples include Universal Sentence Encoder (USE), ELMo, and BERT, and it is very easy to reuse them in your code.

Code to reuse one of these pre-trained embeddings, Universal Sentence Encoder, is shown below:

```
# The "!" lines are notebook shell commands; run them with pip in a terminal otherwise.
!pip install "tensorflow_hub>=0.6.0"
!pip install "tensorflow>=2.0.0"

import tensorflow as tf
import tensorflow_hub as hub

module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
embed = hub.KerasLayer(module_url)

# One embedding vector per input string.
embeddings = embed(["A long sentence.", "single-word",
                    "http://example.com"])
print(embeddings.shape)  # (3, 512)
```

For more information on the available pre-trained embeddings, refer to TensorFlow Hub (tfhub.dev).
