What meaning does the length of a Word2vec vector have?


I think the answer you are looking for is described in the 2015 paper "Measuring Word Significance using Distributed Representations of Words" by Adriaan Schakel and Benjamin Wilson. The key points:

When a word appears in different contexts, its vector gets moved in different directions during updates. The final vector then represents some sort of weighted average over the various contexts. Averaging over vectors that point in different directions typically results in a vector that gets shorter with increasing number of different contexts in which the word appears. For words to be used in many different contexts, they must carry little meaning. Prime examples of such insignificant words are high-frequency stop words, which are indeed represented by short vectors despite their high term frequencies ...
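To make the averaging argument concrete, here is a small sketch of my own (it is not from the paper). It assumes each "context" pushes the vector in a random direction of unit length and that the final vector is a plain, unweighted mean. Real word2vec updates are more involved, so treat this only as an illustration of why averaging over many different directions gives a short vector.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector_length(n_contexts, dim=50, seed=0):
    """Length of the plain average of n_contexts random unit vectors."""
    rng = random.Random(seed)
    total = [0.0] * dim
    for _ in range(n_contexts):
        u = random_unit_vector(dim, rng)
        total = [t + x for t, x in zip(total, u)]
    mean = [t / n_contexts for t in total]
    return math.sqrt(sum(x * x for x in mean))


# The mean gets shorter as the number of distinct directions grows.
for n in (1, 4, 16, 64, 256):
    print(n, round(mean_vector_length(n), 3))
```

With a single context the mean has length 1. With many unrelated contexts the components largely cancel, and the length falls off roughly like 1/sqrt(n) under these assumptions.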


For given term frequency, the vector length is seen to take values only in a narrow interval. That interval initially shifts upwards with increasing frequency. Around a frequency of about 30, that trend reverses and the interval shifts downwards.

...

Both forces determining the length of a word vector are seen at work here. Small-frequency words tend to be used consistently, so that the more frequently such words appear, the longer their vectors. This tendency is reflected by the upwards trend in Fig. 3 at low frequencies. High-frequency words, on the other hand, tend to be used in many different contexts, the more so, the more frequently they occur. The averaging over an increasing number of different contexts shortens the vectors representing such words. This tendency is clearly reflected by the downwards trend in Fig. 3 at high frequencies, culminating in punctuation marks and stop words with short vectors at the very end.

...

[Image: graph showing the trend described in the previous excerpt]

Figure 3: Word vector length v versus term frequency tf of all words in the hep-th vocabulary. Note the logarithmic scale used on the frequency axis. The dark symbols denote bin means with the kth bin containing the frequencies in the interval [2^(k−1), 2^k − 1] with k = 1, 2, 3, ... These means are included as a guide to the eye. The horizontal line indicates the length v = 1.37 of the mean vector.
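The binning in the caption is easy to reproduce. Below is a minimal sketch, assuming bins that double in width (bin 1 holds frequency 1, bin 2 holds 2 to 3, bin 3 holds 4 to 7, and so on). The function names and the toy numbers are mine and are not data from the paper.

```python
from collections import defaultdict


def frequency_bin(tf):
    """Return k such that 2**(k-1) <= tf <= 2**k - 1 (tf must be >= 1)."""
    if tf < 1:
        raise ValueError("term frequency must be a positive integer")
    # For a positive integer, the bit length is exactly this k.
    return tf.bit_length()


def bin_means(points):
    """Average vector length per frequency bin.

    points: iterable of (term_frequency, vector_length) pairs.
    Returns {k: mean length of all points whose frequency falls in bin k}.
    """
    grouped = defaultdict(list)
    for tf, length in points:
        grouped[frequency_bin(tf)].append(length)
    return {k: sum(v) / len(v) for k, v in sorted(grouped.items())}


# Made-up toy values, NOT data from the paper.
toy_points = [(1, 1.0), (2, 1.5), (3, 2.5), (4, 3.0), (7, 4.0), (8, 2.0)]
print(bin_means(toy_points))
```

Plotting these bin means against frequency on a logarithmic axis would give the kind of guide line described in the caption.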


4 Discussion

Most applications of distributed representations of words obtained through word2vec so far centered around semantics. A host of experiments have demonstrated the extent to which the direction of word vectors captures semantics. In this brief report, it was pointed out that not only the direction, but also the length of word vectors carries important information. Specifically, it was shown that word vector length furnishes, in combination with term frequency, a useful measure of word significance.
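If you want to try this on your own model, one simple reading of "length in combination with term frequency" is to compare vector lengths only among words of similar frequency. The sketch below is my interpretation and not the paper's exact procedure. The helper names, the frequency band, and the toy vectors and counts are all hypothetical.

```python
import math


def vector_length(vec):
    """Euclidean (L2) norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in vec))


def rank_by_length(vectors, counts, min_count, max_count):
    """Words with min_count <= count <= max_count, longest vector first."""
    in_band = [w for w in vectors if min_count <= counts[w] <= max_count]
    return sorted(in_band, key=lambda w: vector_length(vectors[w]), reverse=True)


# Hypothetical toy vectors and counts, for illustration only.
vectors = {
    "the": [0.1, 0.2],
    "of": [0.2, 0.1],
    "quark": [3.0, 4.0],
    "brane": [1.0, 2.0],
}
counts = {"the": 90000, "of": 50000, "quark": 40, "brane": 35}

print(rank_by_length(vectors, counts, 32, 63))
```

With gensim 4.x I believe the raw vector is available as `model.wv[word]` and the training count as `model.wv.get_vecattr(word, "count")`, but check this against the version you have installed. Make sure you use the raw vectors, not unit-normalized ones, since normalization throws the length away.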