
How to get the WordNet synset given an offset ID?


As of NLTK 3.2.3, there's a public method for doing this:

wordnet.synset_from_pos_and_offset(pos, offset)

In earlier versions you can use:

wordnet._synset_from_pos_and_offset(pos, offset)

This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.

Example:

from nltk.corpus import wordnet as wn
wn.synset_from_pos_and_offset('n', 4543158)
# Synset('wagon.n.01')
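If you need code that works on both sides of the rename, something like the following might do. It is only a sketch: `synset_from_offset` is a name I made up (not part of NLTK), and it assumes the `wn` object you pass in exposes one of the two method names mentioned above.

```python
def synset_from_offset(wn, pos, offset):
    """Look up a synset by POS and offset under either method name.

    `wn` is the WordNet corpus reader, e.g. `from nltk.corpus import wordnet as wn`.
    This helper is hypothetical and not part of NLTK itself.
    """
    # Prefer the public method (NLTK 3.2.3 and newer, per the answer above).
    lookup = getattr(wn, "synset_from_pos_and_offset", None)
    if lookup is None:
        # Fall back to the underscore-prefixed name used by earlier versions.
        lookup = wn._synset_from_pos_and_offset
    return lookup(pos, offset)


# Usage, assuming NLTK and the WordNet data are installed:
#   from nltk.corpus import wordnet as wn
#   synset_from_offset(wn, 'n', 4543158)
```

Passing the reader in as an argument keeps the helper free of any import-time dependency on a particular NLTK version.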


For NLTK 3.2.3 or newer, please see donners45's answer.

For older versions of NLTK:

There is no built-in method in NLTK, but you could use this:

from nltk.corpus import wordnet
syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)
offsets_dict[14204095]
# Synset('heatstroke.n.01')

You can then pickle the dictionary and load it whenever you need it.

For NLTK versions prior to 3.0, replace the line

offsets_list = [(s.offset(), s) for s in syns]

with

offsets_list = [(s.offset, s) for s in syns]

since prior to NLTK 3.0 offset was an attribute instead of a method.
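One caveat with a dictionary keyed on the offset alone: as far as I know, offsets are positions within the per-POS WordNet data files, so the same number can turn up for a noun and a verb, and the later entry would silently overwrite the earlier one. A possible variation keys on the (pos, offset) pair instead. This is a sketch, not anything NLTK ships: `build_offset_index` is a made-up name, and it assumes each synset exposes `pos` and `offset` either as methods (NLTK 3.0+) or as plain attributes (older versions).

```python
def build_offset_index(synsets):
    """Map (pos, offset) -> synset for an iterable of synset objects.

    Hypothetical helper: works whether `pos` and `offset` are methods
    or plain attributes on the synset.
    """
    index = {}
    for s in synsets:
        # Call the accessor if it is a method, otherwise read the attribute.
        offset = s.offset() if callable(s.offset) else s.offset
        pos = s.pos() if callable(s.pos) else s.pos
        index[(pos, offset)] = s
    return index


# Usage, assuming NLTK and the WordNet data are installed:
#   from nltk.corpus import wordnet
#   index = build_offset_index(wordnet.all_synsets())
#   index[('n', 14204095)]
```

The lookup then needs the POS as well, which matches what synset_from_pos_and_offset asks for anyway.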


You can use of2ss(). For example:

from nltk.corpus import wordnet as wn
syn = wn.of2ss('01580050a')

will return Synset('necessary.a.01')
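If your NLTK version does not have of2ss(), the ID string can be split by hand and the two pieces passed to one of the pos-and-offset methods from the other answers. A rough sketch, assuming the ID is a run of digits followed by a one-letter POS tag, optionally separated by a hyphen. Both `split_offset_id` and `format_offset_id` are names I invented for illustration.

```python
VALID_POS = {"n", "v", "a", "r", "s"}  # noun, verb, adjective, adverb, satellite


def split_offset_id(offset_id):
    """Split an ID such as '01580050a' or '01580050-a' into (pos, offset)."""
    if not offset_id:
        raise ValueError("empty offset ID")
    pos = offset_id[-1]
    # Drop the POS letter and an optional hyphen separator before it.
    digits = offset_id[:-1].rstrip("-")
    if pos not in VALID_POS or not digits.isdigit():
        raise ValueError("unrecognised offset ID: %r" % (offset_id,))
    return pos, int(digits)


def format_offset_id(pos, offset):
    """Build the zero-padded, 8-digit form, e.g. ('a', 1580050) -> '01580050a'."""
    return "{:08d}{}".format(offset, pos)


# Usage, assuming NLTK and the WordNet data are installed:
#   from nltk.corpus import wordnet as wn
#   pos, offset = split_offset_id('01580050a')
#   wn.synset_from_pos_and_offset(pos, offset)
```

Note that int() drops the leading zeros, which is fine because the lookup methods take the offset as a number.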