How to get the WordNet synset given an offset ID?
As of NLTK 3.2.3, there's a public method for doing this:
wordnet.synset_from_pos_and_offset(pos, offset)
In earlier versions you can use:
wordnet._synset_from_pos_and_offset(pos, offset)
This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.
Example:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset_from_pos_and_offset('n', 4543158)
Synset('wagon.n.01')
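If the same code has to run on NLTK versions on both sides of 3.2.3, a small wrapper can prefer the public method and fall back to the private one. This is a minimal sketch, assuming the reader object exposes one of the two method names above; synset_from_offset is a hypothetical helper, not part of NLTK.

```python
def synset_from_offset(reader, pos, offset):
    """Look up a synset by POS and offset (hypothetical helper, not NLTK API).

    Prefers the public method added in NLTK 3.2.3 and falls back to the
    private method used by earlier releases.
    """
    # getattr with a default returns None instead of raising AttributeError
    lookup = getattr(reader, "synset_from_pos_and_offset", None)
    if lookup is None:
        # Older NLTK: only the underscore-prefixed method exists
        lookup = reader._synset_from_pos_and_offset
    return lookup(pos, offset)
```

With NLTK installed, it would be called as synset_from_offset(wn, 'n', 4543158) after importing wordnet as wn, as in the example above.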
For NLTK 3.2.3 or newer, please see donners45's answer.
For older versions of NLTK:
There is no public built-in method in NLTK, but you could use this:
>>> from nltk.corpus import wordnet
>>> syns = list(wordnet.all_synsets())
>>> offsets_list = [(s.offset(), s) for s in syns]
>>> offsets_dict = dict(offsets_list)
>>> offsets_dict[14204095]
Synset('heatstroke.n.01')
You can then pickle the dictionary and load it whenever you need it.
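One way to follow the pickling suggestion is sketched below. It makes two assumptions that go beyond the answer: it keys the index on (POS, offset) rather than offset alone, because offsets are byte positions inside a separate data file per part of speech, so the same number can in principle occur under more than one POS; and it stores synset names instead of Synset objects, since each Synset holds a reference to the corpus reader and a plain string is simpler to pickle. build_offset_index, save_index and load_index are hypothetical helpers, and the code assumes the NLTK 3.0+ method-style API (pos(), offset(), name()).

```python
import pickle


def build_offset_index(synsets):
    """Map (pos, offset) -> synset name for every synset given.

    Hypothetical helper. Satellite adjectives ('s') live in the same
    data file as ordinary adjectives, so they are filed under 'a'.
    """
    index = {}
    for s in synsets:
        pos = "a" if s.pos() == "s" else s.pos()
        index[(pos, s.offset())] = s.name()
    return index


def save_index(index, path):
    """Pickle the index (plain tuples and strings) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    """Load an index previously written by save_index."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, build the index once with index = build_offset_index(wordnet.all_synsets()), save it with save_index(index, 'wn_offsets.pkl') (a hypothetical filename), and later turn a hit back into a synset with wordnet.synset(load_index('wn_offsets.pkl')[('n', 14204095)]).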
For NLTK versions prior to 3.0, replace the line
offsets_list = [(s.offset(), s) for s in syns]
with
offsets_list = [(s.offset, s) for s in syns]
since prior to NLTK 3.0, offset was an attribute rather than a method.
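If the dictionary-building code has to run on NLTK releases both before and after 3.0, the offset can be read through a helper that calls it only when it is a method. This is a minimal sketch; synset_offset is a hypothetical name, not an NLTK function.

```python
def synset_offset(synset):
    """Return a synset's offset under both old and new NLTK APIs.

    NLTK 3.0+ exposes offset as a method; earlier versions stored it
    as a plain attribute, so call it only when it is callable.
    """
    offset = synset.offset
    return offset() if callable(offset) else offset
```

The list comprehension would then read offsets_list = [(synset_offset(s), s) for s in syns] on either version.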