How do I find the frequency count of a word in English using WordNet?

python nltk wordnet

In WordNet, every Lemma has a frequency count, which is returned by the method lemma.count() and stored in the file nltk_data/corpora/wordnet/cntlist.rev.

Code example:

from nltk.corpus import wordnet

syns = wordnet.synsets('stack')
for s in syns:
    for l in s.lemmas():
        print l.name + " " + str(l.count())

Result:

stack 2
batch 0
deal 1
flock 1
good_deal 13
great_deal 10
hatful 0
heap 2
lot 13
mass 14
mess 0
...

However, many counts are zero, and neither the source file nor the documentation says which corpus was used to create this data. According to the book Speech and Language Processing by Daniel Jurafsky and James H. Martin, the sense frequencies come from the SemCor corpus, which is a subset of the already small and outdated Brown Corpus.

So it's probably best to choose the corpus that best fits your application and create the data yourself, as Christopher suggested.

To make this compatible with Python 3.x (and NLTK 3, where name is a method), use:

Code example:

from nltk.corpus import wordnet

syns = wordnet.synsets('stack')
for s in syns:
    for l in s.lemmas():
        print(l.name() + " " + str(l.count()))
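To get a single number for one word rather than one line per synonym, the per-lemma counts can be summed over the lemmas whose name matches the word. This is only a sketch, assuming NLTK 3 with the WordNet data installed; `total_lemma_count` and `wordnet_count` are hypothetical helper names, not part of NLTK.

```python
def total_lemma_count(word, synsets):
    """Sum lemma.count() over every lemma in `synsets` whose name equals `word`."""
    # WordNet joins multiword lemma names with underscores, e.g. "good_deal".
    target = word.lower().replace(" ", "_")
    total = 0
    for synset in synsets:
        for lemma in synset.lemmas():
            if lemma.name().lower() == target:
                total += lemma.count()
    return total


def wordnet_count(word):
    """Look `word` up in WordNet and return its summed lemma count."""
    # Imported lazily so total_lemma_count() itself has no NLTK dependency.
    from nltk.corpus import wordnet
    return total_lemma_count(word, wordnet.synsets(word))
```

Calling `wordnet_count('stack')` adds up the counts of the `stack` lemma across all of its senses and ignores synonyms such as `heap`. The caveats about how the counts were produced still apply.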


You can sort of do it using the Brown Corpus, though it's out of date (last revised in 1979), so it's missing lots of current words.

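from nltk.corpus import brown
from nltk.probability import FreqDist

# Count every lower-cased token in the Brown Corpus.
words = FreqDist()
for sentence in brown.sents():
    for word in sentence:
        words[word.lower()] += 1  # FreqDist.inc() was removed in NLTK 3

print(words["and"])       # raw count
print(words.freq("and"))  # relative frequency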

You could then pickle the FreqDist to a file (cPickle on Python 2, pickle on Python 3) for faster loading later.
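Caching the counts could look roughly like this. The sketch uses `collections.Counter` as a stand-in so that it runs without NLTK (in NLTK 3, `FreqDist` is a `Counter` subclass, so the same calls should apply); `save_counts` and `load_counts` are illustrative names.

```python
import pickle
from collections import Counter


def save_counts(counts, path):
    """Write a frequency table (Counter or FreqDist) to disk."""
    with open(path, "wb") as f:
        pickle.dump(counts, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_counts(path):
    """Read back a frequency table written by save_counts()."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.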

A corpus is basically just a file full of sentences, one per line, and there are lots of other corpora out there, so you could probably find one that fits your purpose. A couple of other sources of more current corpora: Google, American National Corpus.
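If you do pick your own corpus, counting words in a one-sentence-per-line text file needs nothing beyond the standard library. A minimal sketch; the tokenizer regex (lowercase ASCII letters and apostrophes) and the function names are assumptions chosen for illustration, and a real application would likely want a better tokenizer.

```python
import re
from collections import Counter

# Very rough tokenizer: runs of ASCII letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def count_words(lines):
    """Return a Counter of lower-cased tokens over an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN.findall(line.lower()))
    return counts


def relative_frequency(counts, word):
    """Share of all tokens that are `word` (0.0 for an empty table)."""
    total = sum(counts.values())
    return counts[word.lower()] / total if total else 0.0
```

Because `count_words` accepts any iterable of strings, an open file object can be passed directly, for example `count_words(open("my_corpus.txt", encoding="utf-8"))` with a hypothetical file name.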

You can also supposedly get a current list of the top 60,000 words and their frequencies from the Corpus of Contemporary American English.


Check out this site for word frequencies: http://corpus.byu.edu/coca/

Somebody compiled a list of words taken from opensubtitles.org (movie scripts). A free, simple text file formatted like this is available for download in many different languages:

you 6281002
i 5685306
the 4768490
to 3453407
a 3048287
it 2879962

http://invokeit.wordpress.com/frequency-word-lists/
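A file in that format, with one word and its count per line, can be loaded into a dictionary with a few lines of Python. This is a sketch that assumes whitespace-separated `word count` pairs; `parse_frequency_list` is a hypothetical helper name.

```python
def parse_frequency_list(lines):
    """Parse 'word count' lines into a {word: count} dictionary."""
    freqs = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        try:
            freqs[word] = int(count)
        except ValueError:
            continue  # skip lines whose second field is not a number
    return freqs
```

With the downloaded file saved locally (the name `en.txt` is an assumption), `parse_frequency_list(open("en.txt", encoding="utf-8"))` returns the table, and `freqs.get("stack", 0)` looks up a single word.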
