Counting n-gram frequency in Python with NLTK
NLTK comes with its own bigrams generator, as well as a convenient FreqDist() function.
# Tokenize a text file, build its bigrams, and print each bigram's frequency.
import nltk

# Use a context manager so the file handle is closed even on error
# (the original left the file open).
with open('a_text_file') as f:
    raw = f.read()

tokens = nltk.word_tokenize(raw)

# Create your bigrams
bgs = nltk.bigrams(tokens)

# Compute frequency distribution for all the bigrams in the text.
# Python 3: print is a function (the original used the Python 2 statement form).
fdist = nltk.FreqDist(bgs)
for k, v in fdist.items():
    print(k, v)
Once you have access to the bigrams and their frequency distribution, you can filter according to your needs.
Hope that helps.
from nltk import FreqDist
from nltk.util import ngrams


def compute_freq(path='corpus.txt'):
    """Compute the bigram frequency distribution of a whitespace-tokenized corpus.

    Args:
        path: Text file to read, one sentence per line (defaults to the
            original hard-coded 'corpus.txt', so existing callers are unaffected).

    Returns:
        nltk.FreqDist mapping each bigram tuple to its count. (The original
        built the distribution but discarded it by returning nothing.)
    """
    bigram_fdist = FreqDist()
    # Context manager fixes the original's leaked file handle; the unused
    # trigram FreqDist from the original has been dropped.
    with open(path, 'r') as textfile:
        for line in textfile:
            # Skip blank lines (a bare newline has length 1).
            if len(line) > 1:
                tokens = line.strip().split(' ')
                bigram_fdist.update(ngrams(tokens, 2))
    return bigram_fdist


compute_freq()