Generate bigrams with NLTK

python nltk n-gram

nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to build the bigrams from, so split the text before passing it in (if you have not already done so):

bigrm = list(nltk.bigrams(text.split()))
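To see why the split matters, here is a small sketch. It assumes NLTK 3.x is installed; the sample sentence and the variable names char_pairs and word_pairs are only for illustration. A Python string is itself a sequence of characters, so passing it unsplit gives character pairs rather than word pairs:

```python
import nltk

text = "to be or not to be"

# A raw string is iterated character by character,
# so this produces character bigrams (spaces included).
char_pairs = list(nltk.bigrams(text))

# Splitting on whitespace first gives word bigrams, each one a tuple.
word_pairs = list(nltk.bigrams(text.split()))

print(char_pairs[:3])  # [('t', 'o'), ('o', ' '), (' ', 'b')]
print(word_pairs[:2])  # [('to', 'be'), ('be', 'or')]
```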

To print them separated by commas, in Python 3 you could use:

print(*map(' '.join, bigrm), sep=', ')

On Python 2, for example:

print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that for printing alone you do not need to build a list; you can use the iterator directly.
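One caveat when working with the iterator directly: a generator can be consumed only once. The sketch below assumes NLTK 3.x, where nltk.bigrams() is a generator function; the names first_pass and second_pass are illustrative.

```python
import nltk

text = "to be or not to be"

# No list is built here; pairs are produced lazily as they are requested.
pairs = nltk.bigrams(text.split())

# First pass: consume the generator while formatting each pair.
first_pass = ', '.join(' '.join(p) for p in pairs)
print(first_pass)

# Second pass: the generator is already exhausted, so nothing is left.
second_pass = list(pairs)
print(second_pass)
```

If the bigrams are needed more than once, wrap the call in list() as shown earlier.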


The following code produces the bigrams for a given sentence:

>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> text = "to be or not to be"
>>> tokens = nltk.word_tokenize(text)
>>> bigrm = nltk.bigrams(tokens)
>>> print(*map(' '.join, bigrm), sep=', ')
to be, be or, or not, not to, to be
