Failed loading english.pickle with nltk.data.load



I had this same problem. Go into a python shell and type:

>>> import nltk
>>> nltk.download()

Then an installation window appears. Go to the 'Models' tab and select 'punkt' from under the 'Identifier' column. Then click Download and it will install the necessary files. Then it should work!


The main reason you see that error is that NLTK couldn't find the punkt package. Because of the size of the NLTK suite, not all available packages are downloaded by default when you install it.

You can download the punkt package like this:

import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize

This is also recommended in the error message in more recent versions:

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************

If you do not pass any argument to the download function, it downloads all packages, i.e., chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers.

nltk.download()

The above function saves packages to a specific directory. You can find that directory location in the comments here: https://github.com/nltk/nltk/blob/67ad86524d42a3a86b1f5983868fd2990b59f1ba/nltk/downloader.py#L1051


This is what worked for me just now:

# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]
sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))

sentences_tokenized is a list of lists of tokens:

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
 ['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
 ['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]

The sentences were taken from the example IPython notebook accompanying the book "Mining the Social Web, 2nd Edition".