Failed loading english.pickle with nltk.data.load



I had this same problem. Go into a python shell and type:

>>> import nltk
>>> nltk.download()

Then an installation window appears. Go to the 'Models' tab and select 'punkt' from under the 'Identifier' column. Then click Download and it will install the necessary files. Then it should work!


The main reason you see that error is that NLTK couldn't find the punkt package. Because of the size of the NLTK suite, not all available packages are downloaded by default when you install it.

You can download the punkt package like this:

import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize

This is also recommended in the error message in more recent versions:

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************

If you do not pass any argument to the download function, it downloads all packages, i.e., chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers.

nltk.download()

The above function saves packages to a specific directory. You can find that directory location in the comments here: https://github.com/nltk/nltk/blob/67ad86524d42a3a86b1f5983868fd2990b59f1ba/nltk/downloader.py#L1051


This is what worked for me just now:

# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]
sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))

sentences_tokenized is a list of lists of tokens:

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
 ['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
 ['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]

The sentences were taken from the example IPython notebook accompanying the book "Mining the Social Web, 2nd Edition".