Lambda not supporting NLTK file size
There are two things that you can do:
- The error suggests the path is not being defined properly. Try setting it explicitly, or set it as an environment variable:
sys.path.append(os.path.abspath('/var/task/nltk_data/'))
Or do it this way: once you have run nltk.download(), copy the downloaded data to the root folder of your AWS Lambda application, in a directory named "nltk_data". Then, in the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key-value environment variable.
- Reduce the size of the NLTK downloads, since you won't need all of them.
Delete all the zip files and keep only the sections you need. For example, if you only need stopwords, keep nltk_data/corpora/stopwords and delete the rest. Or, if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately, for example with: python -m nltk.downloader punkt. Then copy over the files.
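For the first option, it may help to register the bundled directory with NLTK itself at the top of the handler module. NLTK searches the directories listed in nltk.data.path, which is seeded from the NLTK_DATA environment variable. Below is a minimal sketch, assuming the data was copied to an "nltk_data" directory at the root of the deployment package. The helper names resolve_nltk_data_dir and register_nltk_data_dir are hypothetical, not part of NLTK or AWS.

```python
import os


def resolve_nltk_data_dir(default_root="/var/task"):
    """Return the absolute path of the bundled nltk_data directory.

    Assumption: the data lives in <code root>/nltk_data. The Lambda runtime
    exposes the code root as LAMBDA_TASK_ROOT; /var/task is used as a fallback.
    """
    root = os.environ.get("LAMBDA_TASK_ROOT", default_root)
    return os.path.join(root, "nltk_data")


def register_nltk_data_dir():
    """Add the bundled directory to NLTK's search path (hypothetical helper)."""
    # Imported lazily so resolve_nltk_data_dir() works without NLTK installed.
    import nltk

    data_dir = resolve_nltk_data_dir()
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return data_dir
```

Calling register_nltk_data_dir() once at import time, before any corpus or tokenizer is loaded, should be enough. It can be used together with the NLTK_DATA environment variable or instead of it.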
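The trimming step for the second option can also be scripted before packaging. This is a rough sketch that uses only the standard library. It assumes the usual two-level layout (nltk_data/corpora/stopwords, nltk_data/tokenizers/punkt, and so on) and that every resource you keep already exists as an unzipped directory. The function name prune_nltk_data is hypothetical.

```python
import shutil
from pathlib import Path


def prune_nltk_data(root, keep):
    """Delete everything under root except the resource directories in keep.

    keep holds paths relative to root, such as "corpora/stopwords".
    Returns the relative paths that were removed.
    """
    root = Path(root)
    keep = {Path(k).as_posix() for k in keep}
    removed = []
    for category in sorted(root.iterdir()):
        if not category.is_dir():
            continue  # leave stray top-level files alone
        for entry in sorted(category.iterdir()):
            rel = entry.relative_to(root).as_posix()
            if entry.is_dir() and rel in keep:
                continue  # a resource we want to ship
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()  # zip archives and other loose files
            removed.append(rel)
        # Drop the category directory itself if nothing is left in it.
        if not any(category.iterdir()):
            category.rmdir()
    return removed
```

For example, prune_nltk_data("nltk_data", ["corpora/stopwords", "tokenizers/punkt"]) would keep only those two directories. Run it on a copy of the data, since the deletion is not reversible.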