
pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python


It's to do with shipping the stopwords module to the executors. As a workaround, import the stopwords library within the function itself; please see the similar issue linked below. I had the same issue and this workaround fixed the problem.

    def stopwords_delete(word_list):
        from nltk.corpus import stopwords
        filtered_words = []
        print(word_list)

Similar Issue

I would recommend from pyspark.ml.feature import StopWordsRemover as a permanent fix.


Probably it's just because you are building stopwords.words('english') every time on the executor. Define it once outside the function and this would work.
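A sketch of that idea: build the stop word set once on the driver and let the function close over it, instead of calling stopwords.words('english') per record. The small fallback set is only there so the example runs even when the nltk corpus is not downloaded.

```python
try:
    from nltk.corpus import stopwords
    stop_words = set(stopwords.words('english'))  # built once, on the driver
except (ImportError, LookupError):
    # Hypothetical fallback set, purely for illustration.
    stop_words = {'the', 'a', 'an', 'is', 'in'}

def stopwords_delete(word_list):
    # stop_words comes from the closure; it is not rebuilt per call,
    # and a plain set pickles cleanly when Spark ships the closure.
    return [w for w in word_list if w not in stop_words]
```

A plain Python set serializes without trouble, unlike nltk's lazy corpus loader, which is what triggers the PicklingError.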


You are using map over an RDD which has only one row, with each word as a column. So the entire row of the RDD is passed to the stopwords_delete function, and the for loop within it tries to match the row itself against the stopwords, which fails. Try like this,

    filtered_words = stopwords_delete(wordlist.flatMap(lambda x: x).collect())
    print(filtered_words)

I got this output as filtered_words,

["shan't", "she'd", 'fuck', 'world', "who's"]

Also, include a return in your function.
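To illustrate why the return matters: without it, Python functions return None, so the caller would get nothing back. A minimal sketch with a hypothetical stop set standing in for nltk's list:

```python
# Hypothetical stop set, just to keep the example self-contained.
STOP = {'the', 'a', 'an'}

def stopwords_delete(word_list):
    filtered_words = [w for w in word_list if w not in STOP]
    return filtered_words  # without this line, the caller gets None

print(stopwords_delete(['the', 'world']))  # → ['world']
```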

Alternatively, you could use a list comprehension to replace the stopwords_delete function:

    filtered_words = wordlist.flatMap(lambda x: [i for i in x if i not in stopwords.words('english')]).collect()