Python - Map / Reduce - How do I read JSON specific field in using DISCO count words example


Your problem is in disco/worker/classic/func.py: str() will not accept a unicode string that contains non-ASCII characters. In Python 2, str() tries to encode it with the ASCII codec and fails:

>>> str(u'\xb4')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' in position 0: ordinal not in range(128)

Since you are only counting words, you could convert your unicode data into plain ASCII strings with the unicodedata module: normalize the text to NFD, then encode it to ASCII and ignore whatever does not fit...

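import json
import unicodedata

# Python 2: each line of file.json holds one JSON object
with open('file.json') as f:
    for line in f:
        r = json.loads(line).get('text')
        # Decompose accented characters, then drop anything that is not ASCII
        s = unicodedata.normalize('NFD', r).encode('ascii', 'ignore')
        print r
        print s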

Output:

@CataDuarte8 No! avíseme cuando vaya ah salir para yo salir igual!
@CataDuarte8 No! aviseme cuando vaya ah salir para yo salir igual!
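To see why the accent disappears while the letter stays, here is a minimal sketch of the two steps. It is written for Python 3 (an assumption, since the code above is Python 2), so the encoded bytes are decoded back to a string; `to_ascii` is a hypothetical helper name, not part of Disco or unicodedata.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented characters (NFD), then drop everything non-ASCII."""
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# NFD splits the accented letter into a base letter plus a combining mark
decomposed = unicodedata.normalize('NFD', u'\xed')
print([hex(ord(c)) for c in decomposed])   # ['0x69', '0x301']

# The combining mark is not ASCII, so 'ignore' drops it and keeps the base letter
print(to_ascii(u'av\xedseme'))             # aviseme
```

Keep in mind that characters with no ASCII base letter are dropped silently: `to_ascii(u'\xb4')` returns an empty string. That is fine for a rough word count but it does lose information.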

Applying this to your problem... rewrite your map() function as...

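def map(line, params):
    # Imported inside the function so both modules are available wherever it runs
    import simplejson
    import unicodedata
    # Read the 'text' field, then strip accents and other non-ASCII characters
    r = simplejson.loads(line).get('text')
    s = unicodedata.normalize('NFD', r).encode('ascii', 'ignore')
    for word in s.split():
        yield word, 1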
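Before submitting the job, the mapping logic can be checked locally without Disco. The sketch below makes several assumptions: it targets Python 3, uses the standard json module instead of simplejson, and skips records that have no 'text' field (the map() above would raise on those). `map_words` and `sample_lines` are hypothetical names, and the Counter only stands in for the reduce step.

```python
import json
import unicodedata
from collections import Counter

def map_words(line, params=None):
    """Yield (word, 1) for each word in the 'text' field of one JSON line."""
    text = json.loads(line).get('text')
    if text is None:
        # Assumption: records without a 'text' field are simply skipped
        return
    ascii_text = (unicodedata.normalize('NFD', text)
                  .encode('ascii', 'ignore')
                  .decode('ascii'))
    for word in ascii_text.split():
        yield word, 1

# Made-up input: one JSON object per line, as in the file read earlier
sample_lines = [
    '{"text": "av\\u00edseme cuando vaya"}',
    '{"text": "cuando salir"}',
    '{"id": 1}',
]

# Sum the emitted pairs per word, which is what the reduce step would do
counts = Counter()
for line in sample_lines:
    for word, n in map_words(line):
        counts[word] += n

print(counts)
```

Every key that comes out of `map_words` is pure ASCII, so it avoids the encoding error shown at the top.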