Scraping works well until I get this error: 'ascii' codec can't encode character u'\u2122' in position

That is the Unicode code point for the trademark sign (™): http://www.marathon-studios.com/unicode/U2122/Trade_Mark_Sign

Since you're scraping the web, you'll likely run into many more errors like this one. Replacing this particular character might fix this page, but not other pages that contain other non-ASCII symbols.

The csv module is converting your unicode to ASCII before writing it. I'd recommend doing that conversion yourself before handing the text to csv, and cleaning it up in the process. That is, instead of

htmlTxt.encode('utf-8')

use

htmlTxt.encode('ascii', 'ignore')

Then inspect the resulting text to see whether it is acceptable for your purposes, because 'ignore' silently drops every character that cannot be represented in ASCII.
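Before settling on 'ignore', it can help to see exactly which characters it would throw away. Below is a minimal Python 3 sketch, assuming the scraped page is already a str; the helper name `ascii_with_report` and the sample text are hypothetical, made up for illustration:

```python
def ascii_with_report(text):
    """Encode text to ASCII, dropping non-ASCII characters.

    Returns the ASCII bytes plus a sorted list of the distinct
    characters that were discarded, so you can judge whether the
    loss is acceptable for your purposes.
    """
    # Every code point above 127 has no ASCII representation.
    dropped = sorted({ch for ch in text if ord(ch) > 127})
    return text.encode('ascii', 'ignore'), dropped


if __name__ == '__main__':
    cleaned, lost = ascii_with_report('Acme\u2122 Widgets \u2013 caf\u00e9')
    print(cleaned)  # b'Acme Widgets  caf'
    print(lost)
```

Note how the accented "é" in "café" disappears along with the trademark sign and the dash, which is exactly the kind of loss worth checking for.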

EDIT

Here's my output in Python 3:

>>> u'\u2122'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in position 0: ordinal not in range(128)
>>> u'\u2122'.encode('ascii', 'ignore')
b''

and Python 2.6:

>>> u'\u2122'.encode('ascii')
Traceback (most recent call last):
  File "<pyshell#92>", line 1, in <module>
    u'\u2122'.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
>>> u'\u2122'.encode('ascii', 'ignore')
''
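'ignore' is only one of the standard error handlers that str.encode accepts. The sketch below (Python 3, standard library only; the function name `encode_all_ways` is hypothetical) compares it with 'replace' and 'xmlcharrefreplace', which leave a visible trace of the lost character instead of silently deleting it:

```python
def encode_all_ways(text):
    """Return text encoded to ASCII under several standard error handlers."""
    results = {}
    for handler in ('strict', 'ignore', 'replace', 'xmlcharrefreplace'):
        try:
            results[handler] = text.encode('ascii', handler)
        except UnicodeEncodeError as exc:
            # 'strict' (the default) raises instead of substituting.
            results[handler] = exc
    return results


for name, result in encode_all_ways('\u2122').items():
    print(name, repr(result))
```

For u'\u2122', 'ignore' gives b'', 'replace' gives b'?', and 'xmlcharrefreplace' gives b'&#8482;', which an HTML-aware consumer can turn back into ™.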


The strings in jsonObj will be of type unicode, because Python's json module produces unicode strings. Your csv writer wants everything as str. In Python 2.7 it will try to convert unicode to str automatically using the ASCII codec, which of course fails as soon as a string contains a non-ASCII character such as u'\u2122'.

The simplest fix would be to change this line:

csvWriter.writerows(jsonObj['vendors'])

so that the unicode data is encoded to UTF-8 str just before it is sent to the csv writer. jsonObj['vendors'] is a list of dictionaries with unicode keys and values, so we can do this:

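# Python 2.7: encode every unicode key and value to a UTF-8 byte string
# so the csv writer never falls back to the implicit ASCII codec.
unicode_vendors = jsonObj['vendors']
str_vendors = []
for unicode_dict in unicode_vendors:
    str_dict = {}
    for key, value in unicode_dict.items():
        # Only unicode values need encoding; leave None, numbers, etc. as-is.
        if isinstance(value, unicode):
            value = value.encode('utf8')
        str_dict[key.encode('utf8')] = value
    str_vendors.append(str_dict)
csvWriter.writerows(str_vendors)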
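The manual encoding step is only needed on Python 2. On Python 3, json.loads already returns str and the csv module writes str directly, so it should be enough to open the output file with an explicit encoding. A minimal sketch, assuming the JSON has the same {'vendors': [...]} shape as in the question; the helper `vendors_to_csv`, the sample data and the file name vendors.csv are hypothetical:

```python
import csv
import io
import json


def vendors_to_csv(json_text, out):
    """Write the 'vendors' list from a JSON document as CSV rows to out."""
    vendors = json.loads(json_text)['vendors']
    # Collect every key that appears so DictWriter has a complete header.
    fieldnames = []
    for vendor in vendors:
        for key in vendor:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    # Missing keys are written as '' (DictWriter's default restval).
    writer.writerows(vendors)


sample = '{"vendors": [{"name": "Acme\\u2122", "city": "Z\\u00fcrich"}]}'

# In-memory check; for a real file use
# open('vendors.csv', 'w', newline='', encoding='utf-8') instead of StringIO.
buffer = io.StringIO()
vendors_to_csv(sample, buffer)
print(buffer.getvalue())
```

Passing newline='' when opening the real file is what the csv documentation recommends, so the writer controls line endings itself.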

