
str encoding from latin-1 to utf-8 arbitrarily


In Python 3, a str is a sequence of Unicode code points, so it carries no encoding of its own the way Python 2 byte strings did. The problem is that requests tries to encode str data automatically for transfer, and the fallback is latin-1 (or perhaps just the first 127 characters of it). To give requests enough information, encode the text yourself and declare the charset:

import requests

headers = {'Content-Type': 'text/plain; charset=utf-8'}
requests.post(url, data=text.encode('utf-8'), headers=headers)
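The two pieces that matter here, the encoded body and the matching charset declaration, can be prepared and inspected before anything is sent. This is only a minimal sketch: `build_utf8_payload` is a hypothetical helper name, and the `text/plain` media type and sample text are assumptions.

```python
def build_utf8_payload(text):
    """Return (body, headers) suitable for requests.post(url, data=body, headers=headers)."""
    # Encode explicitly so the HTTP layer never has to guess an encoding.
    body = text.encode("utf-8")
    # Declare the same charset so the server decodes the bytes correctly.
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    return body, headers


# Sample text (an assumption) containing U+2013, the en dash.
body, headers = build_utf8_payload("pages 10\u201320")
```

Because `body` is already `bytes`, it is passed through unchanged, so no automatic encoding step gets a chance to fail.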


According to the error message at the start of the post, (a) you have a Unicode string (which contains, among other characters, \u2013, the en dash) and (b) you are trying to encode it as Latin-1. (a) is good. (b) is bad: you should encode it as UTF-8.

So, what you need to send is

input_data.encode('utf-8')
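To see why Latin-1 fails where UTF-8 succeeds, here is a small sketch. The value of `input_data` is an assumed sample, not the data from the question.

```python
# Assumed sample text containing U+2013 (en dash), which Latin-1 cannot represent.
input_data = "pages 10\u201320"

try:
    input_data.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Latin-1 only covers code points U+0000 to U+00FF.
    latin1_ok = False

# UTF-8 can represent every Unicode code point, so this always succeeds.
utf8_bytes = input_data.encode("utf-8")
```

Decoding `utf8_bytes` with `'utf-8'` gives back the original string, so nothing is lost on the way.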

There also seems to be a problem with unwanted or bogus input. This is NOT something that you can fix by fiddling with encodings. You probably need to maintain a dictionary of deletions and substitutions. Getting that off the ground requires management support, and the cleanup needs to be applied when the data first enters the database.
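A dictionary of deletions and substitutions could look roughly like the sketch below, using `str.maketrans` and `str.translate`. The table entries and the names `CLEANUP_TABLE` and `clean_input` are illustrative assumptions; the real list has to come from whoever owns the data.

```python
# Hypothetical cleanup table: map a character to its replacement, or to None to delete it.
CLEANUP_TABLE = str.maketrans({
    "\u2013": "-",    # en dash -> plain hyphen
    "\u201c": '"',    # left curly quote -> straight quote
    "\u201d": '"',    # right curly quote -> straight quote
    "\u00a0": " ",    # no-break space -> ordinary space
    "\u200b": None,   # zero-width space -> deleted
})


def clean_input(text):
    """Apply the substitution/deletion table to one input string."""
    return text.translate(CLEANUP_TABLE)


cleaned = clean_input("\u201c10\u201320\u201d\u200b")
```

Running this once, at the point where text first enters the database, keeps the stored data consistent regardless of which encoding is later used on the wire.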

By the way, data genuinely encoded in Latin-1 is rare in the real world. If you need to work on legacy data, decode it using windows-1252 or similar instead of latin1.
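The difference shows up in the byte range 0x80 to 0x9F, where windows-1252 has printable punctuation and Latin-1 has only control characters. A short sketch, with an assumed legacy byte string:

```python
# Assumed legacy bytes: 0x96 is an en dash in windows-1252.
raw = b"10\x9620"

# Latin-1 maps every byte to the code point of the same value,
# so 0x96 becomes the invisible control character U+0096.
as_latin1 = raw.decode("latin-1")

# windows-1252 (Python codec name: cp1252) maps 0x96 to U+2013, the en dash.
as_cp1252 = raw.decode("cp1252")
```

Decoding as Latin-1 never raises an error, which is exactly why the mistake tends to go unnoticed until the text is displayed or re-encoded.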


I am far from a Python expert, but str('yadayada').encode('utf-8).decode('utf-8) contains syntax errors: the closing quote after each utf-8 is missing.

str('yadayada').encode('utf-8').decode('utf-8') works fine (mind the closing ').