Correctly reading text from a Windows-1252 (cp1252) file in Python
CP1252 cannot represent ā (U+0101); your input contains the similar-looking character â (U+00E2). In Python 2.x, repr
just displays an ASCII-escaped representation of a unicode string:
>>> print(repr(b'J\xe2nis'.decode('cp1252')))
u'J\xe2nis'
>>> print(b'J\xe2nis'.decode('cp1252'))
Jânis
I think u'J\xe2nis' is correct, see:
>>> print u'J\xe2nis'.encode('utf-8')
Jânis
Are you getting actual errors from SQLAlchemy or in your application's output?
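To make the ā versus â point concrete, here is a minimal Python 3 sketch (the session above is Python 2). The helper name can_encode is made up for illustration; everything else is the standard str/bytes API.

```python
# Python 3 sketch: cp1252 has a slot for U+00E2 (a with circumflex)
# but none for U+0101 (a with macron).
raw = b'J\xe2nis'              # bytes as they would sit in a cp1252 file
text = raw.decode('cp1252')    # one str with U+00E2 as its second character


def can_encode(s, encoding='cp1252'):
    """Hypothetical helper: True if s is representable in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(ascii(text))                 # ASCII-escaped form, like Python 2's repr
print(can_encode('J\u00e2nis'))    # circumflex: representable in cp1252
print(can_encode('J\u0101nis'))    # macron: not representable in cp1252
print(text.encode('utf-8'))        # the same text as UTF-8 bytes
```

If the name is supposed to contain ā, then the source data was probably not cp1252 in the first place, or the character was already replaced before the file was written.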
I had the same problem with some XML files. I solved it by reading the file with the ANSI encoding (Windows-1252) and writing a new file with UTF-8 encoding:
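import os

# Directory containing this script; input and output files live next to it.
path = os.path.dirname(os.path.abspath(__file__))
file_name = 'my_input_file.xml'

if __name__ == "__main__":
    # Decode the input as Windows-1252 (Python 3: open() accepts encoding=).
    with open(os.path.join(path, file_name), 'r', encoding='cp1252') as f1:
        lines = f1.read()
    # Write the same text back out encoded as UTF-8.
    with open(os.path.join(path, 'my_output_file.xml'), 'w', encoding='utf-8') as f2:
        f2.write(lines)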
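The same idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: Python 3, and a hypothetical helper name transcode that streams line by line instead of reading the whole file into memory.

```python
def transcode(src, dst, src_encoding='cp1252', dst_encoding='utf-8'):
    """Hypothetical helper: copy src to dst, re-encoding the text."""
    # newline='' disables newline translation on both sides, so the
    # original line endings (e.g. \r\n) are preserved as-is.
    with open(src, 'r', encoding=src_encoding, newline='') as fin, \
         open(dst, 'w', encoding=dst_encoding, newline='') as fout:
        for line in fin:
            fout.write(line)
```

It would be called as transcode('my_input_file.xml', 'my_output_file.xml'). Note that if the XML declaration at the top of the file names the old encoding, that declaration should be updated as well, since this function only changes the bytes, not the content.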