Correctly reading text from Windows-1252(cp1252) file in python


CP1252 cannot represent ā; your input contains the similar character â. repr just displays an ASCII representation of a unicode string in Python 2.x:

>>> print(repr(b'J\xe2nis'.decode('cp1252')))
u'J\xe2nis'
>>> print(b'J\xe2nis'.decode('cp1252'))
Jânis
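For readers on Python 3, here is a minimal sketch of the same check. It assumes the sample bytes from the session above and uses `ascii()` as a rough stand-in for what `repr` shows in Python 2; the variable names are illustrative only.

```python
import unicodedata

# Sample bytes taken from the session above.
raw = b'J\xe2nis'

# Decoding as cp1252 maps byte 0xE2 to U+00E2.
text = raw.decode('cp1252')
print(ascii(text))                # ASCII-only view of the decoded string
print(unicodedata.name(text[1]))  # official Unicode name of the second character

# U+0101 (a with macron) has no slot in cp1252, so encoding it fails.
try:
    '\u0101'.encode('cp1252')
    macron_fits = True
except UnicodeEncodeError:
    macron_fits = False
print(macron_fits)
```

The escaped form and the printed form are the same string; only the display differs.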


I think u'J\xe2nis' is correct, see:

>>> print u'J\xe2nis'.encode('utf-8')
Jânis

Are you getting actual errors from SQLAlchemy or in your application's output?
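A rough Python 3 equivalent of the check in this answer, as a sketch: it encodes the same string both ways and shows what the text would look like if UTF-8 bytes were later misread as cp1252. The variable names are made up for illustration.

```python
text = 'J\xe2nis'

utf8_bytes = text.encode('utf-8')      # the accented letter takes two bytes
cp1252_bytes = text.encode('cp1252')   # the accented letter takes one byte
print(utf8_bytes)
print(cp1252_bytes)

# Both byte strings decode back to the same text when the matching codec is used.
same_text = utf8_bytes.decode('utf-8') == cp1252_bytes.decode('cp1252')
print(same_text)

# Decoding UTF-8 bytes with the wrong codec yields garbled text instead of an error.
garbled = utf8_bytes.decode('cp1252')
print(ascii(garbled))
```

If the application's output looks like the garbled form, the likely cause is a codec mismatch somewhere after the file was read, not the decoded string itself.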


I had the same problem with some XML files. I solved it by reading the file with the ANSI encoding (Windows-1252) and writing it back out with UTF-8 encoding:

import os

path = os.path.dirname(os.path.abspath(__file__))
file_name = 'my_input_file.xml'

if __name__ == "__main__":
    # Read the input as Windows-1252 (what Windows calls "ANSI").
    with open(os.path.join(path, file_name), 'r', encoding='cp1252') as f1:
        text = f1.read()
    # Write the same text back out as UTF-8.
    with open(os.path.join(path, 'my_output_file.xml'), 'w', encoding='utf-8') as f2:
        f2.write(text)
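The same read-then-write approach can be wrapped in a small reusable function. This is only a sketch: `transcode_file` and its parameters are hypothetical names, not part of any library, and it assumes the whole file fits in memory.

```python
def transcode_file(src, dst, src_encoding='cp1252', dst_encoding='utf-8'):
    """Hypothetical helper: rewrite a text file in a different encoding."""
    # newline='' disables newline translation, so CRLF line endings are kept.
    with open(src, 'r', encoding=src_encoding, newline='') as fin:
        text = fin.read()
    with open(dst, 'w', encoding=dst_encoding, newline='') as fout:
        fout.write(text)
    # Return the number of characters copied, as a small sanity check.
    return len(text)
```

Usage would look like `transcode_file('my_input_file.xml', 'my_output_file.xml')`. For XML specifically, a declaration such as `encoding="windows-1252"` inside the file is not changed by this and would need to be updated separately.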