Character reading from file in Python

Ref: http://docs.python.org/howto/unicode

Reading Unicode from a file is therefore simple:

    import codecs
    with codecs.open('unicode.rst', encoding='utf-8') as f:
        for line in f:
            print repr(line)

It's also possible to open files in update mode, allowing both reading and writing:

    with codecs.open('test', encoding='utf-8', mode='w+') as f:
        f.write(u'\u4500 blah blah blah\n')
        f.seek(0)
        print repr(f.readline()[:1])
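On Python 3 (or Python 2.6+), the same round trip can be sketched with io.open, which accepts the same encoding argument. The helper name roundtrip_first_char and the scratch directory are made up for illustration.

```python
import io
import os
import tempfile

def roundtrip_first_char(path):
    """Write a line in update mode, rewind, and return its first character."""
    # 'w+' truncates the file and opens it for both writing and reading.
    with io.open(path, mode='w+', encoding='utf-8') as f:
        f.write(u'\u4500 blah blah blah\n')
        f.seek(0)  # rewind to the start before reading back
        return f.readline()[:1]

# Hypothetical scratch file used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
first = roundtrip_first_char(os.path.join(tmp_dir, 'test'))
print(repr(first))
```

The slice returns one character, not one byte, because the file object decodes UTF-8 before handing text back.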

EDIT: I'm assuming that your intended goal is just to be able to read the file properly into a string in Python. If you're trying to convert to an ASCII string from Unicode, then there's really no direct way to do so, since the Unicode characters won't necessarily exist in ASCII.
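To make this concrete, here is a small Python 3 sketch of what happens when a string containing U+2018 is encoded to ASCII. The sample string is borrowed from the examples on this page; the error handlers are standard ones accepted by str.encode.

```python
text = 'I don\u2018t like this'

# Strict encoding (the default) fails because U+2018 has no ASCII counterpart.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lossy alternatives: drop the character, or substitute a placeholder.
ignored = text.encode('ascii', 'ignore')
replaced = text.encode('ascii', 'replace')
escaped = text.encode('ascii', 'backslashreplace')

print(strict_failed, ignored, replaced, escaped)
```

Each handler loses or rewrites the quotation mark in a different way, which is why a conversion strategy has to be chosen deliberately.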

If you're trying to convert to an ASCII string, try one of the following:

Replace the specific unicode chars with ASCII equivalents, if you are only looking to handle a few special cases such as this particular example
Use the unicodedata module's normalize() and the string.encode() method to convert as best you can to the next closest ASCII equivalent (Ref https://web.archive.org/web/20090228203858/http://techxplorer.com/2006/07/18/converting-unicode-to-ascii-using-python):
```
>>> import unicodedata
>>> teststr
u'I don\xe2\x80\x98t like this'
>>> unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore')
'I donat like this'
```
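The two options above can be combined. The sketch below (Python 3 syntax) first maps a handful of punctuation characters to ASCII look-alikes, then falls back to NFKD normalization for accented letters. The helper name to_ascii and the contents of the mapping table are illustrative assumptions, not a complete list.

```python
import unicodedata

# Hypothetical, deliberately small table of punctuation replacements.
PUNCTUATION = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
    0x2013: '-',  # en dash
}

def to_ascii(text):
    """Return a best-effort ASCII approximation of text as a str."""
    # Step 1: swap known punctuation for ASCII equivalents.
    text = text.translate(PUNCTUATION)
    # Step 2: decompose accented letters, then drop whatever is still non-ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('I don\u2018t like caf\u00e9s'))
```

Without step 1, U+2018 would simply be dropped, because it has no decomposition into ASCII characters.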


There are a few points to consider.

A \u2018 sequence appears only in the repr of a unicode string in Python; the string itself holds a single character. For example, if you write:

    >>> text = u'‘'
    >>> print repr(text)
    u'\u2018'
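In Python 3 the same distinction can be checked with the built-in ascii function, since repr no longer escapes printable non-ASCII characters there. A minimal sketch:

```python
text = '\u2018'

# The string holds a single character, not the six characters of "\u2018".
print(len(text))

# ascii() produces the escaped representation of the string.
escaped = ascii(text)
print(escaped)
```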

Now if you simply want to print the unicode string prettily, just use unicode's encode method:

    >>> text = u'I don\u2018t like this'
    >>> print text.encode('utf-8')
    I don‘t like this
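In Python 3, print accepts the string directly and encode returns a bytes object instead. The sketch below shows the bytes produced for this string, and what happens when they are decoded with the wrong codec (latin-1 is chosen here only as an example of a mismatch).

```python
text = 'I don\u2018t like this'

encoded = text.encode('utf-8')     # str -> bytes
print(encoded)

decoded = encoded.decode('utf-8')  # bytes -> str, using the same codec
print(decoded == text)

# Decoding with the wrong codec yields mojibake rather than an error.
garbled = encoded.decode('latin-1')
print(ascii(garbled))
```

The garbled result contains the \xe2\x80\x98 sequence, which is the UTF-8 encoding of U+2018 read one byte per character.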

To make sure that every line read from a file comes back as unicode, use the codecs.open function instead of the built-in open; codecs.open lets you specify the file's encoding:

    >>> import codecs
    >>> f1 = codecs.open(file1, "r", "utf-8")
    >>> text = f1.read()
    >>> print type(text)
    <type 'unicode'>
    >>> print text.encode('utf-8')
    I don‘t like this


In Python 3, the built-in open function accepts an encoding argument, so an encoded text file can be read directly:

    f = open('file.txt', 'r', encoding='utf-8')
    text = f.read()
    f.close()

With this variation, there is no need to import any additional libraries.
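A context manager closes the file even if reading fails, so the explicit close call can be dropped. The sketch below assumes Python 3; the helper name read_text and the scratch file are hypothetical.

```python
import os
import tempfile

def read_text(path, encoding='utf-8'):
    """Return the decoded contents of the file at path."""
    # The file is closed automatically when the with block exits.
    with open(path, 'r', encoding=encoding) as f:
        return f.read()

# Create a small UTF-8 file to read back (for demonstration only).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'file.txt')
with open(sample_path, 'w', encoding='utf-8') as f:
    f.write('I don\u2018t like this\n')

content = read_text(sample_path)
print(ascii(content))
```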
