How to read a single character at a time from a file in Python?

How to read a single character at a time from a file in Python?


with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
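As a usage note (not part of the original answer): on Python 3.8+ the same loop can be written more compactly with the walrus operator; `filename` is assumed to be the path of an existing text file.

# Equivalent loop using the walrus operator (Python 3.8+);
# `filename` is assumed to point at an existing text file.
with open(filename) as f:
    while (c := f.read(1)):
        print("Read a character:", c)
    print("End of file")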


First, open a file:

with open("filename") as fileobj:    for line in fileobj:         for ch in line:            print(ch)

This goes through every line in the file and then every character in that line.
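A related sketch (an alternative, not taken from the original answer): `iter()` with an empty-string sentinel reads one character per call without pulling whole lines into memory first; `"filename"` is just a placeholder path.

# Alternative sketch: iter() with an empty-string sentinel keeps calling
# fileobj.read(1) until it returns "" at end of file.
from functools import partial

with open("filename") as fileobj:
    for ch in iter(partial(fileobj.read, 1), ""):
        print(ch)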


I like the accepted answer: it is straightforward and will get the job done. I would also like to offer an alternative implementation:

def chunks(filename, buffer_size=4096):
    """Read `filename` in chunks of `buffer_size` bytes and yield each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of up to `buffer_size` bytes until the file is exhausted
    :rtype: bytes
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffersize=4096):
    """Yield the contents of file `filename` character by character. Warning:
    this only works for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character by character
    :rtype: bytes
    """
    for chunk in chunks(filename, buffersize):
        # Slice one byte at a time so each item is a bytes object
        # (iterating over a bytes object directly yields integers in Python 3).
        for i in range(len(chunk)):
            yield chunk[i:i + 1]


def main(buffersize, filenames):
    """Read several files character by character and redirect their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files, varying the buffer size
    import sys

    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))

The code I suggest is essentially the same idea as your accepted answer: read a given number of bytes from the file. The difference is that it first reads a good chunk of data (4096 bytes is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.
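As a quick, hypothetical way to exercise the `chars` generator above, you could count the characters in a file; "example.txt" below is only a placeholder path.

# Hypothetical usage of chars() defined above; "example.txt" is a placeholder.
total = sum(1 for _ in chars("example.txt", buffersize=4096))
print("Characters read:", total)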

The code I present may be faster for larger files. Take, for example, the entire text of War and Peace by Tolstoy. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave to the code pasted above):

$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total

Now: do not take a buffer size of 4096 as a universal truth; here are the results I get for different sizes (buffer size in bytes vs. wall time in seconds):

   2  2.726
   4  1.948
   8  1.693
  16  1.534
  32  1.525
  64  1.398
 128  1.432
 256  1.377
 512  1.347
1024  1.442
2048  1.316
4096  1.318

As you can see, the gains appear well before 4096 (and my timings are likely quite noisy); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice, but, as always, measure first.
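If you want to repeat the measurement yourself, a minimal sketch along these lines should do; it assumes the `chars()` generator above and a local copy of the test file (the filename is simply the one from my runs, so adjust it).

# Minimal timing sketch; assumes chars() from above and a local file
# named "2600.txt.utf-8" (adjust the path to your own test file).
import time

for size in (1, 64, 1024, 4096, 8192):
    start = time.perf_counter()
    for _ in chars("2600.txt.utf-8", buffersize=size):
        pass
    print(size, round(time.perf_counter() - start, 3), "seconds")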