How to read a single character at a time from a file in Python?

How to read a single character at a time from a file in Python?


with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
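As a usage note (not part of the original answer): on Python 3.8+ the same loop can be written more compactly with the walrus operator; `filename` is assumed to be the path of an existing text file.

# Equivalent loop using the walrus operator (Python 3.8+);
# `filename` is assumed to point at an existing text file.
with open(filename) as f:
    while (c := f.read(1)):
        print("Read a character:", c)
    print("End of file")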


First, open a file:

with open("filename") as fileobj:    for line in fileobj:         for ch in line:            print(ch)

This goes through every line in the file and then every character in that line.
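A related sketch (an alternative, not taken from the original answer): `iter()` with an empty-string sentinel reads one character per call without pulling whole lines into memory first; `"filename"` is just a placeholder path.

# Alternative sketch: iter() with an empty-string sentinel keeps calling
# fileobj.read(1) until it returns "" at end of file.
from functools import partial

with open("filename") as fileobj:
    for ch in iter(partial(fileobj.read, 1), ""):
        print(ch)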


I like the accepted answer: it is straightforward and will get the job done. I would also like to offer an alternative implementation:

def chunks(filename, buffer_size=4096):
    """Read `filename` in chunks of `buffer_size` bytes and yield each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of up to `buffer_size` bytes until the file is exhausted
    :rtype: bytes
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffersize=4096):
    """Yield the contents of file `filename` character by character. Warning:
    this only works for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character by character
    :rtype: bytes
    """
    for chunk in chunks(filename, buffersize):
        # Slice one byte at a time so each item is a bytes object
        # (iterating over a bytes object directly yields integers in Python 3).
        for i in range(len(chunk)):
            yield chunk[i:i + 1]


def main(buffersize, filenames):
    """Read several files character by character and redirect their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files, varying the buffer size
    import sys

    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))

The code I suggest is essentially the same idea as your accepted answer: read a given number of bytes from the file. The difference is that it first reads a good chunk of data (4096 bytes is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.
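As a quick, hypothetical way to exercise the `chars` generator above, you could count the characters in a file; "example.txt" below is only a placeholder path.

# Hypothetical usage of chars() defined above; "example.txt" is a placeholder.
total = sum(1 for _ in chars("example.txt", buffersize=4096))
print("Characters read:", total)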

The code I present may be faster for larger files. Take, for example, the entire text of War and Peace by Tolstoy. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave to the code pasted above):

$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total

Now: do not take a buffer size of 4096 as a universal truth; here are the results I get for different sizes (buffer size in bytes vs. wall time in seconds):

   2  2.726
   4  1.948
   8  1.693
  16  1.534
  32  1.525
  64  1.398
 128  1.432
 256  1.377
 512  1.347
1024  1.442
2048  1.316
4096  1.318

As you can see, the gains appear well before 4096 (and my timings are likely quite noisy); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice, but, as always, measure first.
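If you want to repeat the measurement yourself, a minimal sketch along these lines should do; it assumes the `chars()` generator above and a local copy of the test file (the filename is simply the one from my runs, so adjust it).

# Minimal timing sketch; assumes chars() from above and a local file
# named "2600.txt.utf-8" (adjust the path to your own test file).
import time

for size in (1, 64, 1024, 4096, 8192):
    start = time.perf_counter()
    for _ in chars("2600.txt.utf-8", buffersize=size):
        pass
    print(size, round(time.perf_counter() - start, 3), "seconds")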