Lazy Method for Reading Big File in Python?

To write a lazy function, just use yield:

    def read_in_chunks(file_object, chunk_size=1024):
        """Lazy function (generator) to read a file piece by piece.
        Default chunk size: 1k."""
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data


    with open('really_big_file.dat') as f:
        for piece in read_in_chunks(f):
            process_data(piece)
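
As one illustration of what the processing step might look like, the sketch below feeds each chunk into a SHA-256 hash, so the digest of a large file is computed without holding the whole file in memory. The helper name `sha256_of_file` and the 64 KiB chunk size are assumptions made for this example. The file is opened in binary mode so that `read` returns `bytes`.

```python
import hashlib


def read_in_chunks(file_object, chunk_size=1024):
    """Yield successive chunks from file_object until it is exhausted."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


def sha256_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA-256 digest of the file at path (hypothetical helper)."""
    digest = hashlib.sha256()
    # Binary mode: read() returns bytes, which is what hashlib expects.
    with open(path, "rb") as f:
        for piece in read_in_chunks(f, chunk_size):
            digest.update(piece)
    return digest.hexdigest()
```

Only one chunk is alive at a time, so peak memory use is bounded by `chunk_size` rather than by the file size.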

Another option would be to use iter and a helper function:

    f = open('really_big_file.dat')

    def read1k():
        return f.read(1024)

    for piece in iter(read1k, ''):
        process_data(piece)
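
A variation on the same idea, sketched here under a few assumptions: `functools.partial` replaces the hand-written helper, the file is opened in binary mode, and everything is wrapped in a generator so the file gets closed when iteration finishes. The name `iter_chunks` is made up for this example. Note that the sentinel has to match the mode: `''` for text files, `b''` for binary ones.

```python
from functools import partial


def iter_chunks(path, chunk_size=1024):
    """Yield chunk_size-byte pieces of the file at path (hypothetical helper)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns the sentinel b"" (end of file in binary mode).
        for piece in iter(partial(f.read, chunk_size), b""):
            yield piece
```

It would be used as `for piece in iter_chunks('really_big_file.dat'): process_data(piece)`.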

If the file is line-based, the file object is already a lazy generator of lines:

    for line in open('really_big_file.dat'):
        process_data(line)
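
Because the file object hands out one line at a time, it can be chained with other generators without ever building a full list. A minimal sketch, assuming a text file and a made-up helper name `matching_lines`:

```python
def matching_lines(path, needle):
    """Yield lines of the file at path that contain needle (hypothetical helper)."""
    with open(path) as f:
        for line in f:  # the file object yields one line at a time
            if needle in line:
                # Strip the trailing newline before handing the line on.
                yield line.rstrip("\n")
```

The `with` block also closes the file once the loop ends, which the bare `open(...)` in a `for` statement leaves to the garbage collector.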

python file-io generator

If your computer, OS and Python are all 64-bit, you can use the mmap module to map the contents of the file into memory and access it with indices and slices. Here is an example from the documentation:

    import mmap

    with open("hello.txt", "r+b") as f:
        # memory-map the file, size 0 means whole file
        mm = mmap.mmap(f.fileno(), 0)
        # read content via standard file methods
        print(mm.readline())  # prints b"Hello Python!\n"
        # read content via slice notation
        print(mm[:5])  # prints b"Hello"
        # update content using slice notation;
        # note that new content must have same size
        mm[6:] = b" world!\n"
        # ... and read again using standard file methods
        mm.seek(0)
        print(mm.readline())  # prints b"Hello  world!\n"
        # close the map
        mm.close()

If your computer, OS or Python is 32-bit, then mmap-ing large files can reserve large parts of your address space and starve your program of memory.
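
For read-only access, a mapping can also be searched like a bytes object. The following is a minimal sketch, assuming a non-empty file (mapping an empty file raises `ValueError`); the helper name `find_offset` is invented for this example.

```python
import mmap


def find_offset(path, needle):
    """Return the byte offset of needle in the file at path, or -1 (hypothetical helper)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # find() scans the mapped pages; the OS loads them on demand.
            return mm.find(needle)
```

Here `needle` must be a `bytes` value, for example `find_offset('hello.txt', b'Python')`.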


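file.readlines() takes an optional size hint argument. It does not count lines: reading stops once the total size (in bytes or characters) of the lines read so far exceeds the hint, so the hint only approximately controls how many lines are returned.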

    BUF_SIZE = 65536  # size hint per batch; an arbitrary example value

    with open('bigfilename', 'r') as bigfile:
        tmp_lines = bigfile.readlines(BUF_SIZE)
        while tmp_lines:
            process(tmp_lines)
            tmp_lines = bigfile.readlines(BUF_SIZE)
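
The same loop can be wrapped in a generator so that callers simply iterate over batches of lines. This is a sketch under assumptions: the name `read_line_batches` and the default hint of 64 KiB are made up for illustration.

```python
def read_line_batches(path, size_hint=64 * 1024):
    """Yield lists of lines; each list totals roughly size_hint in size (hypothetical helper)."""
    with open(path) as f:
        while True:
            # readlines(hint) returns whole lines and stops once their
            # combined size exceeds the hint; an empty list means EOF.
            batch = f.readlines(size_hint)
            if not batch:
                break
            yield batch
```

Each batch always contains complete lines, so a batch can be slightly larger than the hint.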
