Python how to read N number of lines at a time


islice() can be used to get the next n items of an iterator. Thus, list(islice(f, n)) will return a list of the next n lines of the file f. Using this inside a loop will give you the file in chunks of n lines. At the end of the file, the list might be shorter, and finally the call will return an empty list.

    from itertools import islice

    with open(...) as f:
        while True:
            next_n_lines = list(islice(f, n))
            if not next_n_lines:
                break
            # process next_n_lines
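To see the chunking behaviour concretely, here is a minimal sketch that wraps the loop above in a generator and runs it on an in-memory file. The name `chunks_of_lines` and the sample data are illustrative only and are not part of the original answer.

```python
import io
from itertools import islice


def chunks_of_lines(f, n):
    """Yield successive lists of up to n lines from the file-like object f."""
    while True:
        chunk = list(islice(f, n))
        if not chunk:  # an empty list means the iterator is exhausted
            return
        yield chunk


# io.StringIO stands in for a real file opened with open(...)
sample = io.StringIO("a\nb\nc\nd\ne\n")
result = list(chunks_of_lines(sample, 2))
print(result)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

The last chunk holds only one line because the file ran out, which matches the "list might be shorter" behaviour described above.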

An alternative is to use the grouper pattern:

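    from itertools import zip_longest  # named izip_longest on Python 2

    with open(...) as f:
        for next_n_lines in zip_longest(*[f] * n):
            # process next_n_lines (the last group is padded with None)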
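One detail of the grouper pattern is worth seeing in action: the final group is padded with `None` when the line count is not a multiple of n. The sketch below assumes Python 3, where the function is called `zip_longest` (it is `izip_longest` on Python 2), and uses an in-memory file as a stand-in for a real one.

```python
import io
from itertools import zip_longest

n = 2
sample = io.StringIO("a\nb\nc\n")  # stand-in for a real file

groups = []
# [sample] * n repeats the *same* iterator n times, so each tuple
# is built from n consecutive lines.
for next_n_lines in zip_longest(*[sample] * n):
    # Drop the None padding that fills out the last, shorter group.
    groups.append([line for line in next_n_lines if line is not None])

print(groups)  # [['a\n', 'b\n'], ['c\n']]
```

Without the filtering step, the second group would be the tuple `('c\n', None)`.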


The question appears to presume that there is efficiency to be gained by reading an "enormous textfile" in blocks of N lines at a time. This adds an application layer of buffering over the already highly optimized stdio library, adds complexity, and probably buys you absolutely nothing.

Thus:

    with open('my_very_large_text_file') as f:
        for line in f:
            process(line)

is probably superior to any alternative in time, space, complexity and readability.

See also Rob Pike's first two rules, Jackson's Two Rules, and PEP 20 (The Zen of Python). If you really just wanted to play with islice, you should have left out the large file stuff.


Here is another way using groupby:

    from itertools import count, groupby

    N = 16
    with open('test') as f:
        # The key maps line i to i // N, so each run of N lines shares a key.
        for g, group in groupby(f, key=lambda _, c=count(): next(c) // N):
            print(list(group))

How it works:

groupby() groups consecutive lines that share the same key value. The key here is the lambda function lambda _, c=count(): next(c) // N. Its default argument c is bound to a single count() object when the lambda is defined, so every call advances the same counter. Each time groupby() calls the lambda for a line, the integer quotient it returns determines which group the line belongs to:

    # 1st iteration
    next(c) => 0
    0 // 16 => 0
    # 2nd iteration
    next(c) => 1
    1 // 16 => 0
    ...
    # Start of the second group
    next(c) => 16
    16 // 16 => 1
    ...
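The key values can be checked directly with a small experiment. This is a sketch in Python 3 syntax, using `next(counter) // N` for the key. The explicit `counter` variable (in place of the default-argument trick) and the in-memory sample file are choices made here for readability, not part of the original answer.

```python
import io
from itertools import count, groupby

N = 2
sample = io.StringIO("a\nb\nc\nd\ne\n")  # stand-in for a real file
counter = count()

# Each call to the key function consumes one number from the counter, so
# lines 0..N-1 get key 0, lines N..2N-1 get key 1, and so on.
chunks = [
    (key, list(group))
    for key, group in groupby(sample, key=lambda _line: next(counter) // N)
]
print(chunks)
# [(0, ['a\n', 'b\n']), (1, ['c\n', 'd\n']), (2, ['e\n'])]
```

Each group must be consumed (here with `list(group)`) before moving on to the next one, because groupby() shares the underlying iterator between groups.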