(Python) Counting lines in a huge (>10GB) file as fast as possible [duplicate]



Ignacio's answer is correct, but it might fail in a 32-bit process.

As an alternative, it can be useful to read the file block-wise and count the \n characters in each block.

def blocks(files, size=65536):
    while True:
        b = files.read(size)
        if not b:
            break
        yield b

with open("file", "r") as f:
    print sum(bl.count("\n") for bl in blocks(f))

will do the job.

Note that I don't open the file in binary mode, so \r\n is converted to \n, making the counting more reliable.

For Python 3, and to make it more robust when reading files that contain all kinds of characters:

def blocks(files, size=65536):
    while True:
        b = files.read(size)
        if not b:
            break
        yield b

with open("file", "r", encoding="utf-8", errors='ignore') as f:
    print(sum(bl.count("\n") for bl in blocks(f)))
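
If you would rather skip text decoding entirely, here is a binary-mode sketch of the same block-wise idea (my assumption: lines end with \n or \r\n, so counting b'\n' is still correct; lone \r endings would be missed):

def count_newlines_binary(path, size=65536):
    # Read raw bytes block-wise and count b'\n' occurrences;
    # \r\n contains a \n byte, so Windows line endings are counted too.
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(size)
            if not block:
                break
            count += block.count(b"\n")
    return count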


I know it's a bit unfair, but you could do this:

import subprocess

int(subprocess.check_output(["wc", "-l", "C:\\alarm.bat"]).split()[0])

If you're on Windows, check out Coreutils.
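
If you'd rather fail gracefully when wc isn't on the PATH, a minimal sketch (the helper name wc_line_count is my own):

import shutil
import subprocess

def wc_line_count(path):
    # Return None if the wc binary is not available on this system.
    if shutil.which("wc") is None:
        return None
    # wc -l prints "<count> <filename>"; take the first field.
    return int(subprocess.check_output(["wc", "-l", path]).split()[0])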


A fast one-line solution is:

sum(1 for i in open(file_path, 'rb'))

It should work on files of arbitrary size.
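
If you want the file handle closed deterministically rather than left to the garbage collector, the same idea fits in a with block (a minimal sketch using the same file_path):

# Same counting strategy, with an explicit context manager.
with open(file_path, 'rb') as f:
    line_count = sum(1 for _ in f)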