Efficiently counting the number of lines of a text file (200 MB+)


This will use less memory, since it doesn't load the whole file into memory:

$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
while (!feof($handle)) {
    $line = fgets($handle);
    $linecount++;
}
fclose($handle);
echo $linecount;

fgets loads a single line into memory (if the second argument $length is omitted, it keeps reading from the stream until it reaches the end of the line, which is what we want). If you care about wall-clock time as well as memory usage, this is still unlikely to be as quick as using something other than PHP.
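One detail of the loop above: feof() only reports end-of-file after a read has actually run into it, so the loop can go around one extra time and count a line that isn't there (an empty file, for example, is counted as 1). A minimal sketch of a variant that tests the return value of fgets() instead; the function name countLinesWithFgets and the temporary demo file are made up for illustration:

```php
<?php
// Hypothetical helper (name chosen for illustration): counts lines by
// looping until fgets() itself reports that nothing is left to read.
function countLinesWithFgets(string $path): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // fgets() returns false at end of file, so there is no extra iteration.
    while (fgets($handle) !== false) {
        $count++;
    }

    fclose($handle);
    return $count;
}

// Demo on a small temporary file with three newline-terminated lines.
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "a\nb\nc\n");
echo countLinesWithFgets($tmp), PHP_EOL; // 3
unlink($tmp);
```

Memory use is the same as in the original loop, since still only one line is held at a time.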

The only danger with this is if any lines are particularly long (what if you encounter a 2GB file without line breaks?). In that case you're better off slurping it in chunks and counting end-of-line characters:

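$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
// With a length of 4096, fgets reads at most 4095 bytes per call,
// so a very long line is never held in memory all at once.
while (($chunk = fgets($handle, 4096)) !== false) {
    // Count "\n" rather than PHP_EOL: PHP_EOL is "\r\n" on Windows
    // and would not match files that use plain "\n" line endings.
    $linecount += substr_count($chunk, "\n");
}
fclose($handle);
echo $linecount;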


Using a loop of fgets() calls is a fine solution and the most straightforward to write. However:

  1. Even though the file is read internally through a buffer of 8192 bytes, your code still has to call that function once per line.

  2. It's technically possible for a single line to be bigger than the available memory if you're reading a binary file.

This code reads a file in chunks of 8kB each and then counts the number of newlines within that chunk.

function getLines($file)
{
    $f = fopen($file, 'rb');
    $lines = 0;

    while (!feof($f)) {
        $lines += substr_count(fread($f, 8192), "\n");
    }

    fclose($f);

    return $lines;
}

If the average length of each line is at most 4kB, you will already start saving on function calls, and those can add up when you process big files.
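Counting "\n" characters matches what wc -l reports, which means a final line with no trailing newline is not counted. If that last line should count as well, one possible tweak is to remember the last byte read. This is a sketch rather than a drop-in replacement: the name countLinesChunked is illustrative, the 8192-byte chunk size is carried over from above, and it assumes a regular local file.

```php
<?php
// Hypothetical variant of the chunked counter: also counts a final
// line that is not terminated by "\n".
function countLinesChunked(string $path, int $chunkSize = 8192): int
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    $lastByte = "\n"; // an empty file has no unterminated trailing line

    while (!feof($f)) {
        $chunk = fread($f, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $lines += substr_count($chunk, "\n");
        $lastByte = $chunk[strlen($chunk) - 1];
    }
    fclose($f);

    // Assumption: trailing text without a final "\n" counts as one more line.
    return $lastByte === "\n" ? $lines : $lines + 1;
}

// Demo: two lines, the second without a trailing newline.
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "first\nsecond");
echo countLinesChunked($tmp), PHP_EOL; // 2
unlink($tmp);
```

Whether the unterminated last line should count depends on what the number is used for; to match wc -l exactly, return $lines unchanged.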

Benchmark

I ran a test with a 1GB file; here are the results:

             +-------------+------------------+---------+
             | This answer | Dominic's answer | wc -l   |
+------------+-------------+------------------+---------+
| Lines      | 3550388     | 3550389          | 3550388 |
+------------+-------------+------------------+---------+
| Runtime    | 1.055       | 4.297            | 0.587   |
+------------+-------------+------------------+---------+

Time is measured in seconds of real (wall-clock) time.
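To repeat this kind of comparison on your own files, wall-clock time can also be measured from inside PHP. The sketch below is not the setup used for the figures above; timeIt is a hypothetical helper, the test file is generated on the fly, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: runs $fn once and returns elapsed wall-clock seconds.
function timeIt(callable $fn): float
{
    $start = hrtime(true);               // monotonic clock, in nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e9;
}

// Build a throwaway test file with 100,000 short lines.
$tmp = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($tmp, str_repeat("line\n", 100000));

// Time the chunked newline count on that file.
$seconds = timeIt(function () use ($tmp) {
    $f = fopen($tmp, 'rb');
    $lines = 0;
    while (!feof($f)) {
        $lines += substr_count(fread($f, 8192), "\n");
    }
    fclose($f);
});

printf("%.3f s\n", $seconds);
unlink($tmp);
```

A single run is noisy, and disk caching favours whichever method runs second, so repeat each measurement a few times before comparing.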


Simple object-oriented solution

$file = new \SplFileObject('file.extension');
while ($file->valid()) {
    $file->fgets();
}
var_dump($file->key());

Update

Another way to do this is to pass PHP_INT_MAX to the SplFileObject::seek method, which moves to the end of the file so that key() reports the line number reached:

$file = new \SplFileObject('file.extension', 'r');
$file->seek(PHP_INT_MAX);
echo $file->key();