When should I ever use file.read() or file.readlines()?


The short answer to your question is that each of these three methods of reading a file has a different use case. As noted above, f.read() reads the file as a single string, and so allows relatively easy file-wide manipulations, such as a file-wide regex search or substitution.
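As a rough illustration of the file-wide case, here is a minimal sketch that reads a whole file with f.read() and runs one regex substitution over it. The function name replace_in_file and its parameters are made up for this example, and it assumes the file fits comfortably in memory:

```python
import re


def replace_in_file(path, pattern, replacement):
    """Read the whole file as one string, substitute, and write it back.

    Hypothetical helper for illustration; assumes the file fits in memory.
    """
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()  # the entire file as a single string

    # Because the text is one string, the pattern may span several lines.
    new_text, count = re.subn(pattern, replacement, text)

    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return count  # number of substitutions made
```

A pattern such as r"foo\nbar" can match across a line break here, which a line-by-line loop could not do without extra bookkeeping.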

f.readline() reads a single line of the file, allowing the user to parse a single line without necessarily reading the entire file. Using f.readline() also allows easier application of logic in reading the file than a complete line by line iteration, such as when a file changes format partway through.
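To make the "format changes partway through" case concrete, here is a small sketch that uses only f.readline(). The layout it parses (a count on the first line, that many name lines, then key=value lines) is a hypothetical format invented for this example, as is the function name parse_sections:

```python
import io


def parse_sections(f):
    """Parse a hypothetical two-part format using only readline().

    Line 1 holds a count N, the next N lines are names, and every
    remaining line is a 'key=value' setting.
    """
    count = int(f.readline())  # int() tolerates the trailing newline
    names = [f.readline().rstrip("\n") for _ in range(count)]

    settings = {}
    while True:
        line = f.readline()
        if not line:  # readline() returns '' only at end of file
            break
        key, _, value = line.rstrip("\n").partition("=")
        settings[key] = value
    return names, settings


# io.StringIO stands in for an open text file here.
sample = io.StringIO("2\nalice\nbob\ncolor=red\nsize=10\n")
names, settings = parse_sections(sample)
```

Sticking to readline() throughout, rather than mixing it with a for loop, sidesteps the Python 2.7 problem with combining the two.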

Using the syntax for line in f: allows the user to iterate over the file line by line as noted in the question.
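For comparison, a minimal sketch of the line-by-line style: it counts matching lines while holding only one line in memory at a time. The name count_matching is just for illustration:

```python
import io


def count_matching(f, needle):
    """Count the lines containing `needle`, reading one line at a time."""
    total = 0
    for line in f:  # the file object yields lines lazily
        if needle in line:
            total += 1
    return total


# io.StringIO stands in for an open text file here.
log = io.StringIO("error: disk full\nok\nerror: timeout\n")
n_errors = count_matching(log, "error")
```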

(As noted in the other answer, this documentation is a very good read):

https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

Note: It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.


Hope this helps!

https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory
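If the file might be larger than memory, one common workaround is to pass a size to f.read() and process the file in pieces. A minimal sketch, where the helper name iter_chunks and the 64 KiB default are arbitrary choices for this example:

```python
import io


def iter_chunks(f, size=64 * 1024):
    """Yield successive chunks of at most `size` characters (or bytes)."""
    while True:
        chunk = f.read(size)
        if not chunk:  # an empty result means end of file
            break
        yield chunk


# io.StringIO stands in for an open file; a tiny size makes the chunks visible.
chunks = list(iter_chunks(io.StringIO("abcdefg"), size=3))
```

The same loop works on a file opened in binary mode, in which case each chunk is a bytes object.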

Sorry for all the edits!

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

    for line in f:
        print line,

    This is the first line of the file.
    Second line of the file


Note that calling readline() repeatedly is not comparable to reading all lines in a for-loop: it reads one line per call, and each call carries overhead, as others have already pointed out.

I ran timeit on two otherwise identical snippets, one using a for-loop and the other using readlines(). You can see my snippet below:

    def test_read_file_1():
        f = open('ml/README.md', 'r')
        for line in f.readlines():
            print(line)

    def test_read_file_2():
        f = open('ml/README.md', 'r')
        for line in f:
            print(line)

    def test_time_read_file():
        from timeit import timeit

        duration_1 = timeit(lambda: test_read_file_1(), number=1000000)
        duration_2 = timeit(lambda: test_read_file_2(), number=1000000)

        print('duration using readlines():', duration_1)
        print('duration using for-loop:', duration_2)

And the results:

    duration using readlines(): 78.826229238
    duration using for-loop: 69.487692794

The bottom line, I would say, is that the for-loop is faster, but when both are possible, I'd rather use readlines().