When should I ever use file.read() or file.readlines()?
The short answer to your question is that each of these three methods of reading bits of a file has different use cases. As noted above, f.read() reads the file as a single string, and so allows relatively easy file-wide manipulations, such as a file-wide regex search or substitution.
f.readline() reads a single line of the file, allowing the user to parse a single line without necessarily reading the entire file. Using f.readline() also makes it easier to apply logic while reading than a complete line-by-line iteration does, such as when a file changes format partway through.
Using the syntax for line in f: allows the user to iterate over the file line by line, as noted in the question.
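As a rough sketch of the three use cases just described (the sample contents and all helper names here are made up for illustration, not taken from the question):

```python
import os
import re
import tempfile

# Hypothetical sample file: a one-line header followed by data rows.
SAMPLE = "columns: name,score\nalice,10\nbob,7\n"


def write_sample():
    """Write SAMPLE to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(SAMPLE)
    return path


def replace_everywhere(path):
    """f.read(): one string, so a regex can see the whole file at once."""
    with open(path) as f:
        text = f.read()
    return re.sub(r"\d+", "N", text)


def read_header_only(path):
    """f.readline(): parse the first line without reading the rest."""
    with open(path) as f:
        return f.readline().rstrip("\n")


def read_rows(path):
    """readline() for the header, then `for line in f` for the body."""
    with open(path) as f:
        f.readline()  # consume the header, which has a different format
        return [line.rstrip("\n").split(",") for line in f]
```

Calling write_sample() and passing the returned path to each helper shows the difference: the first works on the whole text, the second touches only one line, and the third switches from readline() to plain iteration once the format changes.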
(As noted in the other answer, this documentation is a very good read):
https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
Note: It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.
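If the goal is to skip a line while looping, one alternative that avoids mixing f.readline() into the loop is to call next() on the same iterator. This is only a minimal sketch; the function name and the "marker line" convention are hypothetical:

```python
import io


def lines_skipping_after(f, marker):
    """Yield lines from f, dropping the line that follows each marker line."""
    it = iter(f)
    for line in it:
        yield line
        if line.strip() == marker:
            # Advance the same iterator by one line; the default value
            # avoids StopIteration when the marker is the last line.
            next(it, None)


if __name__ == "__main__":
    sample = io.StringIO("a\n#skip-next\nb\nc\n")
    print(list(lines_skipping_after(sample, "#skip-next")))
```

Because the loop and the skip share one iterator, there is no second read method competing for the file position.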
Hope this helps!
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory
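When the file may not fit in memory, passing a size to read() is one way around that. A small sketch, assuming a text-mode file object (for binary mode the end-of-file sentinel would be b"" instead of ""); the function name and default chunk size are arbitrary choices:

```python
import io
from functools import partial


def count_chars_in_chunks(f, chunk_size=4096):
    """Count characters by reading at most chunk_size of them at a time."""
    total = 0
    # f.read(chunk_size) returns "" at end of file, which stops iter().
    for chunk in iter(partial(f.read, chunk_size), ""):
        total += len(chunk)
    return total


if __name__ == "__main__":
    print(count_chars_in_chunks(io.StringIO("x" * 10000)))
```

Only one chunk is held at a time, so peak memory depends on chunk_size rather than on the size of the file.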
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
    >>> for line in f:
    ...     print line,
    ...
    This is the first line of the file.
    Second line of the file
Note that readline() is not comparable to iterating over all lines in a for loop: it reads one line per call, and each call carries overhead, as others have already pointed out.
I ran timeit on two otherwise identical snippets, one using a for loop over the file object and the other using readlines(). You can see my snippet below:
    from timeit import timeit

    def test_read_file_1():
        # Load every line into a list first, then iterate over the list.
        with open('ml/README.md', 'r') as f:
            for line in f.readlines():
                print(line)

    def test_read_file_2():
        # Iterate over the file object directly, one line at a time.
        with open('ml/README.md', 'r') as f:
            for line in f:
                print(line)

    def test_time_read_file():
        duration_1 = timeit(test_read_file_1, number=1000000)
        duration_2 = timeit(test_read_file_2, number=1000000)
        print('duration using readlines():', duration_1)
        print('duration using for-loop:', duration_2)
And the results:
    duration using readlines(): 78.826229238
    duration using for-loop: 69.487692794
The bottom line, I would say, is that the for loop is faster, but when both are possible I'd rather use readlines().
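One way to read that trade-off, offered as a sketch and not a rule: readlines() returns a list, which helps when the code needs indexing or slicing, while a plain loop is enough when each line is handled once. Both helper names below are hypothetical:

```python
import io


def last_n_lines(f, n):
    """Return the last n lines; slicing needs the full list from readlines()."""
    lines = f.readlines()
    return lines[-n:] if n > 0 else []


def count_matching(f, needle):
    """Count lines containing needle; streaming with a for loop is enough."""
    return sum(1 for line in f if needle in line)


if __name__ == "__main__":
    print(last_n_lines(io.StringIO("a\nb\nc\n"), 2))
    print(count_matching(io.StringIO("ab\nbc\ncd\n"), "b"))
```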