Efficiently Read last 'n' rows of CSV into DataFrame

I don't think pandas offers a way to do this in read_csv.

Perhaps the neatest (in one pass) is to use collections.deque:

from collections import deque
from io import StringIO  # on Python 2: from StringIO import StringIO

import pandas as pd

with open(fname, 'r') as f:
    q = deque(f, 2)  # replace 2 with n (the number of lines to keep from the end)

In [12]: q
Out[12]: deque(['7,8,9\n', '10,11,12'], maxlen=2)  # these are the last two lines of my csv

In [13]: pd.read_csv(StringIO(''.join(q)), header=None)

Another option worth trying is to count the lines in a first pass and then read the file again, skipping all but the last n rows via read_csv's skiprows parameter.
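A minimal sketch of that two-pass approach (the file name and its contents are made up purely for illustration):

```python
import pandas as pd

# Hypothetical sample file, just for illustration.
fname = 'data.csv'
with open(fname, 'w') as f:
    f.write('1,2,3\n4,5,6\n7,8,9\n10,11,12\n')

n = 2  # number of trailing rows wanted

# First pass: count the lines without loading them.
with open(fname) as f:
    total = sum(1 for _ in f)

# Second pass: let read_csv skip everything except the last n rows.
df = pd.read_csv(fname, header=None, skiprows=max(total - n, 0))
```

Here df ends up holding only the last two rows of the file. The first pass still streams the whole file, but it avoids building any Python objects per line, so it is cheap compared with parsing everything.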


Here's a handy way to do it, using the tailer package; it works well for this:

import io

import pandas as pd
import tailer as tl

with open(fname) as file:
    last_lines = tl.tail(file, 15)  # read the last 15 lines; change 15 to any value

df = pd.read_csv(io.StringIO('\n'.join(last_lines)), header=None)


Files are simply streams of bytes. Lines do not exist as separate entities; they are an artifact of treating certain bytes as newline characters. As such, you must read from the beginning of the file to identify lines in order.

If the file doesn't change (often) and this is an operation you need to perform often (say, with different values of n), you can store the byte offsets of the newline characters in a second file. You can use this much-smaller file and the seek command to quickly jump to a given line in the first file and read from there.
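A minimal sketch of that index-then-seek idea, using only the standard library (the file name and contents are invented for illustration; in practice the offsets list would be written to and reloaded from the second file):

```python
fname = 'data.csv'
with open(fname, 'w') as f:
    f.write('1,2,3\n4,5,6\n7,8,9\n10,11,12\n')

# One pass: record the byte offset at which each line starts.
# offsets[i] is where line i begins; offsets[-1] is the file size.
offsets = [0]
with open(fname, 'rb') as f:
    for line in f:
        offsets.append(offsets[-1] + len(line))

n = 2
start = offsets[max(len(offsets) - 1 - n, 0)]

# Jump straight to the n-th line from the end and read to EOF.
with open(fname, 'rb') as f:
    f.seek(start)
    tail = f.read().decode()
```

The resulting tail string ('7,8,9\n10,11,12\n' for this sample file) can then be handed to pd.read_csv via io.StringIO, exactly as in the answers above, so only the last n lines are ever parsed.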

(Some operating systems provide record-oriented files that have more complex internal structure than the common flat file. The above does not apply to them.)