Quickest ways to read large files with a varying number of columns in Python


After a few thousand rows, this is doing tons of extra work:

    data = data + cline

Just data.extend(cline). (Or .append(), if you want to know which numbers appeared together on a line.)
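
A minimal sketch of the difference, using cline for the tokens parsed from one line as in the question:

    data = []
    with open("file.txt") as f:
        for line in f:
            cline = line.split()
            # data = data + cline  # builds a brand-new list every iteration: quadratic overall
            data.extend(cline)     # appends in place: amortized O(1) per element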

Consider storing doubles instead of text:

    data.extend([float(c) for c in line.split()])


numpy.loadtxt would have been perfect here, but it doesn't apply because the number of columns changes from row to row.
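
For comparison, if every row had the same number of columns it would simply be (a sketch, assuming whitespace-delimited data):

    import numpy as np

    data = np.loadtxt("file.txt")  # raises an error as soon as a row has a different column count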

Since you want a flat list, you can speed it up a bit by using a list comprehension:

    from numpy import *

    with open("file.txt") as f:
        data = array([float(x) for l in f for x in l.split()])

(Now I'm pretty sure it will be much faster, considering the mistake that JH pointed out in his answer: data = data + line creates a new list each time, which is quadratic complexity. You avoid that with the list comprehension.)


Pandas is much better/faster at handling ragged columns than numpy is, and should be faster than a vanilla python implementation with a loop.

Use read_csv, followed by stack, and then access the values attribute to return a numpy array.

    import pandas as pd

    max_per_row = 10  # set this to the max possible number of elements in a row
    # buf is the file path or file-like object holding the data
    vals = pd.read_csv(buf, header=None, names=range(max_per_row),
                       delim_whitespace=True).stack().values
    print(vals)

    array([  3. ,   2.5,   1.1,  30.2,  11.5,   5. ,   6.2,  12.2,  70.2,
            14.7,   3.2,   1.1])
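
This works because names=range(max_per_row) pads the shorter rows with NaN, and stack() drops those NaNs before flattening. A self-contained sketch with made-up data (not the question's numbers), just to show the shape of the call:

    import io
    import pandas as pd

    # Hypothetical ragged whitespace-delimited input, purely for illustration.
    buf = io.StringIO("1.0 2.0 3.0\n4.0 5.0\n6.0\n")

    max_per_row = 10  # upper bound on elements per row
    vals = pd.read_csv(buf, header=None, names=range(max_per_row),
                       delim_whitespace=True).stack().values
    print(vals)  # [1. 2. 3. 4. 5. 6.]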