Quickest ways to read large files with a varying number of columns in Python
`numpy.loadtxt` would have been perfect here, but it doesn't apply because the number of columns changes from row to row.
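For illustration, here is a minimal sketch of that failure mode (the file name and the exact error text are assumptions, not from the original answer):

```python
import numpy as np

try:
    # loadtxt requires every row to have the same number of columns
    data = np.loadtxt("file.txt")
except ValueError as e:
    print(e)  # e.g. "Wrong number of columns at line 2"
```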
If you want a flat list, you can speed it up a bit by using a list comprehension:
```python
import numpy as np

with open("file.txt") as f:
    # One pass over the file: split each line on whitespace
    # and convert every token to float.
    data = np.array([float(x) for l in f for x in l.split()])
```
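If the file is too large for the intermediate Python list to be comfortable in memory, `numpy.fromiter` can consume the same generator directly; this variant is a sketch under that assumption, not part of the original answer:

```python
import numpy as np

with open("file.txt") as f:
    # fromiter fills the array straight from the generator,
    # skipping the intermediate Python list entirely.
    data = np.fromiter((float(x) for l in f for x in l.split()), dtype=float)
```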
(It should also be much faster, considering the mistake that JH pointed out in his answer: `data = data + line` creates a new list on every iteration, which is quadratic complexity. The list comprehension avoids that.)
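To see the quadratic blow-up concretely, here is a small timing sketch (the synthetic data and timings are illustrative, not from the original answers):

```python
import time

lines = [["1.0", "2.0", "3.0"]] * 50_000

# Quadratic: data + line copies the whole accumulated list each iteration.
start = time.perf_counter()
data = []
for line in lines:
    data = data + line
print("concatenation:", time.perf_counter() - start)

# Linear: the comprehension grows a single list in place.
start = time.perf_counter()
data = [x for line in lines for x in line]
print("comprehension:", time.perf_counter() - start)
```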
Pandas is much better/faster at handling ragged columns than numpy, and should be faster than a vanilla Python implementation with a loop.
Use `read_csv`, followed by `stack`, and then access the `values` attribute to return a numpy array. `read_csv` pads shorter rows with NaN out to the full set of named columns, and `stack` drops those NaN cells by default, which is what makes the ragged input work.
```python
import pandas as pd

max_per_row = 10  # set this to the max possible number of elements in a row

# buf is the file path or file-like object you are reading from
vals = pd.read_csv(buf, header=None, names=range(max_per_row),
                   delim_whitespace=True).stack().values
print(vals)
```

```
array([ 3. ,  2.5,  1.1, 30.2, 11.5,  5. ,  6.2, 12.2, 70.2, 14.7,  3.2,  1.1])
```
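To make the snippet self-contained, here is a runnable sketch where `buf` is an in-memory buffer; the sample rows are an assumption chosen to match the printed output, since the original input file isn't shown:

```python
import io
import pandas as pd

# Hypothetical ragged input; in the answer, buf is whatever file
# or buffer you are actually reading from.
buf = io.StringIO("3 2.5 1.1 30.2\n11.5 5 6.2\n12.2 70.2 14.7 3.2 1.1\n")

df = pd.read_csv(buf, header=None, names=range(10), delim_whitespace=True)
# Short rows are padded with NaN up to 10 columns; stack() drops those
# NaN cells by default, leaving a flat Series of the real values.
print(df.stack().values)
```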