How do I read CSV data into a record array in NumPy?
I would recommend the read_csv function from the pandas library:

```python
>>> import pandas as pd
>>> df = pd.read_csv('myfile.csv', sep=',', header=None)
>>> df.values
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])
```
This gives a pandas DataFrame, which provides many useful data-manipulation functions that are not directly available with NumPy record arrays.
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...
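If you do specifically want a record array from the pandas route, a minimal sketch (assuming pandas is installed; the inline CSV text here is just the toy data from above, fed through `io.StringIO` instead of a file):

```python
import io

import numpy as np
import pandas as pd

# Same toy data as the example above, as an in-memory "file".
csv_text = "1.0,2,3\n4,5.5,6\n"
df = pd.read_csv(io.StringIO(csv_text), sep=',', header=None)

# to_records() converts the DataFrame to a numpy.recarray;
# index=False drops the row-index column.
rec = df.to_records(index=False)
print(rec[1])
```

`to_records` is the bridge back to NumPy when you want pandas' parsing but a record array as the end result.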
I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call.
Given an input file, myfile.csv:

```
1.0, 2, 3
4, 5.5, 6
```

```python
import numpy as np
np.genfromtxt('myfile.csv', delimiter=',')
```
gives an array:
```
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])
```
and

```python
np.genfromtxt('myfile.csv', delimiter=',', dtype=None)
```

gives a record array:
```
array([(1.0, 2.0, 3), (4.0, 5.5, 6)],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
```
This has the advantage that files with multiple data types (including strings) can be imported easily.
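As a sketch of the mixed-type case: with a header row, `names=True` picks up the column names, and `dtype=None` infers a per-column dtype (the file contents, field names, and `encoding` choice here are made up for illustration; `io.StringIO` stands in for a real file):

```python
import io

import numpy as np

# Hypothetical CSV with a header row and a string column.
data = io.StringIO("name,age,score\nalice,31,6.5\nbob,27,8.0\n")

# dtype=None infers a dtype per column; names=True takes field
# names from the header; encoding='utf-8' yields str (not bytes).
rec = np.genfromtxt(data, delimiter=',', dtype=None,
                    names=True, encoding='utf-8')
print(rec['name'])   # string column
print(rec['age'])    # integer column
```

Fields can then be accessed by name (`rec['age']`) rather than by index.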
I timed

```python
from numpy import genfromtxt
genfromtxt(fname=dest_file, dtype=(<whatever options>))
```

against
```python
import csv
import numpy as np

with open(dest_file, 'r') as dest_f:
    data_iter = csv.reader(dest_f, delimiter=delimiter, quotechar='"')
    data = [data for data in data_iter]
data_array = np.asarray(data, dtype=<whatever options>)
```
on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.
I would recommend the csv-list comprehension method, as it most likely relies on pre-compiled libraries and does not lean on the interpreter as much as genfromtxt does. I suspect the pandas method would have similar interpreter overhead.