# How do I read CSV data into a record array in NumPy?

You can use NumPy's `genfromtxt()` function to do so, by setting the `delimiter` keyword argument to a comma:

```python
from numpy import genfromtxt

my_data = genfromtxt('my_file.csv', delimiter=',')
```

More information on the function can be found in its documentation.
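If your CSV has a header row, `genfromtxt` can also pick up the column names for you. A minimal sketch using an in-memory stand-in for a real file (the column names `x`, `y`, `label` and the values are made up for illustration):

```python
import io
import numpy as np

# Stand-in for a real file: a small in-memory CSV with a header row
# (column names and values here are made up for illustration).
csv_text = io.StringIO("x,y,label\n1.0,2.0,a\n3.0,4.0,b\n")

# names=True takes field names from the header row; dtype=None lets
# genfromtxt infer a per-column dtype, producing a structured array.
arr = np.genfromtxt(csv_text, delimiter=',', names=True,
                    dtype=None, encoding='utf-8')

print(arr.dtype.names)  # field names read from the header
print(arr['x'])         # access a whole column by name
```

With `names=True`, each column is then addressable by name instead of by index.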

I would recommend the `read_csv` function from the `pandas` library:

```python
import pandas as pd

df = pd.read_csv('myfile.csv', sep=',', header=None)
df.values
# array([[ 1. ,  2. ,  3. ],
#        [ 4. ,  5.5,  6. ]])
```

This gives a pandas DataFrame, which provides many useful data-manipulation functions that are not directly available with NumPy record arrays.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...
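If you specifically need a NumPy record array rather than a DataFrame, pandas can hand one over via `DataFrame.to_records()`. A minimal sketch, using an in-memory stand-in for `myfile.csv` with the sample data from above:

```python
import io
import pandas as pd

# In-memory stand-in for myfile.csv with the sample data from above.
csv_text = io.StringIO("1.0,2,3\n4,5.5,6\n")

df = pd.read_csv(csv_text, sep=',', header=None)

# to_records() converts the DataFrame to a NumPy record array;
# index=False leaves the DataFrame index out of the result.
rec = df.to_records(index=False)

print(rec[0])     # first row as a record
print(rec.dtype)  # one dtype entry per column
```

This way you get pandas's fast, robust CSV parsing and still end up with the record array the question asks for.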

I would also recommend `genfromtxt`. However, since the question asks for a record array, as opposed to a normal array, the `dtype=None` parameter needs to be added to the `genfromtxt` call.

Given an input file, `myfile.csv`:

```
1.0, 2, 3
4, 5.5, 6
```

```python
import numpy as np
np.genfromtxt('myfile.csv', delimiter=',')
```

gives an array:

```
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])
```

and

```python
np.genfromtxt('myfile.csv', delimiter=',', dtype=None)
```

gives a record array:

```
array([(1.0, 2.0, 3), (4.0, 5.5, 6)],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
```

This has the advantage that files with multiple data types (including strings) can be imported easily.
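To illustrate the mixed-types point, here is a sketch using an in-memory CSV that mixes a string column with numeric columns (the data is made up for illustration):

```python
import io
import numpy as np

# Made-up CSV mixing a string column with numeric columns.
csv_text = io.StringIO("alice,1.5,10\nbob,2.5,20\n")

# dtype=None infers one dtype per column, so strings and numbers
# coexist in a single record array with fields f0, f1, f2.
arr = np.genfromtxt(csv_text, delimiter=',', dtype=None, encoding='utf-8')

print(arr['f0'])  # string column
print(arr['f2'])  # integer column
```

A plain 2-D float array could not hold the string column at all; the record array keeps each column in its natural dtype.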

I timed

```python
from numpy import genfromtxt

genfromtxt(fname=dest_file, dtype=(<whatever options>))
```

versus

```python
import csv
import numpy as np

with open(dest_file, 'r') as dest_f:
    data_iter = csv.reader(dest_f, delimiter=delimiter, quotechar='"')
    data = [data for data in data_iter]
data_array = np.asarray(data, dtype=<whatever options>)
```

on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

I would recommend the csv-list comprehension method, as it most likely relies on precompiled libraries and leans on the interpreter less than NumPy does. I suspect the pandas method would have similar interpreter overhead.
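The numbers above come from one particular 4.6-million-row file, so it is worth rerunning the comparison on your own data. A small self-contained harness along these lines could do it (the 1,000-row synthetic CSV and the `timeit` repeat count are arbitrary stand-ins):

```python
import csv
import io
import timeit
import numpy as np

# Synthetic stand-in data; real timings need a large file like the
# 4.6-million-row one described above.
rows = "\n".join("1.0,2.0,3.0" for _ in range(1000))

def load_genfromtxt():
    return np.genfromtxt(io.StringIO(rows), delimiter=',')

def load_csv_list():
    with io.StringIO(rows) as f:
        data = [row for row in csv.reader(f, delimiter=',')]
    return np.asarray(data, dtype=float)

# Sanity check: both loaders must agree before their speed is compared.
assert np.array_equal(load_genfromtxt(), load_csv_list())

print('genfromtxt:', timeit.timeit(load_genfromtxt, number=10))
print('csv list  :', timeit.timeit(load_csv_list, number=10))
```

The assertion matters: a speed comparison is only meaningful once both paths are verified to produce the same array.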