How do I read CSV data into a record array in NumPy?

You can use NumPy's genfromtxt() function to do so, by setting the delimiter keyword argument to a comma.

    from numpy import genfromtxt
    my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found in its documentation.

python numpy scipy genfromtxt

I would recommend the read_csv function from the pandas library:

    import pandas as pd
    df = pd.read_csv('myfile.csv', sep=',', header=None)
    df.values

    array([[ 1. ,  2. ,  3. ],
           [ 4. ,  5.5,  6. ]])

This gives a pandas DataFrame, which offers many useful data manipulation functions that are not directly available with NumPy record arrays.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...
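Since the question asks for a record array, the DataFrame can be converted with its to_records method. What follows is a minimal sketch, assuming pandas is installed; the inline CSV text and the column names a, b and c are made up for illustration and stand in for myfile.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for myfile.csv.
csv_text = "1.0,2,3\n4,5.5,6\n"

# header=None because the data has no header row; the names are illustrative.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["a", "b", "c"])

# index=False leaves the DataFrame index out of the resulting record array.
records = df.to_records(index=False)

print(records.dtype.names)  # field names come from the column labels
print(records.b)            # fields can be read as attributes
```

Each column keeps its own type in the result, so the integer column is not promoted to float the way it is in df.values.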

I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call:

Given an input file, myfile.csv:

    1.0, 2, 3
    4, 5.5, 6

the call

    import numpy as np
    np.genfromtxt('myfile.csv',delimiter=',')

gives an array:

    array([[ 1. ,  2. ,  3. ],
           [ 4. ,  5.5,  6. ]])

and

    np.genfromtxt('myfile.csv',delimiter=',',dtype=None)

gives a record array:

    array([(1.0, 2.0, 3), (4.0, 5.5, 6)],
          dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])

This has the advantage that files with multiple data types (including strings) can be easily imported.
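To illustrate the mixed-type case, here is a small sketch that reads a string column alongside numeric ones. The CSV text and the field names (name, score, visits) are invented for the example, and the encoding argument assumes a NumPy version that supports it (1.14 or later).

```python
import io

import numpy as np

# Hypothetical CSV with a header row and a string column.
csv_text = "name,score,visits\nalice,1.5,3\nbob,2.0,4\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,        # infer a type for each column
    names=True,        # take field names from the first row
    encoding="utf-8",  # read text columns as str rather than bytes
)

print(data["name"])    # fields are accessed by name
print(data["score"])

# genfromtxt returns a structured ndarray; viewing it as np.recarray
# adds attribute-style access to the fields.
rec = data.view(np.recarray)
print(rec.visits)
```

The view shares memory with the original array, so it costs nothing extra when attribute access is all that is needed.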


I timed the

    from numpy import genfromtxt
    genfromtxt(fname = dest_file, dtype = (<whatever options>))

versus

    import csv
    import numpy as np
    with open(dest_file,'r') as dest_f:
        data_iter = csv.reader(dest_f,
                               delimiter = delimiter,
                               quotechar = '"')
        data = [data for data in data_iter]
    data_array = np.asarray(data, dtype = <whatever options>)

on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

I would recommend the csv list-comprehension method, as it most likely relies on pre-compiled libraries and leans on the interpreter less than the NumPy path does. I suspect the pandas method would have similar interpreter overhead.
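The snippet above leaves the dtype open. If the goal is a record array, one way to fill it in is sketched below: csv.reader yields strings, and a structured array expects one tuple per row, so each row is converted explicitly. The inline CSV text, the field names and the column types are assumptions made for the example, not part of the timed code.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content, standing in for dest_file.
csv_text = "1.0,2,3\n4,5.5,6\n"

# Illustrative field names and types; adjust these to the real columns.
dtype = [("f0", "f8"), ("f1", "f8"), ("f2", "i4")]

with io.StringIO(csv_text) as dest_f:
    reader = csv.reader(dest_f, delimiter=",", quotechar='"')
    # Convert each row of strings into a tuple of typed values.
    rows = [(float(a), float(b), int(c)) for a, b, c in reader]

data_array = np.array(rows, dtype=dtype)
print(data_array["f2"])
```

The per-row conversion runs in the interpreter, so it will add some time compared with the untyped list comprehension; no timing for this variant is claimed here.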
