Load CSV into 2D matrix with NumPy for plotting
Pure numpy
import numpy
numpy.loadtxt("test.csv", delimiter=",", skiprows=1)
Check out the loadtxt documentation.
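As a minimal sketch of the loadtxt call above, the following writes a small sample file, reads it back, and prints the shape and first column. The file contents and the names csv_path and matrix are made up for illustration.

```python
import os
import tempfile

import numpy

# Hypothetical sample data: one header row followed by three numeric rows.
sample = "x,y\n1.0,2.0\n3.0,4.0\n5.0,6.0\n"

csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(csv_path, "w") as f:
    f.write(sample)

# skiprows=1 drops the header line so every remaining field parses as a float.
matrix = numpy.loadtxt(csv_path, delimiter=",", skiprows=1)

print(matrix.shape)   # (number of rows, number of columns)
print(matrix[:, 0])   # first column, e.g. to use as plot x-values
```

The result is an ordinary 2D float array, so columns can be sliced with matrix[:, i] and handed to a plotting routine.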
You can also use Python's csv module:
import csv
import numpy

# In Python 3, csv.reader needs a text-mode file; newline="" is what the csv docs recommend.
with open("test.csv", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    x = list(reader)
result = numpy.array(x).astype("float")
The csv module returns strings, so you have to convert the result to your preferred numeric type. The whole thing can also be written in one line:
result = numpy.array(list(csv.reader(open("test.csv", newline=""), delimiter=","))).astype("float")
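The csv-module snippets above assume every row is numeric; a header row would make astype("float") raise a ValueError. One possible way to handle that, assuming exactly one header line (the sample file contents here are hypothetical), is to consume the header before collecting the rows:

```python
import csv
import os
import tempfile

import numpy

# Hypothetical sample file with a single header row.
csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(csv_path, "w", newline="") as f:
    f.write("x,y\n1.5,2.5\n3.5,4.5\n")

with open(csv_path, newline="") as f:
    reader = csv.reader(f, delimiter=",")
    header = next(reader)   # consume the header row (a list of column names)
    rows = list(reader)     # remaining rows are lists of strings

# Convert the string fields to floats in one step.
result = numpy.array(rows).astype("float")

print(header)
print(result)
```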
Added Hint:
You could also use pandas.read_csv and take the associated NumPy array, which can be faster.
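A short sketch of the pandas route, assuming pandas is installed. The CSV text is a made-up stand-in for test.csv, and to_numpy is used here as one way to obtain the array:

```python
import io

import pandas

# Hypothetical CSV text standing in for the contents of test.csv.
csv_text = "x,y\n1.0,2.0\n3.0,4.0\n"

# read_csv treats the first row as column names by default.
frame = pandas.read_csv(io.StringIO(csv_text))

# Convert the DataFrame values to a plain 2D NumPy float array.
matrix = frame.to_numpy(dtype=float)

print(frame.columns.tolist())
print(matrix)
```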
I think using dtype where there is a name row is confusing the routine. Try
>>> r = np.genfromtxt(fname, delimiter=',', skip_header=1)
>>> r
array([[ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29111196e+12],
       [ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29111311e+12],
       [ 6.11882430e+02, 9.08956010e+03, 5.13300000e+03, 8.64075140e+02, 1.71537476e+03, 7.65227770e+02, 1.29112065e+12]])
>>> r[:,0]  # Slice 0'th column
array([ 611.88243, 611.88243, 611.88243])
You can read a CSV file with headers into a NumPy structured array with np.genfromtxt. For example:
import numpy as np

csv_fname = 'file.csv'
with open(csv_fname, 'w') as fp:
    fp.write("""\
"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
""")

# Read the CSV file into a NumPy structured array
r = np.genfromtxt(csv_fname, delimiter=',', names=True, case_sensitive=True)
print(repr(r))
which looks like this:
array([(611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111196e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111311e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29112065e+12)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('timestamp', '<f8')])
You can access a named column like this, r['E']:
array([1715.37476, 1715.37476, 1715.37476])
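Since the question asks for a 2D matrix, it may help to see how a structured array relates to one. The sketch below uses a made-up three-column CSV and numpy.lib.recfunctions.structured_to_unstructured, which is one way (not the only one) to get a plain 2D float array when all fields share a numeric type:

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical CSV text with a header row and two data rows.
csv_text = "A,B,C\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# names=True takes the field names from the header row,
# producing a 1D structured array with one record per data row.
r = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Named access returns one column as an ordinary 1D array.
col_b = r["B"]

# Collapse the named fields into a regular 2D array (rows x columns).
matrix = rfn.structured_to_unstructured(r)

print(r.dtype.names)
print(col_b)
print(matrix.shape)
```

The structured form keeps the column names for lookups such as r["B"], while the unstructured matrix supports positional slicing such as matrix[:, 1].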
Note: this answer previously used np.recfromcsv to read the data into a NumPy record array. While there was nothing wrong with that method, structured arrays are generally better than record arrays for speed and compatibility.