Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?


You need to pass data, index and columns to the DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

Edit: as @joris points out in the comments, you may need to change the above to np.int_(data[1:,1:]) to get the correct data type.
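A self-contained sketch of the approach above, assuming the array keeps the row labels in its first column and the column names in its first row. The sample array and its labels are made up for illustration and are not taken from the question. Because NumPy stores an array that mixes strings and numbers entirely as strings, the values are converted with astype(int) here, which plays the same role as np.int_:

```python
import numpy as np
import pandas as pd

# Hypothetical input: first row holds column names, first column holds row labels.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

# Mixing strings and numbers makes NumPy store every element as a string,
# so convert the value block back to integers before building the frame.
df = pd.DataFrame(data=data[1:, 1:].astype(int),
                  index=data[1:, 0],
                  columns=data[0, 1:])

print(df)
print(df.dtypes)
```

Without the conversion the frame would still be built, but its columns would hold strings rather than numbers.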


Here is an easy-to-understand solution:

>>> import numpy as np
>>> import pandas as pd

# Creating a 2 dimensional numpy array
>>> data = np.array([[5.8, 2.8], [6.0, 2.2]])
>>> data
array([[5.8, 2.8],
       [6. , 2.2]])

# Creating pandas dataframe from numpy array
>>> dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})
>>> print(dataset)
   Column1  Column2
0      5.8      2.8
1      6.0      2.2
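When the array holds only values, the labels can also be passed straight to the constructor instead of building a dict column by column. A minimal sketch of that variant; the row labels 'Row1' and 'Row2' are placeholders chosen for illustration:

```python
import numpy as np
import pandas as pd

data = np.array([[5.8, 2.8], [6.0, 2.2]])

# columns= names each column of the 2-D array; index= names each row.
dataset = pd.DataFrame(data,
                       columns=['Column1', 'Column2'],
                       index=['Row1', 'Row2'])  # placeholder row labels

print(dataset)
```

This keeps the array in one piece and scales to many columns without slicing each one by hand.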


I agree with Joris; it seems like you should be doing this differently, like with numpy record arrays. Modifying "option 2" from this great answer, you could do it like this:

import pandas
import numpy

dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pandas.DataFrame(values, index=index)
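To check what this produces, here is a short sketch that rebuilds the same frame and inspects it. pandas takes the column names and per-column types from the fields of the structured array. The values written into Col1 are purely illustrative:

```python
import numpy as np
import pandas as pd

dtype = [('Col1', 'int32'), ('Col2', 'float32'), ('Col3', 'float32')]
values = np.zeros(20, dtype=dtype)
values['Col1'] = np.arange(1, 21)  # illustrative data; fields are addressed by name

index = ['Row' + str(i) for i in range(1, len(values) + 1)]
df = pd.DataFrame(values, index=index)

print(df.dtypes)   # one dtype per column, taken from the structured array
print(df.head(3))
```

Unlike a plain 2-D array, a structured array lets each column keep its own type, which avoids the string conversion issue mentioned in the first answer.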