How to create pandas dataframes with more than 2 dimensions?

arrays numpy pandas multidimensional-array dataframe

Rather than using an n-dimensional Panel, you are probably better off with a two-dimensional representation of the data that uses a MultiIndex for the index, the columns, or both.

For example:

    import numpy as np
    import pandas as pd

    np.random.seed(1618033)

    # Set 3 axis labels/dims
    years = np.arange(2000, 2010)  # Years
    samples = np.arange(0, 20)  # Samples
    patients = np.array(["patient_%d" % i for i in range(0, 3)])  # Patients

    # Create random 3D array to simulate data from dims above
    A_3D = np.random.random((years.size, samples.size, len(patients)))  # (10, 20, 3)

    # Create the MultiIndex from years, samples and patients.
    midx = pd.MultiIndex.from_product([years, samples, patients])

    # Create sample data for each patient, and add the MultiIndex.
    patient_data = pd.DataFrame(np.random.randn(len(midx), 3), index=midx)

    >>> patient_data.head()
                             0         1         2
    2000 0 patient_0 -0.128005  0.371413 -0.078591
           patient_1 -0.378728 -2.003226 -0.024424
           patient_2  1.339083  0.408708  1.724094
         1 patient_0 -0.997879 -0.251789 -0.976275
           patient_1  0.131380 -0.901092  1.456144

Once you have data in this form, it is relatively easy to juggle it around. For example:

    >>> patient_data.unstack(level=0).head()  # Years.
                        0                                                                                              ...            2
                     2000      2001      2002      2003      2004      2005      2006      2007      2008      2009    ...         2000      2001      2002      2003      2004      2005      2006      2007      2008      2009
    0 patient_0 -0.128005  0.051558  1.251120  0.666061 -1.048103  0.259231  1.535370  0.156281 -0.609149  0.360219    ...    -0.078591 -2.305314 -2.253770  0.865997  0.458720  1.479144 -0.214834 -0.791904  0.800452  0.235016
      patient_1 -0.378728 -0.117470 -0.306892  0.810256  2.702960 -0.748132 -1.449984 -0.195038  1.151445  0.301487    ...    -0.024424  0.114843  0.143700  1.732072  0.602326  1.465946 -1.215020  0.648420  0.844932 -1.261558
      patient_2  1.339083 -0.915771  0.246077  0.820608 -0.935617 -0.449514 -1.105256 -0.051772 -0.671971  0.213349    ...     1.724094  0.835418  0.000819  1.149556 -0.318513 -0.450519 -0.694412 -1.535343  1.035295  0.627757
    1 patient_0 -0.997879 -0.242597  1.028464  2.093807  1.380361  0.691210 -2.420800  1.593001  0.925579  0.540447    ...    -0.976275  1.928454 -0.626332 -0.049824 -0.912860  0.225834  0.277991  0.326982 -0.520260  0.788685
      patient_1  0.131380  0.398155 -1.671873 -1.329554 -0.298208 -0.525148  0.897745 -0.125233 -0.450068 -0.688240    ...     1.456144 -0.503815 -1.329334  0.475751 -0.201466  0.604806 -0.640869 -1.381123  0.524899  0.041983
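A few other ways to move the "dimensions" around can be sketched as follows. This is a minimal illustration on a smaller stand-in frame, not the exact data above; the level names `year`, `sample` and `patient` and the variable names are assumptions added here so that levels can be addressed by name.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; level names are an assumption for readability.
years = np.arange(2000, 2003)
samples = np.arange(0, 4)
patients = ["patient_%d" % i for i in range(2)]
midx = pd.MultiIndex.from_product(
    [years, samples, patients], names=["year", "sample", "patient"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(midx), 3)), index=midx)

# Move the "year" level from the rows into the columns.
wide = df.unstack(level="year")

# stack() moves it back; the level ends up innermost, so reorder and sort
# to recover the original row layout.
restored = (
    wide.stack(level="year")
    .reorder_levels(["year", "sample", "patient"])
    .sort_index()
)

# Reduce over one "dimension": mean per (year, patient) across all samples.
per_year_patient = df.groupby(level=["year", "patient"]).mean()
```

Here `wide` has one row per (sample, patient) pair and one column per (original column, year) pair, while `per_year_patient` collapses the sample level entirely.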

To select data, refer to the pandas documentation on MultiIndex and advanced indexing.
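As a rough guide, the most common selection patterns on a three-level row index look like this. The sketch rebuilds a frame of the same shape as the one above with different random values; the level names and variable names are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2010)
samples = np.arange(0, 20)
patients = ["patient_%d" % i for i in range(3)]
midx = pd.MultiIndex.from_product(
    [years, samples, patients], names=["year", "sample", "patient"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(midx), 3)), index=midx)

# All rows for one year (partial indexing on the outermost level).
year_2003 = df.loc[2003]

# One exact row: a full (year, sample, patient) key returns a Series.
one_row = df.loc[(2003, 5, "patient_1"), :]

# Cross-section on an inner level; the selected level is dropped.
patient_1 = df.xs("patient_1", level="patient")

# Slice several levels at once. Label slices are inclusive, and this
# form needs a sorted index (from_product on sorted inputs gives one).
idx = pd.IndexSlice
subset = df.loc[idx[2001:2003, 0:4, ["patient_0", "patient_2"]], :]
```

`pd.IndexSlice` is only a convenience for writing slices inside a tuple; the same selection can be spelled with explicit `slice(...)` objects.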

An alternative approach (to Alexander's answer), derived from the structure of the input data, is:

    import numpy as np
    import pandas as pd

    np.random.seed(1618033)

    # Set 3 axis labels/dims
    years = np.arange(2000, 2010)  # Years
    samples = np.arange(0, 20)  # Samples
    patients = np.array(["patient_%d" % i for i in range(0, 3)])  # Patients

    # Create random 3D array to simulate data from dims above
    A_3D = np.random.random((years.size, samples.size, len(patients)))  # (10, 20, 3)

    # Reshape data to 2 dimensions
    maj_dim = 1
    for dim in A_3D.shape[:-1]:
        maj_dim = maj_dim * dim
    new_dims = (maj_dim, A_3D.shape[-1])
    A_3D = A_3D.reshape(new_dims)

    # Create the MultiIndex from years and samples.
    # Note that Cartesian product order is the same as the
    # C-order used by default in ``reshape``.
    midx = pd.MultiIndex.from_product([years, samples])

    # Build the DataFrame: one column per patient, indexed by the MultiIndex.
    patient_data = pd.DataFrame(data=A_3D,
                                index=midx,
                                columns=patients)

    >>> patient_data.head()
            patient_0  patient_1  patient_2
    2000 0   0.727753   0.154701   0.205916
         1   0.796355   0.597207   0.897153
         2   0.603955   0.469707   0.580368
         3   0.365432   0.852758   0.293725
         4   0.906906   0.355509   0.994513
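The claim that the Cartesian product order matches the C-order used by `reshape` can be checked directly. The following is a small sketch under the same shapes as above; it keeps the 3D array in `A_3D` and stores the flattened copy in a separate, hypothetical name `A_2D` so the two can be compared, and it uses `reshape(-1, n)` as a shorthand for multiplying the leading dimensions by hand.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2010)
samples = np.arange(0, 20)
patients = ["patient_%d" % i for i in range(3)]

rng = np.random.default_rng(0)
A_3D = rng.random((years.size, samples.size, len(patients)))  # (10, 20, 3)

# -1 lets NumPy infer the product of the leading axes (10 * 20 = 200).
A_2D = A_3D.reshape(-1, A_3D.shape[-1])

midx = pd.MultiIndex.from_product([years, samples], names=["year", "sample"])
patient_data = pd.DataFrame(A_2D, index=midx, columns=patients)

# Spot check: a label lookup and a positional lookup hit the same cell.
i, j, k = 3, 5, 1
by_label = patient_data.loc[(years[i], samples[j]), patients[k]]
by_position = A_3D[i, j, k]

# Round trip back to the original 3D array.
restored = patient_data.to_numpy().reshape(A_3D.shape)
```

If the index had been built in a different order than the array was flattened, the spot check would pair labels with the wrong values, so a check like this is cheap insurance.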

You should consider using xarray instead. From their documentation:

Panel, pandas’ data structure for 3D arrays, was always a second class data structure compared to the Series and DataFrame. To allow pandas developers to focus more on its core functionality built around the DataFrame, pandas removed Panel in favor of directing users who use multi-dimensional arrays to xarray.
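For comparison, the same kind of 3D data might look like this in xarray. This is a minimal sketch assuming xarray is installed; the dimension names, the array name `value` and the random data are illustrative choices, not part of the original question.

```python
import numpy as np
import xarray as xr

years = np.arange(2000, 2010)
samples = np.arange(0, 20)
patients = ["patient_%d" % i for i in range(3)]
A_3D = np.random.default_rng(0).random((years.size, samples.size, len(patients)))

# A labelled 3D array: each axis gets a name and coordinate values.
da = xr.DataArray(
    A_3D,
    coords={"year": years, "sample": samples, "patient": patients},
    dims=["year", "sample", "patient"],
    name="value",
)

# Label-based selection along named dimensions.
by_label = da.sel(year=2003, patient="patient_1")

# Reduce over one dimension by name.
yearly_mean = da.mean(dim="sample")

# Convert to a pandas DataFrame with a (year, sample, patient) MultiIndex.
df = da.to_dataframe()
```

The data stays genuinely n-dimensional here, and `to_dataframe()` gives the long MultiIndex layout whenever a pandas object is needed.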
