Reshaping a pandas correlation matrix Reshaping a pandas correlation matrix pandas pandas

Reshaping a pandas correlation matrix


You need add reset_index:

#reset columns and index names df = df.rename_axis(None).rename_axis(None, axis=1)#if pandas version below 0.18.0#df.columns.name = None#df.index.name = Noneprint (df)          aaa       bbb       ccc       ddd       eeeaaa  1.000000  0.346099  0.131874 -0.150910  0.177589bbb  0.346099  1.000000  0.177308 -0.384893  0.301150ccc  0.131874  0.177308  1.000000 -0.176995  0.258812ddd -0.150910 -0.384893 -0.176995  1.000000 -0.310137eee  0.177589  0.301150  0.258812 -0.310137  1.000000
df1 = df.stack().reset_index()#set column namesdf1.columns = ['a','b','c']print (df1)      a    b         c0   aaa  aaa  1.0000001   aaa  bbb  0.3460992   aaa  ccc  0.1318743   aaa  ddd -0.1509104   aaa  eee  0.1775895   bbb  aaa  0.3460996   bbb  bbb  1.0000007   bbb  ccc  0.1773088   bbb  ddd -0.3848939   bbb  eee  0.30115010  ccc  aaa  0.13187411  ccc  bbb  0.17730812  ccc  ccc  1.00000013  ccc  ddd -0.17699514  ccc  eee  0.25881215  ddd  aaa -0.15091016  ddd  bbb -0.38489317  ddd  ccc -0.17699518  ddd  ddd  1.00000019  ddd  eee -0.31013720  eee  aaa  0.17758921  eee  bbb  0.30115022  eee  ccc  0.25881223  eee  ddd -0.31013724  eee  eee  1.000000


Use the code below to (a) reshape the correlation matrix, (b) remove duplicate rows (e.g., {aaa, bbb} and {bbb, aaa}), and (c) remove rows that contain the same variable in the first two columns (e.g., {aaa, aaa}):

# calculate the correlation matrix and reshapedf_corr = df.corr().stack().reset_index()# rename the columnsdf_corr.columns = ['FEATURE_1', 'FEATURE_2', 'CORRELATION']# create a mask to identify rows with duplicate features as mentioned abovemask_dups = (df_corr[['FEATURE_1', 'FEATURE_2']].apply(frozenset, axis=1).duplicated()) | (df_corr['FEATURE_1']==df_corr['FEATURE_2']) # apply the mask to clean the correlation dataframedf_corr = df_corr[~mask_dups]

This will generate an output like this:

    FEATURE_1  FEATURE_2  CORRELATION0         aaa        bbb     0.3460991         aaa        ccc     0.1318742         aaa        ddd    -0.1509103         aaa        eee     0.1775894         bbb        ccc     0.1773085         bbb        ddd    -0.3848936         bbb        eee     0.3011507         ccc        ddd    -0.1769958         ccc        eee     0.2588129         ddd        eee    -0.310137


An one-line solution:

df.corr().stack().rename_axis(('a', 'b')).reset_index(name='value')