
Pandas: How to drop self correlation from correlation matrix


Say you have

corrs = df.corr()

Then, if I understand correctly, the problem is the diagonal elements: each column's correlation with itself, which is 1. You can set them to a negative value such as -2, which is necessarily lower than every correlation because correlations lie in [-1, 1]:

np.fill_diagonal(corrs.values, -2)
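Writing through `corrs.values` relies on that array being a writable view of the DataFrame's data. Under pandas Copy-on-Write (opt-in in pandas 2.x, the default from pandas 3.0) it may be read-only, and then `np.fill_diagonal` raises. A minimal sketch of the same idea that avoids in-place mutation builds a boolean diagonal mask with `np.eye` and passes it to `DataFrame.mask`. The small DataFrame and the name `corrs_no_self` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (made up for this sketch)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.1, 6.2, 7.9],
    "c": [4.0, 1.0, 3.0, 2.0],
})

corrs = df.corr()

# Boolean array that is True only on the diagonal (the self-correlations)
diag = np.eye(len(corrs), dtype=bool)

# Replace the diagonal with -2, below any correlation in [-1, 1].
# mask() returns a new DataFrame, so corrs itself is left unchanged.
corrs_no_self = corrs.mask(diag, -2)

print(corrs_no_self)
```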

Example

(Many thanks to @Fabian Rost for the improvement & @jezrael for the DataFrame)

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [0.1, .32, .2, 0.4, 0.8],
    'two':   [.23, .18, .56, .61, .12],
    'three': [.9, .3, .6, .5, .3],
    'four':  [.34, .75, .91, .19, .21],
    'zive':  [0.1, .32, .2, 0.4, 0.8],
    'six':   [.9, .3, .6, .5, .3],
    'drive': [.9, .3, .6, .5, .3],
})
corrs = df.corr()
np.fill_diagonal(corrs.values, -2)

>>> corrs
           drive      four       one       six     three       two      zive
drive  -2.000000 -0.039607 -0.747365  1.000000  1.000000  0.238102 -0.747365
four   -0.039607 -2.000000 -0.489177 -0.039607 -0.039607  0.159583 -0.489177
one    -0.747365 -0.489177 -2.000000 -0.747365 -0.747365 -0.351531  1.000000
six     1.000000 -0.039607 -0.747365 -2.000000  1.000000  0.238102 -0.747365
three   1.000000 -0.039607 -0.747365  1.000000 -2.000000  0.238102 -0.747365
two     0.238102  0.159583 -0.351531  0.238102  0.238102 -2.000000 -0.351531
zive   -0.747365 -0.489177  1.000000 -0.747365 -0.747365 -0.351531 -2.000000

(This output comes from an older pandas that sorted dict keys when building the DataFrame; current versions keep the insertion order one, two, three, four, zive, six, drive.)
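Assuming the point of filling the diagonal is to find each column's most correlated *other* column, `idxmax` and `max` can be applied directly, since no self-correlation of 1 can win anymore. A minimal sketch reusing the DataFrame from the example (rebuilt so the block runs on its own); it fills the diagonal of an explicit copy so it does not depend on `.values` being writable. The names `best_partner` and `best_value` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [0.1, .32, .2, 0.4, 0.8],
    'two':   [.23, .18, .56, .61, .12],
    'three': [.9, .3, .6, .5, .3],
    'four':  [.34, .75, .91, .19, .21],
    'zive':  [0.1, .32, .2, 0.4, 0.8],
    'six':   [.9, .3, .6, .5, .3],
    'drive': [.9, .3, .6, .5, .3],
})

corr = df.corr()

# Fill the diagonal of an explicit copy, then wrap it back into a DataFrame
arr = corr.to_numpy(copy=True)
np.fill_diagonal(arr, -2)
corrs = pd.DataFrame(arr, index=corr.index, columns=corr.columns)

# For each column: its most correlated other column (first one on ties) and that value
best_partner = corrs.idxmax()
best_value = corrs.max()
print(pd.DataFrame({"partner": best_partner, "corr": best_value}))
```

Because `one`/`zive` and `three`/`six`/`drive` are identical columns here, several of them tie at a correlation of 1, and `idxmax` returns the first matching label.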


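I recently found an even cleaner answer to my question: after stacking the correlation matrix into a Series with a two-level MultiIndex, you can compare the two index levels by value and drop the rows where they are equal.

This is what I ended up using: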

corr = df.corr().stack()
corr = corr[corr.index.get_level_values(0) != corr.index.get_level_values(1)]
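A self-contained sketch of the stacked approach on made-up data. The `!=` filter keeps both (a, b) and (b, a); if you only want each pair once, a variant (my addition, not part of the answer above) compares the labels with `<` instead, which assumes the column labels can be ordered. Sorting the result then lists the strongest pairs first. The names `no_self` and `unique_pairs` are illustrative.

```python
import pandas as pd

# Illustrative data (made up for this sketch)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 6.2, 7.9, 10.1],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Long format: a Series indexed by (column, column) pairs
corr = df.corr().stack()
left = corr.index.get_level_values(0)
right = corr.index.get_level_values(1)

# Drop self-correlations by comparing the two index levels by value
no_self = corr[left != right]

# Variant: keep each unordered pair only once (requires orderable labels)
unique_pairs = corr[left < right]

# Strongest pairs first
print(unique_pairs.sort_values(ascending=False))
```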