Pandas: How to drop self correlation from correlation matrix
Say you have
corrs = df.corr()
Then the problem is with the diagonal elements, IIUC. You can easily set them to a value below -1, say -2 (which will necessarily be lower than all correlations, since those lie in [-1, 1]), with
np.fill_diagonal(corrs.values, -2)
Example
(Many thanks to @Fabian Rost for the improvement & @jezrael for the DataFrame)
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [0.1, .32, .2, 0.4, 0.8],
    'two':   [.23, .18, .56, .61, .12],
    'three': [.9, .3, .6, .5, .3],
    'four':  [.34, .75, .91, .19, .21],
    'zive':  [0.1, .32, .2, 0.4, 0.8],
    'six':   [.9, .3, .6, .5, .3],
    'drive': [.9, .3, .6, .5, .3]})

corrs = df.corr()
np.fill_diagonal(corrs.values, -2)  # overwrite the self-correlations in place

>>> corrs
          drive      four       one       six     three       two      zive
drive -2.000000 -0.039607 -0.747365  1.000000  1.000000  0.238102 -0.747365
four  -0.039607 -2.000000 -0.489177 -0.039607 -0.039607  0.159583 -0.489177
one   -0.747365 -0.489177 -2.000000 -0.747365 -0.747365 -0.351531  1.000000
six    1.000000 -0.039607 -0.747365 -2.000000  1.000000  0.238102 -0.747365
three  1.000000 -0.039607 -0.747365  1.000000 -2.000000  0.238102 -0.747365
two    0.238102  0.159583 -0.351531  0.238102  0.238102 -2.000000 -0.351531
zive  -0.747365 -0.489177  1.000000 -0.747365 -0.747365 -0.351531 -2.000000
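The in-place call relies on corrs.values being a writable view of the frame, which may not hold on every pandas setup (for instance with Copy-on-Write enabled). A minimal sketch of a variant that fills the diagonal on an explicit copy and then uses the result to pick each column's most correlated other column; the toy columns a, b, c and the names arr, masked and best_partner are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'b' tracks 'a' closely, 'c' moves against both.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

corrs = df.corr()

# Copy the values first, so nothing depends on corrs.values being writable.
arr = corrs.to_numpy(copy=True)
np.fill_diagonal(arr, -2)  # -2 is below any possible correlation

# Rebuild a labelled frame; corrs itself is left unchanged.
masked = pd.DataFrame(arr, index=corrs.index, columns=corrs.columns)

# With the diagonal at -2, idxmax can never select a column as its own partner.
best_partner = masked.idxmax()
print(best_partner)
```

With this toy data, a and b pick each other. The cost is one extra copy of the matrix, which is usually negligible for a correlation table.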
I recently found an even cleaner answer to my question: you can compare the MultiIndex levels by value.
This is what I ended up using.
corr = df.corr().stack()
corr = corr[corr.index.get_level_values(0) != corr.index.get_level_values(1)]
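As a rough illustration of what this filter leaves behind, here is a small self-contained sketch. The toy columns a, b, c and the names lvl0, lvl1 and strongest are assumptions made for the example, not part of the original answer:

```python
import pandas as pd

# Hypothetical toy data, only to make the snippet runnable.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# stack() turns the square matrix into a Series indexed by (row, column) pairs.
corr = df.corr().stack()

# Keep only the pairs whose two labels differ, i.e. drop the self-correlations.
lvl0 = corr.index.get_level_values(0)
lvl1 = corr.index.get_level_values(1)
corr = corr[lvl0 != lvl1]

# Optional: list the remaining pairs from strongest to weakest correlation.
strongest = corr.sort_values(ascending=False)
print(strongest)
```

Each pair still appears twice, once as (a, b) and once as (b, a), because the correlation matrix is symmetric.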