python - how to compute correlation-matrix with nans in data-matrix
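I think the method you are looking for is corr() from pandas. It computes pairwise correlations between columns and excludes NaN values, so each coefficient is based only on the rows where both columns have a value. You can also refer to this question: How to efficiently get the correlation matrix (with p-values) of a data frame with NaN values? For example, consider the following dataframe: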
    import pandas as pd

    df = pd.DataFrame({'A': [2, None, 1, -4, None, None, 3],
                       'B': [None, 1, None, None, 1, 3, None],
                       'C': [2, 1, None, 2, 2.1, 1, 0],
                       'D': [-2, 1.1, 3.2, 2, None, 1, None]})
    df
         A    B    C    D
    0    2  NaN    2   -2
    1  NaN    1    1  1.1
    2    1  NaN  NaN  3.2
    3   -4  NaN    2    2
    4  NaN    1  2.1  NaN
    5  NaN    3    1    1
    6    3  NaN    0  NaN
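Since each coefficient only uses rows where both columns are present, it can help to check how many such rows every pair of columns shares. The following is a minimal sketch of one way to do that; the names present and overlap are just illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, None, 1, -4, None, None, 3],
    'B': [None, 1, None, None, 1, 3, None],
    'C': [2, 1, None, 2, 2.1, 1, 0],
    'D': [-2, 1.1, 3.2, 2, None, 1, None],
})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# Entry (i, j) counts the rows in which columns i and j are both present
overlap = present.T @ present
print(overlap)
```

Columns A and B never have a value in the same row, so no correlation can be computed for that pair, while B and D share only two rows.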
    rho = df.corr()
    rho
              A    B         C         D
    A  1.000000  NaN -0.609994 -0.441784
    B       NaN  1.0 -0.500000 -1.000000
    C -0.609994 -0.5  1.000000 -0.347928
    D -0.441784 -1.0 -0.347928  1.000000
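To see what corr() is doing with the missing values, one entry can be reproduced by hand: drop the rows where either column is NaN and correlate what is left. The sketch below does this for A and D using NumPy's corrcoef, and also shows the min_periods parameter of corr(), which sets the minimum number of shared observations a pair needs. The threshold of 3 is only an illustrative choice, not something from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2, None, 1, -4, None, None, 3],
    'B': [None, 1, None, None, 1, 3, None],
    'C': [2, 1, None, 2, 2.1, 1, 0],
    'D': [-2, 1.1, 3.2, 2, None, 1, None],
})

rho = df.corr()

# Reproduce one entry by hand: keep only rows where both A and D are present
pair = df[['A', 'D']].dropna()
r_ad = np.corrcoef(pair['A'], pair['D'])[0, 1]
print(r_ad, rho.loc['A', 'D'])

# Require at least 3 shared rows per pair (illustrative threshold);
# B and D share only 2 rows, so their entry becomes NaN
rho_min3 = df.corr(min_periods=3)
print(rho_min3)
```

A coefficient of exactly -1.0 for B and D comes from just two shared rows, so it says very little; raising min_periods is one way to keep such entries out of the result.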