python - how to compute correlation-matrix with nans in data-matrix


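I think the method you are looking for is corr() from pandas. DataFrame.corr() computes the pairwise correlation of columns and excludes NaN values, so each coefficient is calculated only from the rows where both columns have a value. For example, take the following DataFrame. You can also refer to this related question: "How to efficiently get the correlation matrix (with p-values) of a data frame with NaN values?"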

import pandas as pd

df = pd.DataFrame({'A': [2, None, 1, -4, None, None, 3],
                   'B': [None, 1, None, None, 1, 3, None],
                   'C': [2, 1, None, 2, 2.1, 1, 0],
                   'D': [-2, 1.1, 3.2, 2, None, 1, None]})
df

    A       B       C       D
0   2       NaN     2       -2
1   NaN     1       1       1.1
2   1       NaN     NaN     3.2
3   -4      NaN     2       2
4   NaN     1       2.1     NaN
5   NaN     3       1       1
6   3       NaN     0       NaN
rho = df.corr()
rho

    A            B         C            D
A   1.000000     NaN       -0.609994    -0.441784
B   NaN          1.0       -0.500000    -1.000000
C   -0.609994    -0.5      1.000000     -0.347928
D   -0.441784    -1.0      -0.347928    1.000000
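
Since the question is tagged numpy, here is a minimal sketch of the same pairwise-deletion idea in plain NumPy, checked against df.corr(). The helper name pairwise_corr is made up for illustration and is not part of NumPy or pandas. It assumes a 2-D float array with one variable per column, and it leaves NaN wherever fewer than two rows overlap or one of the columns is constant on the overlap.

```python
import numpy as np
import pandas as pd


def pairwise_corr(data):
    """Pearson correlation between columns, using for each pair only
    the rows where both columns are non-NaN (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n_cols = data.shape[1]
    rho = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for j in range(i, n_cols):
            # Rows where both columns have a value.
            valid = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            if valid.sum() < 2:
                continue  # too few overlapping rows: leave NaN
            x, y = data[valid, i], data[valid, j]
            if x.std() == 0 or y.std() == 0:
                continue  # constant on the overlap: correlation undefined
            rho[i, j] = rho[j, i] = np.corrcoef(x, y)[0, 1]
    return rho


df = pd.DataFrame({'A': [2, None, 1, -4, None, None, 3],
                   'B': [None, 1, None, None, 1, 3, None],
                   'C': [2, 1, None, 2, 2.1, 1, 0],
                   'D': [-2, 1.1, 3.2, 2, None, 1, None]})

rho_np = pairwise_corr(df.to_numpy())   # plain NumPy version
rho_pd = df.corr().to_numpy()           # pandas result for comparison

# Require at least 3 overlapping rows per pair; pairs with fewer become NaN.
rho_min3 = df.corr(min_periods=3)
```

Note that B and D share only two rows, which is why their coefficient is exactly -1.0. The min_periods argument of corr() is one way to blank out coefficients that rest on so few observations.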