numpy and statsmodels give different values when calculating correlations. How to interpret this?
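statsmodels.tsa.stattools.ccf is based on np.correlate, but it does some additional things to give the correlation in the statistical sense instead of the signal processing sense; see cross-correlation on Wikipedia. Specifically, it subtracts the mean from both series, divides the raw sums by the number of observations (or, with unbiased=True, by the number of overlapping terms at each lag), and then normalizes by the product of the two standard deviations. np.correlate does none of this and returns the raw sliding sums of products. You can see exactly what happens in the source code; it's very simple.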
For easier reference I copied the relevant lines below:

    def ccovf(x, y, unbiased=True, demean=True):
        n = len(x)
        if demean:
            xo = x - x.mean()
            yo = y - y.mean()
        else:
            xo = x
            yo = y
        if unbiased:
            xi = np.ones(n)
            d = np.correlate(xi, xi, 'full')
        else:
            d = n
        return (np.correlate(xo, yo, 'full') / d)[n - 1:]

    def ccf(x, y, unbiased=True):
        cvf = ccovf(x, y, unbiased=unbiased, demean=True)
        return cvf / (np.std(x) * np.std(y))
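
To make the difference concrete, here is a minimal sketch that re-implements the same steps with numpy only and compares the result with the raw np.correlate output. The name ccf_sketch and the synthetic data are my own assumptions for illustration, not part of statsmodels. It should show that the lag-0 value of the normalized version matches the ordinary Pearson correlation, while the raw np.correlate value is just an unscaled sum of products.

```python
import numpy as np


def ccf_sketch(x, y, unbiased=True):
    """Hypothetical re-implementation of the steps quoted above (numpy only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Step 1: remove the means (np.correlate does not do this).
    xo = x - x.mean()
    yo = y - y.mean()
    # Step 2: choose the divisor. For 'full' mode, correlating ones with ones
    # counts how many terms overlap at each lag: 1, 2, ..., n, ..., 2, 1.
    if unbiased:
        d = np.correlate(np.ones(n), np.ones(n), "full")
    else:
        d = n
    # Step 3: keep lags 0, 1, ..., n-1 (index n-1 of the 'full' output is lag 0).
    cvf = (np.correlate(xo, yo, "full") / d)[n - 1:]
    # Step 4: scale the covariance to a correlation.
    return cvf / (np.std(x) * np.std(y))


# Synthetic example data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

raw = np.correlate(x, y, "full")[len(x) - 1:]  # signal-processing sense
stat = ccf_sketch(x, y, unbiased=False)        # statistical sense
lag0 = stat[0]
pearson = np.corrcoef(x, y)[0, 1]

print("raw np.correlate at lag 0:", raw[0])
print("normalized value at lag 0:", lag0)
print("np.corrcoef:              ", pearson)
```

So the two functions do not disagree; they answer different questions. If you want numbers comparable to a correlation coefficient, demean and normalize as above (or use ccf directly).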