Efficient 1D linear regression for each element of 3D numpy array Efficient 1D linear regression for each element of 3D numpy array numpy numpy

Efficient 1D linear regression for each element of 3D numpy array


If i understand it correctly, you want do many linear regression y = k * x + b, there is only one x, but many y, for every y you want to calculate the k and b.

If the shape of x is 50, y is (50, 1000), you can use numpy.linalg.lstsq, here is some demo:

import numpy as npx = np.random.rand(50)k = np.random.rand(1000)b = np.random.rand(1000)y = np.outer(x, k) + b + np.random.normal(size=(50, 1000), scale=1e-10)r = np.linalg.lstsq(np.c_[x, np.ones_like(x)], y)[0]print np.allclose(r[0], k)print np.allclose(r[1], b)

for the y with shape (50, 2000, 2000), you can reshape it to (50, 2000*2000).

Edit

Here the my solution for masked array. The formula is:

enter image description here

Prepare Y as 2-dim array with shape (1889*1566, 57), X as 2-dim array with shape (57, 2). mask as a bool array with the same shape as Y, True means the value in Y is available.

Calculate array a with shape (1889*1566, 2, 2), b with shape (1889*1566, 2), then call numpy.linalg.solve(a, b), here is some demo code:

import numpy as npN = 50M = 1000x = np.random.rand(N)X = np.c_[x, np.ones_like(x)]beta = np.random.rand(M, 2)Y = np.dot(beta, X.T)Y += np.random.normal(scale=0.1, size=Y.shape)mask = np.random.randint(0, 2, size=Y.shape).astype(np.bool)a = np.swapaxes(np.dot(X.T, (X[None, :, :] * mask[:, :, None])), 0, 1)b = np.dot(X.T, (mask*Y).T)beta2 = np.linalg.solve(a, b.T)i = 123print "real:", beta[i]print "by solve:", beta2[i]m = mask[i]x2 = X[m]y2 = Y[i, m]print "by lstsq:", np.linalg.lstsq(x2, y2)[0]

output 123th coefficient:

real: [ 0.35813131  0.29736779]by solve: [ 0.38088499  0.30382547]by lstsq: [ 0.38088499  0.30382547]

You can also calculate the a array by following code, I think it will use less memory than the method above:

a2 = np.empty((M, 2, 2))xm = mask * xa2[:, 0, 0] = (xm*xm).sum(1)a2[:, 1, 0] = (xm*mask).sum(1)a2[:, 0, 1] = a2[:, 1, 0]a2[:, 1, 1] = (mask).sum(1)print np.allclose(a2, a)