Efficient 1D linear regression for each element of 3D numpy array
If i understand it correctly, you want do many linear regression y = k * x + b
, there is only one x
, but many y
, for every y
you want to calculate the k
and b
.
If the shape of x
is 50, y is (50, 1000), you can use numpy.linalg.lstsq
, here is some demo:
import numpy as npx = np.random.rand(50)k = np.random.rand(1000)b = np.random.rand(1000)y = np.outer(x, k) + b + np.random.normal(size=(50, 1000), scale=1e-10)r = np.linalg.lstsq(np.c_[x, np.ones_like(x)], y)[0]print np.allclose(r[0], k)print np.allclose(r[1], b)
for the y with shape (50, 2000, 2000), you can reshape it to (50, 2000*2000).
Edit
Here the my solution for masked array. The formula is:
Prepare Y as 2-dim array with shape (1889*1566, 57), X as 2-dim array with shape (57, 2). mask as a bool array with the same shape as Y, True means the value in Y is available.
Calculate array a
with shape (1889*1566, 2, 2), b
with shape (1889*1566, 2), then call numpy.linalg.solve(a, b)
, here is some demo code:
import numpy as npN = 50M = 1000x = np.random.rand(N)X = np.c_[x, np.ones_like(x)]beta = np.random.rand(M, 2)Y = np.dot(beta, X.T)Y += np.random.normal(scale=0.1, size=Y.shape)mask = np.random.randint(0, 2, size=Y.shape).astype(np.bool)a = np.swapaxes(np.dot(X.T, (X[None, :, :] * mask[:, :, None])), 0, 1)b = np.dot(X.T, (mask*Y).T)beta2 = np.linalg.solve(a, b.T)i = 123print "real:", beta[i]print "by solve:", beta2[i]m = mask[i]x2 = X[m]y2 = Y[i, m]print "by lstsq:", np.linalg.lstsq(x2, y2)[0]
output 123th coefficient:
real: [ 0.35813131 0.29736779]by solve: [ 0.38088499 0.30382547]by lstsq: [ 0.38088499 0.30382547]
You can also calculate the a
array by following code, I think it will use less memory than the method above:
a2 = np.empty((M, 2, 2))xm = mask * xa2[:, 0, 0] = (xm*xm).sum(1)a2[:, 1, 0] = (xm*mask).sum(1)a2[:, 0, 1] = a2[:, 1, 0]a2[:, 1, 1] = (mask).sum(1)print np.allclose(a2, a)