Why my PCA is not invariant to rotation and axis swap?
Firstly, your pca
function is not correct, it should be
def pca(x): x -= np.mean(x,axis=0) cov = np.cov(x.T) e_values, e_vectors = np.linalg.eig(cov) order = np.argsort(e_values)[::-1] v = e_vectors[:,order] return x @ v
You shouldn't transpose the e_vectors[:,order]
because we want each column of the v
array is an eigenvector, therefore, x @ v
will be projections of x
on those eigenvectors.
Secondly, I think you misunderstand the meaning of rotation. It is not voxel1
that should be rotated, but the basis1
. If you rotate (by taking transposition) voxel1
, what you really do is to rearrange the indices of grid points, while the coordinates of the points basis1
are not changed.
In order to rotate the points (around the z axis for example), you can first define a function to calculate the rotation matrix given an angle
rotate_mat = lambda theta: np.array([[np.cos(theta),-np.sin(theta),0.],[np.sin(theta),np.cos(theta),0.],[0.,0.,1.]])
with the rotation matrix generated by this function, you can rotate the array basis1
to create another array basis2
basis2 = basis1 @ rotate_mat(np.deg2rad(angle))
Now it comes to the title of your question "Why my PCA is not invariant to rotation and axis swap?", from this post, the PCA result is not unique, you can actually run a test to see this
import numpy as npnp.random.seed(10)rotate_mat = lambda theta: np.array([[np.cos(theta),-np.sin(theta),0.],[np.sin(theta),np.cos(theta),0.],[0.,0.,1.]])def pca(x): x -= np.mean(x,axis=0) cov = np.cov(x.T) e_values, e_vectors = np.linalg.eig(cov) order = np.argsort(e_values)[::-1] v = e_vectors[:,order] return x @ vangles = np.linspace(0,90,91) res = []for angle in angles: voxel1 = np.random.normal(size=(3,3,3)) voxel2 = voxel1.copy() voxel1 = voxel1.reshape(27,1) voxel2 = voxel2.reshape(27,1) basis1 = np.array([[i+1,j+1,k+1] for k in range(3) for j in range(3) for i in range(3)]) # basis2 = np.hstack((-basis1[:,1][:,None],basis1[:,0][:,None],-basis1[:,2][:,None])) basis2 = basis1 @ rotate_mat(np.deg2rad(angle)) voxel1 = voxel1*basis1 voxel2 = voxel2*basis2 print(angle,np.all(np.abs(pca(voxel1) - pca(voxel2) < 1e-6))) res.append(np.all(np.abs(pca(voxel1) - pca(voxel2) < 1e-6))) print()print(np.sum(res) / len(angles))
After you run this script, you will see that in only 21% of times the two PCA results are the same.
UPDATE
I think instead of focusing on the eigenvectors of the principal components, you can instead focus on the projections. For two clouds of points, even though they are essentially the same, the eigenvectors can be drastically different. Therefore, hardcoding in order to somehow let the two sets of eigenvectors to be the same is a very difficult task.
However, based on this post, for the same cloud of points, two sets of eigenvectors can be different only up to a minus sign. Therefore, the projections upon the two sets of eigenvectors are also different only up to a minus sign. This actually offers us an elegant solution, for the projections along an eigenvector (principal axis), all we need to do is to switch the sign of the projections so that the projection with the largest absolute value along that principal axis is positive.
import numpy as npfrom numpy.random import randn#np.random.seed(20)rotmat_z = lambda theta: np.array([[np.cos(theta),-np.sin(theta),0.],[np.sin(theta),np.cos(theta),0.],[0.,0.,1.]])rotmat_y = lambda theta: np.array([[np.cos(theta),0.,np.sin(theta)],[0.,1.,0.],[-np.sin(theta),0.,np.cos(theta)]])rotmat_x = lambda theta: np.array([[1.,0.,0.],[0.,np.cos(theta),-np.sin(theta)],[0.,np.sin(theta),np.cos(theta)]])# based on https://en.wikipedia.org/wiki/Rotation_matrixrot_mat = lambda alpha,beta,gamma: rotmat_z(alpha) @ rotmat_y(beta) @ rotmat_x(gamma)deg2rad = lambda alpha,beta,gamma: [np.deg2rad(alpha),np.deg2rad(beta),np.deg2rad(gamma)]def pca(feat, x): # pca with attemt to create convention on sign changes x_c = x - np.mean(x,axis=0) x_f = feat * x x_f -= np.mean(x_f,axis=0) cov = np.cov(x_f.T) e_values, e_vectors = np.linalg.eig(cov) order = np.argsort(e_values)[::-1] v = e_vectors[:,order] # here is the solution, we switch the sign of the projections # so that the projection with the largest absolute value along a principal axis is positive proj = x_f @ v asign = np.sign(proj) max_ind = np.argmax(np.abs(proj),axis=0)[None,:] sign = np.take_along_axis(asign,max_ind,axis=0) proj = proj * sign return projref_angles = np.linspace(0.0,90.0,10)angles = [[alpha,beta,gamma] for alpha in ref_angles for beta in ref_angles for gamma in ref_angles]voxel1 = np.random.normal(size=(3,3,3))res = []for angle in angles: voxel2 = voxel1.copy() voxel1 = voxel1.reshape(27,1) voxel2 = voxel2.reshape(27,1) basis1 = np.array([[i+1,j+1,k+1] for k in range(3) for j in range(3) for i in range(3)]).astype(np.double) basis1 = basis1 + 1e-4 * randn(27,3) # perturbation basis2 = basis1 @ rot_mat(*deg2rad(*angle)) original_diff = np.sum(np.abs(basis1-basis2)) gg= np.abs(pca(voxel1, basis1) - pca(voxel1, basis2)) ss= np.sum(np.ravel(gg)) bl= np.all(gg<1e-4) print('difference before pca %.3f,' % original_diff, 'difference after pca %.3f' % ss,bl) res.append(bl) del basis1, basis2print('correct for %.1f percent of time' % (100*(np.sum(res) / len(res))))
As you can see by running this script, the projections on the principal axis are the same, this means we have resolved the issue of PCA results being not unique.
Reply to EDIT 3
As for the new issue you raised, I think you missed an important point, it is the projections of the cloud of points onto the principal axes that are invariant, not anything else. Therefore, if you rotate voxel1
and obtain voxel2
, they are the same in the sense that their own respective projections onto the principal axes of the cloud of points are the same, it actually does not make too much sense to compare pca(voxel1,basis1)
with pca(voxel2,basis1)
.
Furthermore, the method rotate
of scipy.ndimage
actually changes information, as you can see by running this script
image1 = np.linspace(1,100,100).reshape(10,10)image2 = scipy.ndimage.rotate(image1, 45, mode='nearest', axes=(0, 1), reshape=False)image3 = scipy.ndimage.rotate(image2, -45, mode='nearest', axes=(0, 1), reshape=False)fig,ax = plt.subplots(nrows=1,ncols=3,figsize=(12,4))ax[0].imshow(image1)ax[1].imshow(image2)ax[2].imshow(image3)
The output image is
As you can see the matrix after rotation is not the same as the original one, some information of the original matrix is changed.
Reply to EDIT 4
Actually, we are almost there, the two pca results are different because we are comparing pca components for different points.
indices = np.arange(27)indices3d = indices.reshape((3,3,3))# apply rotations to the indices, it is not weights yetgen = rotations24(indices3d)# construct the weightsvoxel10 = np.random.normal(size=(3,3,3))res = []count = 0for ind2 in gen: count += 1 # ind2 is the array of indices after rotation # reindex the weights with the indices after rotation voxel1 = voxel10.copy().reshape(27,1) voxel2 = voxel1[ind2.reshape(27,)] # basis1 is the array of coordinates where the points are basis1 = np.array([[i+1,j+1,k+1] for k in range(3) for j in range(3) for i in range(3)]).astype(np.double) basis1 += 1e-4*np.random.normal(size=(27, 1)) # reindex the coordinates with the indices after rotation basis2 = basis1[ind2.reshape(27,)] # add a slight modification to pca, return the axes also pca1,v1 = pca(voxel1,basis1) pca2,v2 = pca(voxel2,basis2) # sort the principal components before comparing them pca1 = np.sort(pca1,axis=0) pca2 = np.sort(pca2,axis=0) gg= np.abs(pca1 - pca2) ss= np.sum(np.ravel(gg)) bl= np.all(gg<1e-4) print('difference after pca %.3f' % ss,bl) res.append(bl) del basis1, basis2print('correct for %.1f percent of time' % (100*(np.sum(res) / len(res))))
Running this script, you will find, for each rotation, the two sets of principal axes are different only up to a minus sign. The two sets of pca results are different because the indices of the cloud of points before and after rotation are different (since you apply rotation to the indices). If you sort the pca results before comparing them, you will find the two pca results are exactly the same.
Summary
The answer to this question can be divided into two parts. In the first part, the rotation is applied to the basis (the coordinates of points), while the indices and the corresponding weights are unchanged. In the second part, the rotation is applied to the indices, then the weights and the basis are rearranged with the new indices. For both of the two parts, the solution pca
function is the same
def pca(feat, x): # pca with attemt to create convention on sign changes x_c = x - np.mean(x,axis=0) x_f = feat * x x_f -= np.mean(x_f,axis=0) cov = np.cov(x_f.T) e_values, e_vectors = np.linalg.eig(cov) order = np.argsort(e_values)[::-1] v = e_vectors[:,order] # here is the solution, we switch the sign of the projections # so that the projection with the largest absolute value along a principal axis is positive proj = x_f @ v asign = np.sign(proj) max_ind = np.argmax(np.abs(proj),axis=0)[None,:] sign = np.take_along_axis(asign,max_ind,axis=0) proj = proj * sign return proj
The idea of this function is, instead of matching the principal axes, we can match the principal components since it is the principal components that are rotationally invariant after all.
Based on this function pca
, the first part of this answer is easy to understand, since the indices of the points are unchanged while we only rotate the basis. In order to understand the second part of this answer (Reply to EDIT 5), we must first understand the function rotations24
. This function rotates the indices rather than the coordinates of the points, therefore, if we stay at the same position observing the points, we will feel that the positions of the points are changed.
With this in mind, it is not hard to understand Reply to EDIT 5.
Actually, the function pca
in this answer can be applied to more general cases, for example (we rotate the indices)
num_of_points_per_dim = 10num_of_points = num_of_points_per_dim ** 3indices = np.arange(num_of_points)indices3d = indices.reshape((num_of_points_per_dim,num_of_points_per_dim,num_of_points_per_dim))voxel10 = 100*np.random.normal(size=(num_of_points_per_dim,num_of_points_per_dim,num_of_points_per_dim))gen = rotations24(indices3d)res = []for ind2 in gen: voxel1 = voxel10.copy().reshape(num_of_points,1) voxel2 = voxel1[ind2.reshape(num_of_points,)] basis1 = 100*np.random.rand(num_of_points,3) basis2 = basis1[ind2.reshape(num_of_points,)] pc1 = np.sort(pca(voxel1, basis1),axis=0) pc2 = np.sort(pca(voxel2, basis2),axis=0) gg= np.abs(pc1-pc2) ss= np.sum(np.ravel(gg)) bl= np.all(gg<1e-4) print('difference after pca %.3f' % ss,bl) res.append(bl) del basis1, basis2print('correct for %.1f percent of time' % (100*(np.sum(res) / len(res))))