In sklearn.decomposition.PCA, why are components_ negative? In sklearn.decomposition.PCA, why are components_ negative? numpy numpy

In sklearn.decomposition.PCA, why are components_ negative?


As you figured out in your answer, the results of a singular value decomposition (SVD) are not unique in terms of singular vectors. Indeed, if the SVD of X is \sum_1^r \s_i u_i v_i^\top :enter image description here

with the s_i ordered in decreasing fashion, then you can see that you can change the sign (i.e., "flip") of say u_1 and v_1, the minus signs will cancel so the formula will still hold.

This shows that the SVD is unique up to a change in sign in pairs of left and right singular vectors.

Since the PCA is just a SVD of X (or an eigenvalue decomposition of X^\top X), there is no guarantee that it does not return different results on the same X every time it is performed. Understandably, scikit learn implementation wants to avoid this: they guarantee that the left and right singular vectors returned (stored in U and V) are always the same, by imposing (which is arbitrary) that the largest coefficient of u_i in absolute value is positive.

As you can see reading the source: first they compute U and V with linalg.svd(). Then, for each vector u_i (i.e, row of U), if its largest element in absolute value is positive, they don't do anything. Otherwise, they change u_i to - u_i and the corresponding left singular vector, v_i, to - v_i. As told earlier, this does not change the SVD formula since the minus sign cancel out. However, now it is guaranteed that the U and V returned after this processing are always the same, since the indetermination on the sign has been removed.


After some digging, I've cleared up some, but not all, of my confusion on this. This issue has been covered on stats.stackexchange here. The mathematical answer is that "PCA is a simple mathematical transformation. If you change the signs of the component(s), you do not change the variance that is contained in the first component." However, in this case (with sklearn.PCA), the source of ambiguity is much more specific: in the source (line 391) for PCA you have:

U, S, V = linalg.svd(X, full_matrices=False)# flip eigenvectors' sign to enforce deterministic outputU, V = svd_flip(U, V)components_ = V

svd_flip, in turn, is defined here. But why the signs are being flipped to "ensure a deterministic output," I'm not sure. (U, S, V have already been found at this point...). So while sklearn's implementation is not incorrect, I don't think it's all that intuitive. Anyone in finance who is familiar with the concept of a beta (coefficient) will know that the first principal component is most likely something similar to a broad market index. Problem is, the sklearn implementation will get you strong negative loadings to that first principal component.

My solution is a dumbed-down version that does not implement svd_flip. It's pretty barebones in that it doesn't have sklearn parameters such as svd_solver, but does have a number of methods specifically geared towards this purpose.


With the PCA here in 3 dimensions, you basically find iteratively: 1) The 1D projection axis with the maximum variance preserved 2) The maximum variance preserving axis perpendicular to the one in 1). The third axis is automatically the one which is perpendicular to first two.

The components_ are listed according to the explained variance. So the first one explains the most variance, and so on. Note that by the definition of the PCA operation, while you are trying to find the vector for projection in the first step, which maximizes the variance preserved, the sign of the vector does not matter: Let M be your data matrix (in your case with the shape of (20,3)). Let v1 be the vector for preserving the maximum variance, when the data is projected on. When you select -v1 instead of v1, you obtain the same variance. (You can check this out). Then when selecting the second vector, let v2 be the one which is perpendicular to v1 and preserves the maximum variance. Again, selecting -v2 instead of v2 will preserve the same amount of variance. v3 then can be selected either as -v3 or v3. Here, the only thing which matters is that v1,v2,v3 constitute an orthonormal basis, for the data M. The signs mostly depend on how the algorithm solves the eigenvector problem underlying the PCA operation. Eigenvalue decomposition or SVD solutions may differ in signs.