
PCA For categorical features?


I disagree with the others.

While you can use PCA on binary data (e.g. one-hot encoded data), that does not mean it is a good idea or that it will work very well.

PCA is designed for continuous variables. It tries to minimize the squared reconstruction error (equivalently, to maximize the variance captured, where variance = squared deviations). The concept of squared deviations breaks down when you have binary variables.

So yes, you can use PCA. And yes, you get an output. It is even a least-squares output: it's not as if PCA would segfault on such data. It works, but it is just much less meaningful than you'd want it to be, and arguably less meaningful than, e.g., frequent pattern mining.
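To make this concrete, here is a minimal sketch (assuming scikit-learn and pandas; the toy data is made up) showing that PCA runs without complaint on one-hot encoded columns, yet its loadings mix the 0/1 indicators in fractional amounts that have no clean categorical meaning:

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    # Made-up categorical data: 200 rows, 3 categorical columns
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.choice(["a", "b", "c"], size=(200, 3)),
                      columns=["f1", "f2", "f3"])

    # One-hot encode: 9 binary indicator columns
    X = pd.get_dummies(df).to_numpy(dtype=float)

    # PCA runs and returns a least-squares answer...
    pca = PCA(n_components=2).fit(X)
    print(pca.explained_variance_ratio_)

    # ...but the loadings are fractional mixtures of 0/1 indicators,
    # which is hard to interpret in categorical terms.
    print(pca.components_.round(2))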


MCA is a well-known technique for dimensionality reduction of categorical data. In R there are many packages for MCA, some of which even combine it with PCA in mixed-data contexts. In Python there is an mca library too. MCA applies mathematics similar to PCA's; indeed, the French statisticians used to say that "data analysis is about finding the correct matrix to diagonalize".

http://gastonsanchez.com/visually-enforced/how-to/2012/10/13/MCA-in-R/
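For illustration, MCA is essentially correspondence analysis applied to the one-hot indicator matrix, so it can be sketched directly in NumPy without any dedicated package (the toy data below is made up):

    import numpy as np
    import pandas as pd

    # Made-up categorical dataset with two variables
    df = pd.DataFrame({
        "color": ["red", "blue", "red", "green", "blue", "green"],
        "size":  ["S",   "M",    "L",   "S",     "L",    "M"],
    })

    # MCA = correspondence analysis of the indicator (one-hot) matrix
    Z = pd.get_dummies(df).to_numpy(dtype=float)

    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

    # The "correct matrix to diagonalize": SVD of the residual matrix
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates of the observations on the first two axes
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    print(row_coords[:, :2])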


Basically, PCA finds and eliminates less informative (redundant) information in the feature set and reduces the dimension of the feature space. In other words, imagine an N-dimensional space: PCA finds the M (M < N) directions along which the data varies the most, so that the data can be represented as M-dimensional feature vectors. Mathematically, it is an eigenvalue and eigenvector decomposition of the covariance matrix of the feature space.
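As a sketch of that view (made-up numeric data; the same steps apply to any feature matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))        # made-up data, N = 5 features

    # Center the data and form the covariance matrix
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)

    # Eigenvectors = principal axes; eigenvalues = variance along them
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]    # sort by descending variance

    M = 2                                # keep the M < N most-varying axes
    W = eigvecs[:, order[:M]]
    X_reduced = Xc @ W                   # M-dimensional representation
    print(X_reduced.shape)               # (100, 2)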

So, it is not important whether the features are continuous or not.

PCA is widely used in many applications, mostly for eliminating noisy, less informative data coming from sensors or other hardware before classification/recognition.

Edit:

Statistically speaking, binary categorical features can be seen as discrete random variables taking values in {0, 1}. The expectation E{X} and variance E{(X-E{X})^2} are still valid and meaningful for discrete random variables: for a binary X with P(X=1)=p, we get E{X}=p and Var(X)=p(1-p). I still stand by the applicability of PCA in the case of categorical features.

Consider a case where you would like to predict whether it is going to rain on a given day. You have a categorical feature X, "Do I have to go to work on the given day", with 1 for yes and 0 for no. Clearly weather conditions do not depend on our work schedule, so P(R|X)=P(R). Assuming 5 work days per week, we have more 1s than 0s for X in our randomly collected dataset. PCA would probably lead to dropping this low-variance dimension in your feature representation.
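A quick sketch of this example (hypothetical numbers: the binary work-day flag next to a continuous weather measurement such as humidity) shows how little the binary column contributes to the leading component:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    # Binary "work day" flag: 5 of 7 days are 1, so Var = p(1-p) ~ 0.20
    work_day = rng.binomial(1, 5 / 7, size=n).astype(float)
    # Continuous weather measurement (hypothetical humidity in %)
    humidity = rng.normal(loc=60.0, scale=15.0, size=n)

    X = np.column_stack([work_day, humidity])
    print(X.var(axis=0))                 # ~[0.20, 225]: huge gap

    # First principal component of the centered data
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    print(Vt[0])                         # ~[0, 1]: dominated by humidity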

At the end of the day, PCA is for dimension reduction with minimal loss of information. Intuitively, we rely on the variance of the data along a given axis to measure its usefulness for the task. I don't think there is any theoretical limitation to applying it to categorical features. Practical value depends on the application and the data, which is also the case for continuous variables.