Principal components analysis using pandas dataframe

python pandas pca scientific-computing principal-components

Most sklearn objects work with pandas DataFrames just fine. Would something like this work for you?

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))
pca = PCA(n_components=5)
pca.fit(df)

You can access the components themselves with

pca.components_
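As a rough sketch of what that attribute holds, the snippet below wraps `pca.components_` in a labeled DataFrame and projects the data with `transform()`. The fixed seed, the `PC1`...`PC5` labels, and the `components` and `scores` names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Fixed seed is an assumption, used only to make the sketch reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

pca = PCA(n_components=5)
pca.fit(df)

# components_ has one row per component and one column per input feature
components = pd.DataFrame(
    pca.components_,
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],  # hypothetical labels
    columns=df.columns,
)

# transform() centers the rows of df and projects them onto the components
scores = pd.DataFrame(
    pca.transform(df),
    index=df.index,
    columns=components.index,
)

print(components.shape)  # (5, 10)
print(scores.shape)      # (20, 5)
```

Keeping the original index and column labels on the results makes it easier to trace each loading back to the feature it came from.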


import pandas
from sklearn.decomposition import PCA
import numpy
import matplotlib.pyplot as plot

df = pandas.DataFrame(data=numpy.random.normal(0, 1, (20, 10)))

# Standardize the data before fitting: PCA centers each column but does not
# rescale it, so features on larger scales would otherwise dominate
df_normalized = (df - df.mean()) / df.std()

pca = PCA(n_components=df.shape[1])
pca.fit(df_normalized)

# Reformat and view results: one row per original column, one column per component
loadings = pandas.DataFrame(
    pca.components_.T,
    columns=['PC%s' % i for i in range(len(df_normalized.columns))],
    index=df.columns,
)
print(loadings)

# Plot the share of variance explained by each component
plot.plot(pca.explained_variance_ratio_)
plot.ylabel('Explained Variance')
plot.xlabel('Components')
plot.show()
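One possible way to act on the explained-variance curve is to accumulate the ratios and keep the smallest number of components that reaches a chosen share of the variance. This is only a sketch: the 90% threshold and the names `cumulative` and `n_keep` are assumptions, and `StandardScaler` is swapped in for the manual normalization. It divides by the population standard deviation, whereas `df.std()` uses the sample one, which rescales every column by the same factor and leaves the variance ratios unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fixed seed is an assumption, used only to make the sketch reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

# Standardize, then fit PCA with all components kept
pipeline = make_pipeline(StandardScaler(), PCA())
pipeline.fit(df)
pca = pipeline.named_steps["pca"]

# Running total of the explained-variance ratios, ending at about 1.0
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold
threshold = 0.90  # arbitrary choice for illustration
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative.round(3))
print(n_keep)
```

On random, uncorrelated data like this the curve rises slowly, so most components are needed. On real data with correlated features, the first few components usually carry much more of the variance.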
