Correlation among multiple categorical variables (Pandas)

python pandas statistics heatmap categorical-data

You can use pd.factorize to encode each column as integer codes and then call corr on the result:

    df.apply(lambda x: pd.factorize(x)[0]).corr(method='pearson', min_periods=1)
    Out[32]:
         a    c    d
    a  1.0  1.0  1.0
    c  1.0  1.0  1.0
    d  1.0  1.0  1.0

Data input

df=pd.DataFrame({'a':['a','b','c'],'c':['a','b','c'],'d':['a','b','c']})
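To make the intermediate step visible, here is a minimal self-contained sketch of what pd.factorize returns for the data above. The variable names (codes, uniques, encoded, corr) are only illustrative. Note that with its default settings factorize numbers the labels in order of first appearance, so the Pearson value computed from the codes depends on that ordering.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c'], 'c': ['a', 'b', 'c'], 'd': ['a', 'b', 'c']})

# factorize returns (integer codes, unique labels in order of first appearance)
codes, uniques = pd.factorize(df['a'])

# Encode every column the same way, then correlate the integer codes
encoded = df.apply(lambda col: pd.factorize(col)[0])
corr = encoded.corr(method='pearson', min_periods=1)
print(encoded)
print(corr)
```

Because all three columns here receive identical codes, every entry of the correlation matrix is 1.0, which matches the output shown above.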

Update

    from scipy.stats import chisquare

    df = df.apply(lambda x: pd.factorize(x)[0]) + 1
    pd.DataFrame([chisquare(df[x].values, f_exp=df.values.T, axis=1)[0] for x in df])
    Out[123]:
         0    1    2    3
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  0.0  0.0  0.0  0.0

Data input

    df = pd.DataFrame({'a': ['a', 'd', 'c'], 'c': ['a', 'b', 'c'], 'd': ['a', 'b', 'c'], 'e': ['a', 'b', 'c']})
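The broadcasting in the chisquare one-liner can be hard to follow, so here is a sketch of the same computation written as an explicit loop over column pairs. It assumes, as the one-liner does, that the shifted integer codes of one column are passed as observed values and the codes of another column as expected values. Labelling the result with the column names is my addition, and the names codes and stat are illustrative.

```python
import pandas as pd
from scipy.stats import chisquare

df = pd.DataFrame({'a': ['a', 'd', 'c'], 'c': ['a', 'b', 'c'],
                   'd': ['a', 'b', 'c'], 'e': ['a', 'b', 'c']})

# Integer codes shifted by 1 so that no expected value is zero
codes = df.apply(lambda col: pd.factorize(col)[0]) + 1

# One chisquare call per (observed column, expected column) pair
stat = pd.DataFrame(index=codes.columns, columns=codes.columns, dtype=float)
for x in codes.columns:
    for y in codes.columns:
        result = chisquare(codes[x].to_numpy(), f_exp=codes[y].to_numpy())
        stat.loc[x, y] = result[0]  # keep only the test statistic
print(stat)
```

For this input every column is encoded as 1, 2, 3, so observed and expected values coincide and every statistic is 0.0, as in the output above.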


It turns out that the only solution I found is to iterate through all the factor × factor pairs and run chi2_contingency on the crosstab of each pair.

    import numpy as np
    import pandas as pd
    from scipy.stats import chi2_contingency

    # every ordered (factor, factor) pair, including same-factor pairs
    factors_paired = [(i, j) for i in df.columns.values for j in df.columns.values]

    chi2, p_values = [], []
    for f in factors_paired:
        if f[0] != f[1]:
            chitest = chi2_contingency(pd.crosstab(df[f[0]], df[f[1]]))
            chi2.append(chitest[0])      # test statistic
            p_values.append(chitest[1])  # p-value
        else:  # for same factor pair
            chi2.append(0)
            p_values.append(0)

    n = len(df.columns)  # 23 factors in the original data
    chi2 = np.array(chi2).reshape((n, n))  # shape it as a matrix
    chi2 = pd.DataFrame(chi2, index=df.columns.values, columns=df.columns.values)  # then a df for convenience
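The raw chi-square statistic grows with sample size and table dimensions, so it is awkward to read off a heatmap. One possible follow-up, which is not part of the answer above, is to rescale each pairwise statistic to Cramér's V, which lies between 0 and 1. The sketch below assumes the uncorrected statistic (correction=False) and sets the diagonal to 1 by convention. The function name cramers_v_matrix and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_matrix(df):
    """Pairwise Cramér's V for all columns of a categorical DataFrame."""
    cols = df.columns
    # Diagonal is set to 1 by convention (a factor compared with itself)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            table = pd.crosstab(df[a], df[b])
            chi2 = chi2_contingency(table, correction=False)[0]
            n = table.to_numpy().sum()   # number of observations
            k = min(table.shape) - 1     # smaller table dimension minus 1
            v = np.sqrt(chi2 / (n * k)) if k > 0 else np.nan
            out.loc[a, b] = out.loc[b, a] = v  # the measure is symmetric
    return out

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'red', 'blue'],
    'shape': ['sq', 'sq', 'ci', 'ci', 'sq', 'ci'],
    'size': ['S', 'L', 'S', 'L', 'L', 'S'],
})
v = cramers_v_matrix(sample)
print(v)
```

Each unordered pair is tested only once and mirrored across the diagonal, which halves the number of chi2_contingency calls compared with looping over all ordered pairs.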

