ANOVA in python using pandas dataframe with statsmodels or scipy?


I set up a direct comparison to test both libraries, found that their assumptions can differ slightly, and got a hint from a statistician. Here is an example of ANOVA on a pandas DataFrame that closely matches R's results:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # R code on R sample dataset
    #> anova(with(ChickWeight, lm(weight ~ Time + Diet)))
    #Analysis of Variance Table
    #
    #Response: weight
    #           Df  Sum Sq Mean Sq  F value    Pr(>F)
    #Time        1 2042344 2042344 1576.460 < 2.2e-16 ***
    #Diet        3  129876   43292   33.417 < 2.2e-16 ***
    #Residuals 573  742336    1296
    #write.csv(file='ChickWeight.csv', x=ChickWeight, row.names=F)

    cw = pd.read_csv('ChickWeight.csv')
    cw_lm = ols('weight ~ Time + C(Diet)', data=cw).fit()  # Specify C for Categorical
    print(sm.stats.anova_lm(cw_lm, typ=2))
    #                  sum_sq   df            F         PR(>F)
    #C(Diet)    129876.056995    3    33.416570   6.473189e-20
    #Time      2016357.148493    1  1556.400956  1.803038e-165
    #Residual   742336.119560  573          NaN            NaN