T-test in Pandas
It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:
from scipy.stats import ttest_ind

cat1 = my_data[my_data['Category']=='cat1']
cat2 = my_data[my_data['Category']=='cat2']

ttest_ind(cat1['values'], cat2['values'])
>>> (1.4927289925706944, 0.16970867501294376)
It returns a tuple with the t-statistic and the p-value.
See here for other t-tests: http://docs.scipy.org/doc/scipy/reference/stats.html
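For a version that runs end to end, here is a minimal sketch. The DataFrame contents are made up for illustration (only the column names follow the snippet above), and the `alternative` argument used for the one-sided test assumes SciPy 1.6 or newer.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Made-up data in long format: one row per observation.
my_data = pd.DataFrame({
    'Category': ['cat1'] * 5 + ['cat2'] * 5,
    'values': [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0],
})

cat1 = my_data[my_data['Category'] == 'cat1']
cat2 = my_data[my_data['Category'] == 'cat2']

# Two-sided test (the default); the result unpacks as (t-statistic, p-value).
t_stat, p_value = ttest_ind(cat1['values'], cat2['values'])
print(f"two-sided: t = {t_stat:.3f}, p = {p_value:.3f}")

# One-sided test: is the mean of cat1 less than the mean of cat2?
# Assumes SciPy >= 1.6, where the `alternative` keyword is available.
one_sided = ttest_ind(cat1['values'], cat2['values'], alternative='less')
print(f"one-sided: t = {one_sided.statistic:.3f}, p = {one_sided.pvalue:.3f}")
```

When the t-statistic points in the direction of the alternative, as it does here, the one-sided p-value is half the two-sided one.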
EDIT: I had not realized this was about the data format. You could use
import pandas as pd
import scipy.stats

two_data = pd.DataFrame(data, index=data['Category'])
Then accessing the categories is as simple as
scipy.stats.ttest_ind(two_data.loc['cat1'], two_data.loc['cat2'], equal_var=False)
The loc operator accesses rows by label.
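To see what label-based access does here, a small sketch with made-up data: because the index holds the category of each row, a label such as 'cat1' selects every row in that category. It assumes `data` is a dict of equal-length lists. Selecting the 'values' column in the same loc call is my addition, so that only numeric data reaches the test.

```python
import pandas as pd
import scipy.stats

# Made-up example data; 'data' is assumed to be a dict of equal-length lists.
data = {
    'Category': ['cat1', 'cat1', 'cat1', 'cat2', 'cat2', 'cat2'],
    'values': [2.1, 2.5, 1.9, 3.0, 3.4, 2.8],
}

# Use the category of each row as its index label.
two_data = pd.DataFrame(data, index=data['Category'])

# .loc[row_label, column_label]: a repeated row label returns all matching rows.
group1 = two_data.loc['cat1', 'values']
group2 = two_data.loc['cat2', 'values']

result = scipy.stats.ttest_ind(group1, group2, equal_var=False)
print(result.statistic, result.pvalue)
```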
"one sided or two sided dependent or independent"
If you have two independent samples but do not know whether they have equal variance, you can use Welch's t-test. It is as simple as:
scipy.stats.ttest_ind(cat1['values'], cat2['values'], equal_var=False)
For reasons to prefer Welch's test, see https://stats.stackexchange.com/questions/305/when-conducting-a-t-test-why-would-one-prefer-to-assume-or-test-for-equal-vari.
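To make the difference concrete, here is a rough sketch on made-up samples of unequal size and spread. It compares the default pooled-variance test with Welch's test, and recomputes Welch's statistic by hand as the mean difference divided by sqrt(s1^2/n1 + s2^2/n2).

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up samples with different sizes and clearly different spread.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])

student = ttest_ind(a, b)                 # assumes equal variances (pooled)
welch = ttest_ind(a, b, equal_var=False)  # Welch: no equal-variance assumption

# Welch's statistic by hand: each sample contributes its own variance term.
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
manual_t = (a.mean() - b.mean()) / se

print("pooled t:", student.statistic)
print("Welch t: ", welch.statistic)
print("manual t:", manual_t)
```

With equal sample sizes the two statistics coincide and only the degrees of freedom (and so the p-value) differ, which is why the samples here have different lengths.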
For two dependent samples, you can use
scipy.stats.ttest_rel(cat1['values'], cat2['values'])
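Note that a paired test needs both samples to have the same length, with the i-th entries belonging together. A minimal sketch with made-up before/after measurements, which also shows that the paired test is the same as a one-sample t-test on the pairwise differences:

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Made-up paired measurements: entry i of each array comes from the same subject.
before = np.array([10.0, 12.0, 9.0, 11.0, 13.0])
after = np.array([11.0, 14.0, 9.0, 13.0, 16.0])

paired = ttest_rel(before, after)

# Equivalent formulation: test whether the mean difference is zero.
on_diff = ttest_1samp(before - after, 0.0)

print(paired.statistic, paired.pvalue)
print(on_diff.statistic, on_diff.pvalue)
```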
I simplified the code a little bit:
from scipy.stats import ttest_ind

ttest_ind(*my_data.groupby('Category')['value'].apply(lambda x:list(x)))
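The star-unpacking only works when 'Category' has exactly two distinct values, since each group becomes one positional argument. A small sketch with made-up data (the column name 'value' follows this snippet) that checks the count and confirms the result matches the explicit filtering approach:

```python
import pandas as pd
from scipy.stats import ttest_ind

# Made-up data; rows of the two categories are deliberately interleaved.
my_data = pd.DataFrame({
    'Category': ['cat1', 'cat2', 'cat1', 'cat2', 'cat1', 'cat2'],
    'value': [1.0, 2.5, 2.0, 3.5, 3.0, 5.5],
})

# A Series indexed by category, where each entry is the list of that group's values.
grouped = my_data.groupby('Category')['value'].apply(list)
if len(grouped) != 2:
    raise ValueError("ttest_ind compares exactly two groups")

# Unpacking passes the two lists as the two samples.
via_groupby = ttest_ind(*grouped)

# Same test written with explicit boolean filtering, for comparison.
direct = ttest_ind(
    my_data.loc[my_data['Category'] == 'cat1', 'value'],
    my_data.loc[my_data['Category'] == 'cat2', 'value'],
)
print(via_groupby.statistic, direct.statistic)
```

By default groupby sorts the group keys, so the order of the two samples (and the sign of the t-statistic) follows the sorted category names.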