Minimal example of rpy2 regression using pandas data frame


After calling pandas2ri.activate() some conversions from Pandas objects to R objects happen automatically. For example, you can use

M = R.lm('y~x', data=df)

instead of

robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

import pandas as pd
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()

R = ro.r
df = pd.DataFrame({'x': [1,2,3,4,5],
                   'y': [2,1,3,5,4]})
M = R.lm('y~x', data=df)
print(R.summary(M).rx2('coefficients'))

yields

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880
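As a sanity check that does not depend on rpy2 or R, the estimates, standard errors and t values in that table can be reproduced with the textbook formulas for simple linear regression. This is only a sketch using the standard library; the helper name simple_ols is made up for illustration and is not part of rpy2 or pandas.

```python
import math

# Same data as in the DataFrame above
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]


def simple_ols(x, y):
    """Return (intercept, slope, se_intercept, se_slope) for y ~ x.

    Hypothetical helper: textbook simple-regression formulas only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope


intercept, slope, se_intercept, se_slope = simple_ols(x, y)
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

print("estimates:", intercept, slope)
print("std errors:", se_intercept, se_slope)
print("t values:", t_intercept, t_slope)
```

Up to floating-point rounding, the printed numbers should agree with the Estimate, Std. Error and t value columns above.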


The R and Python versions are not strictly identical: you build a data frame in Python/rpy2, whereas in R you use vectors (without a data frame).

Otherwise, the conversion shipping with rpy2 appears to be working here:

from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

The result:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880


I can add to unutbu's answer by outlining how to retrieve particular elements of the coefficients table including, crucially, the p-values.

def r_matrix_to_data_frame(r_matrix):
    """Convert an R matrix into a Pandas DataFrame"""
    import pandas as pd
    from rpy2.robjects import pandas2ri
    # ri2py is the rpy2 2.x name; rpy2 3.x renamed it to rpy2py
    array = pandas2ri.ri2py(r_matrix)
    # For an R matrix, names holds the row names and the column names
    return pd.DataFrame(array,
                        index=r_matrix.names[0],
                        columns=r_matrix.names[1])

# Let's start from unutbu's line retrieving the coefficients:
coeffs = R.summary(M).rx2('coefficients')
df = r_matrix_to_data_frame(coeffs)

This leaves us with a DataFrame which we can access in the normal way:

In [179]: df['Pr(>|t|)']
Out[179]:
(Intercept)    0.637618
x              0.104088
Name: Pr(>|t|), dtype: float64

In [181]: df.loc['x', 'Pr(>|t|)']
Out[181]: 0.10408803866182779
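If you want to convince yourself that these p-values are what you expect, they can be recomputed from the t values without R. The sketch below assumes the five-point example from the earlier answer, so the residual degrees of freedom are 5 - 2 = 3, and it uses the closed-form CDF of Student's t distribution for exactly 3 degrees of freedom. The function name two_sided_p_df3 is invented for illustration; for any other degrees of freedom you would need a statistics library instead.

```python
import math


def two_sided_p_df3(t):
    """Two-sided p-value for a t statistic with exactly 3 degrees of freedom.

    Hypothetical helper: relies on the closed-form CDF that exists for
    nu = 3 and is NOT valid for other degrees of freedom.
    """
    z = abs(t) / math.sqrt(3)
    cdf = 0.5 + (z / (1 + z * z) + math.atan(z)) / math.pi
    return 2 * (1 - cdf)


# t values as printed in the coefficients table (rounded to 6 decimals)
p_intercept = two_sided_p_df3(0.522233)
p_x = two_sided_p_df3(2.309401)

print("p (Intercept):", p_intercept)
print("p x:", p_x)
```

Because the inputs are the rounded t values from the table, the results should match the Pr(>|t|) column only to about five or six decimal places.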