Python Pandas to R dataframe
If standard text-based formats (CSV) are too slow or bulky, I'd recommend feather, a serialization format built on Apache Arrow. It was developed by Hadley Wickham (RStudio, ggplot2, etc.) and Wes McKinney (the creator of pandas) explicitly for performance and interoperability between Python and R (see here).
You need pandas version 0.20.0+ and pip install feather-format, then you can use the to_feather/read_feather operations as drop-in replacements for to_csv/read_csv:
import pandas as pd
df_R.to_feather('filename.feather')
df_R = pd.read_feather('filename.feather')
The R equivalents (using the package feather) are
df <- feather::read_feather('filename.feather')
feather::write_feather(df, 'filename.feather')
Apart from some minor tweaks (e.g. you can't save custom DataFrame indexes in feather, so you'll need to call df.reset_index() first), this is a fast and easy drop-in replacement for csv, pickle, etc.
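To make the index caveat concrete, here is a minimal sketch of the reset_index() step. The helper names prepare_for_feather and save_feather and the sample data are made up for illustration; only reset_index() and to_feather() come from pandas, and to_feather() itself needs a feather backend (feather-format or pyarrow) installed.

```python
import pandas as pd


def prepare_for_feather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is a plain 0..n-1 RangeIndex.

    Any other index is moved into ordinary columns via reset_index(),
    so its values are written to the file instead of being rejected.
    (Hypothetical helper, not part of pandas.)
    """
    is_default = (
        isinstance(df.index, pd.RangeIndex)
        and df.index.start == 0
        and df.index.step == 1
        and df.index.name is None
    )
    if is_default:
        return df.copy()
    return df.reset_index()


def save_feather(df: pd.DataFrame, path: str) -> None:
    """Write df to a feather file, keeping a custom index as a column."""
    # Requires a feather backend (feather-format or pyarrow) to be installed.
    prepare_for_feather(df).to_feather(path)


# Hypothetical sample data with a custom, named string index.
df = pd.DataFrame({"value": [1.5, 2.5]}, index=pd.Index(["a", "b"], name="key"))
flat = prepare_for_feather(df)
print(list(flat.columns))  # the old index is now the first column, "key"
```

After reading the file back (in Python or R), the former index is just another column; in pandas you can restore it with set_index('key') if you need it.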
The recent documentation https://rpy2.github.io/doc/v3.2.x/html/generated_rst/pandas.html has a section about interacting with pandas.
Otherwise, objects of type rpy2.robjects.vectors.DataFrame have a method to_csvfile, not to_csv:
https://rpy2.github.io/doc/v3.2.x/html/vector.html#rpy2.robjects.vectors.DataFrame.to_csvfile
If you want to pass data between Python and R, there are more efficient ways than writing and reading CSV files. Try the conversion system:
from rpy2.robjects import pandas2ri
pandas2ri.activate()
from rpy2.robjects.packages import importr
base = importr('base')
# call an R function on a Pandas DataFrame
base.summary(my_pandas_dataframe)
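If you would rather not switch the conversion on globally, the same rpy2 documentation page also shows a scoped variant using localconverter. The following is only a sketch along those lines: the function name summary_in_r is made up for illustration, and it assumes rpy2 3.x with a working R installation.

```python
def summary_in_r(pandas_df):
    """Call R's base::summary on a pandas DataFrame.

    The pandas conversion is active only inside the `with` block, so the
    rest of the program keeps rpy2's default conversion rules.
    (Hypothetical helper; assumes rpy2 3.x and R are installed.)
    """
    # Imported here so that merely defining the function does not need R.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr

    base = importr('base')
    with localconverter(ro.default_converter + pandas2ri.converter):
        # The DataFrame is converted to an R data.frame for this call only.
        return base.summary(pandas_df)
```

You would call it as summary_in_r(my_pandas_dataframe); compared with pandas2ri.activate(), this avoids changing conversion behaviour for unrelated code in the same process.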