Reading a pickle file (PANDAS Python Data Frame) in R
Reticulate was quite easy and super smooth as suggested by russellpierce in the comments.
install.packages('reticulate')
I then created a Python script like this, based on the examples in the reticulate documentation.
Python file (pickle_reader.py):
import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data
And then my R file looked like:
require("reticulate")
source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")
This gave me all my data in R stored earlier in pickle format.
You can also do this all in-line in R without leaving your R editor (provided your system python can reach pandas)... e.g.
library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")
Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answers below for an easier path.
You could load the pickle in python and then export it to R via the python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to python. I suspect that what you'd want to do next would be to use that session to call R and saveRDS to a file or RAM disk. Then in RStudio you can read that file back in. Look at the R packages rJython and rPython for ways in which you could trigger the python commands from R.
Alternatively, you could write a simple python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then that entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
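As a rough illustration of the stdout approach, here is a minimal sketch of such a script. It assumes the pickle holds a pandas DataFrame and that pandas is installed for the Python that R will call. The file name pickle_to_csv.py and the function name pickle_to_csv are made up for this example, and CSV is just one possible "formatted data stream".

```python
import sys

import pandas as pd


def pickle_to_csv(pickle_path, out=sys.stdout):
    """Load a pickled pandas DataFrame and write it to `out` as CSV.

    `pickle_to_csv` is a hypothetical helper name, not part of any library.
    """
    df = pd.read_pickle(pickle_path)
    # Drop the index so R sees only the data columns.
    df.to_csv(out, index=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed invocation: python pickle_to_csv.py dataset.pickle
    pickle_to_csv(sys.argv[1])
```

On the R side, the call would then be something along the lines of fread(cmd = "python pickle_to_csv.py dataset.pickle"), assuming python is on your PATH. Note that a CSV round trip loses type information such as categoricals and timezone-aware datetimes, so check the column types after reading.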
As usual, there are /many/ ways to skin this particular cat. The basic steps are:
- Load the data in python
- Express the data to R (e.g., exporting the object via rpy2 or writing formatted text to stdout with R ready to receive it on the other end)
- Serialize the expressed data in R to an internal data representation (e.g., exporting the object via rpy2 or fread)
- (optional) Make the data in that session of R accessible to another R session (i.e., the step to close the loop with rpy2, or if you've been using fread then you're already done).