Is there a Python module to open SPSS files?


I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application) and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

import pyreadstat
df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function read_por for old por (portable) files.
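If you have a mix of sav, zsav and por files, the choice between read_sav and read_por can be made from the file extension. The following is only a sketch of that idea: READER_BY_SUFFIX, reader_name and read_spss_file are hypothetical helper names made up for this example and are not part of pyreadstat; only read_sav and read_por come from the package.

```python
from pathlib import Path

# Extension -> pyreadstat function name, as described above:
# read_sav handles .sav and .zsav, read_por handles .por.
READER_BY_SUFFIX = {
    ".sav": "read_sav",
    ".zsav": "read_sav",
    ".por": "read_por",
}


def reader_name(path):
    """Return the name of the pyreadstat reader suited to this file."""
    suffix = Path(path).suffix.lower()
    if suffix not in READER_BY_SUFFIX:
        raise ValueError("not an SPSS file extension: %r" % suffix)
    return READER_BY_SUFFIX[suffix]


def read_spss_file(path):
    """Read an SPSS file and return (df, meta), like read_sav / read_por."""
    # Imported here so that reader_name() can be used without pyreadstat installed.
    import pyreadstat

    reader = getattr(pyreadstat, reader_name(path))
    return reader(str(path))
```

You would then call df, meta = read_spss_file("/path/to/sav/file.sav") exactly as in the example above, and get a ValueError for any extension the helper does not know about.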

You can find it here: https://github.com/Roche/pyreadstat


Depending on what you want to do--process data using R-related commands from rpy2, or switch to Python--the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.

Otherwise, pandas includes a convenient wrapper for rpy2. Here is an example using Peat and Barton's weights.sav data set:

>>> import pandas.rpy.common as com
>>> filename = "weights.sav"
>>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
>>> w = com.convert_robj(w)
>>> w.head()
     ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
3  L004    4.75    56.0   38.5    Male    year12          2 siblings
4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
5  L006    4.56    55.0   39.5    Male    year10          2 siblings


As a note for people finding this later (like me): pandas.rpy has been deprecated in the newest versions of pandas (>0.16), as noted here. That page includes information on updating code to use the rpy2 interface.