Read a zipped file as a pandas DataFrame

python zip pandas

If you want to read a zipped or gzipped CSV file into a pandas DataFrame, the read_csv method supports this directly:

df = pd.read_csv('filename.zip')

Or the long form:

df = pd.read_csv('filename.zip', compression='zip', header=0, sep=',', quotechar='"')

Description of the compression argument from the docs:

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.
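As a rough illustration of the 'infer' behaviour described above, the sketch below builds a small throwaway archive and reads it back both ways. The file names and the sample data are made up for the example; only read_csv and its compression argument come from the answer.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a throwaway ZIP archive holding exactly one CSV file.
# "example.zip" and "example.csv" are hypothetical names for this sketch.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "example.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("example.csv", "a,b\n1,2\n3,4\n")

# compression defaults to 'infer', so the .zip extension is enough.
df_inferred = pd.read_csv(zip_path)

# The same read with the compression stated explicitly.
df_explicit = pd.read_csv(zip_path, compression="zip")
```

Both calls should give the same DataFrame as long as the archive contains a single data file.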


I think you want to open the file inside the ZipFile, which returns a file-like object, rather than read it (z here is the already opened zipfile.ZipFile):

In [11]: crime2013 = pd.read_csv(z.open('crime_incidents_2013_CSV.csv'))

In [12]: crime2013
Out[12]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24567 entries, 0 to 24566
Data columns (total 15 columns):
CCN                            24567  non-null values
REPORTDATETIME                 24567  non-null values
SHIFT                          24567  non-null values
OFFENSE                        24567  non-null values
METHOD                         24567  non-null values
LASTMODIFIEDDATE               24567  non-null values
BLOCKSITEADDRESS               24567  non-null values
BLOCKXCOORD                    24567  non-null values
BLOCKYCOORD                    24567  non-null values
WARD                           24563  non-null values
ANC                            24567  non-null values
DISTRICT                       24567  non-null values
PSA                            24567  non-null values
NEIGHBORHOODCLUSTER            24263  non-null values
BUSINESSIMPROVEMENTDISTRICT    3613  non-null values
dtypes: float64(4), int64(1), object(10)
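The session above does not show how z was created. A minimal self-contained sketch of the same idea, assuming an archive with more than one member, might look like this; the member names and data are invented for the example and the archive is kept in memory so nothing touches the disk.

```python
import io
import zipfile

import pandas as pd

# Build an in-memory archive with two CSV members (hypothetical names and data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("crime_2012.csv", "CCN,WARD\n1,2\n")
    archive.writestr("crime_2013.csv", "CCN,WARD\n10,3\n11,4\n")

# Reopen the archive for reading and pick one member by name.
buffer.seek(0)
with zipfile.ZipFile(buffer) as z:
    member_names = z.namelist()  # lists every file stored in the archive
    with z.open("crime_2013.csv") as member:
        # z.open returns a file-like object that read_csv accepts directly.
        crime2013 = pd.read_csv(member)
```

Going through zipfile this way is useful when the archive holds several files, since passing the archive path straight to read_csv expects a single data file.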


It seems you don't even have to specify the compression any more. The following snippet loads the data from filename.zip into df.

import pandas as pd
df = pd.read_csv('filename.zip')

(Of course you will need to specify separator, header, etc. if they are different from the defaults.)
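For instance, a zipped file that uses semicolons and has no header row could be read roughly as follows. The archive, its contents, and the column names are assumptions made up for this sketch.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Create a small archive whose CSV uses ';' and has no header row.
# "semicolon.zip" and "data.csv" are hypothetical names.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "semicolon.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("data.csv", "1;alpha\n2;beta\n")

# Override the defaults: custom separator, no header, explicit column names.
df = pd.read_csv(zip_path, sep=";", header=None, names=["id", "label"])
```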
