Read a zipped file as a pandas DataFrame
If you want to read a zipped or tar.gz file into a pandas DataFrame, the read_csv
method supports this directly:
import pandas as pd
df = pd.read_csv('filename.zip')
Or the long form:
df = pd.read_csv('filename.zip', compression='zip', header=0, sep=',', quotechar='"')
Description of the compression argument from the docs:
compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.
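To see the inference described above in action, here is a minimal round-trip sketch. The helper name roundtrip_zip, the file name example.zip, and the sample data are all made up for illustration; it assumes a pandas version recent enough that to_csv can also write ZIP archives.

```python
import os
import tempfile

import pandas as pd


def roundtrip_zip(df):
    """Write df to a single-file ZIP archive, then read it back twice."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.zip")  # hypothetical file name
        # Write a ZIP archive containing exactly one CSV member
        df.to_csv(path, index=False, compression="zip")
        # compression defaults to 'infer', so the '.zip' extension is enough
        inferred = pd.read_csv(path)
        # Same read with the compression spelled out
        explicit = pd.read_csv(path, compression="zip")
    return inferred, explicit


original = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
inferred, explicit = roundtrip_zip(original)
```

Both reads should give back the same data as the original frame. If the archive held more than one file, read_csv would refuse to guess which one to load.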
I think you want to use the ZipFile's open method, which returns a file-like object, rather than read (which returns the raw bytes):
In [11]: crime2013 = pd.read_csv(z.open('crime_incidents_2013_CSV.csv'))

In [12]: crime2013
Out[12]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24567 entries, 0 to 24566
Data columns (total 15 columns):
CCN                            24567  non-null values
REPORTDATETIME                 24567  non-null values
SHIFT                          24567  non-null values
OFFENSE                        24567  non-null values
METHOD                         24567  non-null values
LASTMODIFIEDDATE               24567  non-null values
BLOCKSITEADDRESS               24567  non-null values
BLOCKXCOORD                    24567  non-null values
BLOCKYCOORD                    24567  non-null values
WARD                           24563  non-null values
ANC                            24567  non-null values
DISTRICT                       24567  non-null values
PSA                            24567  non-null values
NEIGHBORHOODCLUSTER            24263  non-null values
BUSINESSIMPROVEMENTDISTRICT    3613  non-null values
dtypes: float64(4), int64(1), object(10)
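For an archive with several files, the same idea can be sketched end to end. This is only an illustration: the member names first.csv and second.csv and their contents are invented, and the archive is built in memory so the snippet runs without any file on disk.

```python
import io
import zipfile

import pandas as pd

# Build a small in-memory archive with two members (hypothetical names and data)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as z:
    z.writestr("first.csv", "id,value\n1,10\n2,20\n")
    z.writestr("second.csv", "id,value\n3,30\n")

buffer.seek(0)
with zipfile.ZipFile(buffer) as z:
    # namelist() shows which members are available to open
    names = z.namelist()
    # open() returns a file-like object that read_csv can consume directly
    with z.open("second.csv") as member:
        second = pd.read_csv(member)
```

Checking namelist() first is handy when you don't know the exact name of the CSV inside the archive.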