Apply GZIP compression to a CSV in Python Pandas Apply GZIP compression to a CSV in Python Pandas python python

Apply GZIP compression to a CSV in Python Pandas


Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using same keyword arguments as you, and it worked.

You may need to upgrade pandas, as gzip was not implemented until version 0.17.1, but trying to use it on prior versions will not raise an error, and just produce a regular csv. You can determine your current version of pandas by looking at the output of pd.__version__.


It is done very easily with pandas

import pandas as pd

Write a pandas dataframe to disc as gunzip compressed csv

df.to_csv('dfsavename.csv.gz', compression='gzip')

Read from disc

df = pd.read_csv('dfsavename.csv.gz', compression='gzip')


From documentation

import gzipcontent = "Lots of content here"with gzip.open('file.txt.gz', 'wb') as f:    f.write(content)

with pandas

import gzipcontent = df.to_csv(      sep='|',      header=True,      index=False,      quoting=csv.QUOTE_ALL,      quotechar='"',      doublequote=True,      line_terminator='\n')with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:    f.write(content)

The trick here being that to_csv outputs text if you don't pass it a filename. Then you just redirect that text to gzip's write method.