How can I read a tar.gz file using pandas read_csv with the gzip compression option?
You can use the tarfile module to read a particular file from the tar.gz archive (as discussed in this resolved issue). If there is only one file in the archive, then you can do this:
    import tarfile
    import pandas as pd

    with tarfile.open("sample.tar.gz", "r:*") as tar:
        # The archive holds a single file, so take the first member name.
        csv_path = tar.getnames()[0]
        df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=" ")
The read mode r:* lets tarfile detect the compression itself, so it handles gzip (and other kinds of compression, such as bz2 or xz) transparently. If there are multiple files in the archive, then you could replace the csv_path line with something like csv_path = [n for n in tar.getnames() if n.endswith('.csv')][-1] to get the last CSV file in the archive.
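If you want every CSV in the archive rather than only the last one, the same idea extends to a loop. This is a minimal sketch, not part of the original answer: the helper name read_csvs_from_tar is hypothetical, and it assumes the CSV members can each be parsed with the same read_csv options.

```python
import tarfile

import pandas as pd


def read_csvs_from_tar(archive_path, **read_csv_kwargs):
    """Return a dict mapping each .csv member name to its DataFrame.

    Hypothetical helper for illustration; any keyword arguments are
    passed straight through to pd.read_csv.
    """
    frames = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # Skip directories, links, and non-CSV files.
            if member.isfile() and member.name.endswith(".csv"):
                # extractfile returns a binary file-like object that
                # read_csv can consume without unpacking to disk.
                frames[member.name] = pd.read_csv(
                    tar.extractfile(member), **read_csv_kwargs
                )
    return frames
```

For example, frames = read_csvs_from_tar("sample.tar.gz", header=0, sep=" ") would give one DataFrame per CSV member, keyed by its path inside the archive.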