
Writing large Pandas Dataframes to CSV file in chunks


Solution:

header = True
for chunk in chunks:
    chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                 header=header, columns=['TIME', 'STUFF'], mode='a')
    header = False

Notes:

  • mode='a' tells pandas to append to the file instead of overwriting it.
  • The column header is only written for the first chunk (a self-contained sketch of the whole approach follows below).

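For completeness, here is a minimal, self-contained sketch of the chunked approach. It assumes that chunks comes from reading the source file lazily via read_csv's chunksize parameter and that the input has TIME and STUFF columns; the file paths and the chunk size below are placeholders, not the values from the original question.

import pandas as pd

# Placeholder paths; in the question they are built with os.path.join from
# folder, new_folder and filename.
file_in = "big_input.tsv"
file_out = "new_file_big_input.csv"

# read_csv with chunksize returns an iterator of DataFrames rather than one big frame.
chunks = pd.read_csv(file_in, sep='\t', chunksize=100000)

header = True
for chunk in chunks:
    # mode='a' appends each chunk; only the first write emits the header row.
    chunk.to_csv(file_out, header=header, columns=['TIME', 'STUFF'],
                 mode='a', index=False)
    header = False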

Check out the chunksize argument of the to_csv method, which controls how many rows are written to the file at a time; see the pandas documentation for DataFrame.to_csv.

Writing to a file would look like:

df.to_csv("path/to/save/file.csv", chunksize=1000, columns=['TIME', 'STUFF'])


Why not read only the columns of interest and then save them?

file_in = os.path.join(folder, filename)
file_out = os.path.join(folder, new_folder, 'new_file' + filename)
df = pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0, names=['TIME', 'STUFF'])
df.to_csv(file_out)
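
If the goal is to load only those two columns in the first place, read_csv's usecols parameter does the selection at parse time. A minimal sketch, assuming the input is tab-separated with a header row containing TIME and STUFF; the paths here are placeholders rather than the ones from the question.

import os
import pandas as pd

# Placeholder paths; in the question they are built from folder, new_folder and filename.
file_in = os.path.join("data", "input.tsv")
file_out = os.path.join("data", "new", "new_file_input.csv")

# usecols restricts parsing to the named columns, so only TIME and STUFF are loaded.
df = pd.read_csv(file_in, sep='\t', usecols=['TIME', 'STUFF'])
df.to_csv(file_out, index=False)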