Writing large Pandas Dataframes to CSV file in chunks
Solution:
header = True
for chunk in chunks:
    chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                 header=header, columns=['TIME', 'STUFF'], mode='a')
    header = False
Notes:
- The mode='a' argument tells pandas to append to the file rather than overwrite it.
- We only write a column header on the first chunk.
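A minimal runnable sketch of this append-in-chunks pattern, using an in-memory DataFrame sliced into chunks (the data, column names, and output path are illustrative, not from the question):

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the large DataFrame; EXTRA is a column
# we deliberately leave out of the output.
df = pd.DataFrame({
    "TIME": range(10),
    "STUFF": list("abcdefghij"),
    "EXTRA": range(10),
})

out_path = os.path.join(tempfile.mkdtemp(), "new_file_data.csv")

header = True
for start in range(0, len(df), 4):          # chunks of 4 rows
    chunk = df.iloc[start:start + 4]
    # mode='a' appends; only the first chunk writes the header row
    chunk.to_csv(out_path, columns=["TIME", "STUFF"],
                 header=header, mode="a", index=False)
    header = False

round_trip = pd.read_csv(out_path)
print(round_trip.shape)  # (10, 2): all rows, only the two selected columns
```

Because the header is written exactly once, the appended file reads back as a single clean table.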
Check out the chunksize argument of the to_csv method (see the pandas docs for details).
Writing to file would look like:
df.to_csv("path/to/save/file.csv", chunksize=1000, columns=['TIME', 'STUFF'])
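With chunksize, pandas batches the rows internally while writing, so you don't need an explicit loop; the resulting file is identical to a one-shot write. A small self-contained sketch (toy data, temp path; note that the old cols keyword is now columns):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"TIME": range(5), "STUFF": range(5), "OTHER": range(5)})
path = os.path.join(tempfile.mkdtemp(), "file.csv")

# chunksize=2 writes the frame two rows at a time under the hood
df.to_csv(path, chunksize=2, columns=["TIME", "STUFF"], index=False)

result = pd.read_csv(path)
print(result.shape)  # (5, 2)
```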
Why don't you only read the columns of interest and then save it?
file_in = os.path.join(folder, filename)
file_out = os.path.join(folder, new_folder, 'new_file' + filename)
df = pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0, names=['TIME', 'STUFF'])
df.to_csv(file_out)
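A runnable sketch of this read-only-what-you-need approach, built around a small synthetic tab-separated file (paths and data are made up). It uses usecols, which is the explicit way in read_csv to keep only the columns of interest:

```python
import os
import tempfile

import pandas as pd

folder = tempfile.mkdtemp()  # stand-in for the real input/output folders

# Build a tab-separated input with three junk lines before the header,
# mirroring the skiprows=(0, 1, 2) situation in the answer above.
file_in = os.path.join(folder, "data.tsv")
with open(file_in, "w") as f:
    f.write("junk line 1\njunk line 2\njunk line 3\n")
    f.write("TIME\tSTUFF\tEXTRA\n")
    f.write("1\ta\tx\n")
    f.write("2\tb\ty\n")

# Skip the junk lines, then read only the TIME and STUFF columns
df = pd.read_csv(file_in, sep="\t", skiprows=3, usecols=["TIME", "STUFF"])

file_out = os.path.join(folder, "new_file_data.csv")
df.to_csv(file_out, index=False)
print(df.shape)  # (2, 2)
```

Selecting columns at read time keeps memory usage down, so the subsequent to_csv can often be a single call instead of a chunked loop.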