Concatenating multiple csv files into a single csv with the same header - Python

python csv pandas terminal concatenation

If you don't need the CSV data in memory and are just copying from input to output, it's a lot cheaper to skip parsing entirely and copy the files without building anything up in memory:

import shutil
import glob

# import csv files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters

with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
            print(fname + " has been imported.")

That's it; shutil.copyfileobj copies the data in blocks, which avoids almost all of the Python-level work of parsing and reserializing each row.

This assumes all the CSV files have the same format, encoding, line endings, etc., and the header doesn't contain embedded newlines, but if that's the case, it's a lot faster than the alternatives.
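One assumption worth spelling out: if an input file does not end with a newline, its last row and the first row of the next file end up on the same line. The following is a minimal sketch of a variation that guards against that, not a drop-in replacement. It assumes "\n" or "\r\n" line endings, and the function name concat_csv_raw and the chunk_size parameter are made up for illustration. It copies in chunks by hand rather than through shutil.copyfileobj so it can remember the last byte written.

```python
def concat_csv_raw(paths, out_path, chunk_size=1024 * 1024):
    """Concatenate CSV files byte-for-byte, keeping only the first header.

    Hypothetical helper: inserts a newline between files when the data
    written so far does not already end with one.
    """
    last_byte = b'\n'  # nothing written yet, so no separator is needed
    with open(out_path, 'wb') as outfile:
        for i, fname in enumerate(paths):
            with open(fname, 'rb') as infile:
                if i != 0:
                    infile.readline()  # throw away header on all but first file
                first_chunk = True
                # Read fixed-size blocks until read() returns b'' at end of file
                for chunk in iter(lambda: infile.read(chunk_size), b''):
                    if first_chunk and last_byte != b'\n':
                        outfile.write(b'\n')  # previous file lacked a trailing newline
                    first_chunk = False
                    outfile.write(chunk)
                    last_byte = chunk[-1:]
```

It would be called with the sorted file list from above, e.g. concat_csv_raw(allFiles, 'someoutputfile.csv'). The same caveats about matching encodings and headers without embedded newlines still apply.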


Are you required to do this in Python? If you are open to doing this entirely in shell, all you'd need to do is first cat the header row from a randomly selected input .csv file into merged.csv before running your one-liner:

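# Write the header once, then append the data rows of every file.
# merged.csv itself matches *.csv, so skip it inside the loop.
cat a-randomly-selected-csv-file.csv | head -n1 > merged.csv
for f in *.csv; do [ "$f" = merged.csv ] || tail -n +2 "$f" >> merged.csv; done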


You don't need pandas for this; the standard csv module works fine.

import csv

df_out_filename = 'df_out.csv'
write_headers = True

# allFiles is the list of input CSV paths, e.g. from glob.glob(...)
# Python 3: open CSV files in text mode with newline='' and use next(reader)
with open(df_out_filename, 'w', newline='') as fout:
    writer = csv.writer(fout)
    for filename in allFiles:
        with open(filename, newline='') as fin:
            reader = csv.reader(fin)
            headers = next(reader)
            if write_headers:
                write_headers = False  # Only write headers once.
                writer.writerow(headers)
            writer.writerows(reader)  # Write all remaining rows.
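Since the csv module parses the header row anyway, it is cheap to confirm that every file really does share the same header before its rows are appended. Below is a rough sketch of that idea for Python 3. The function name merge_csv_checked and the choice to raise ValueError on a mismatch are assumptions for illustration, not part of the answer above.

```python
import csv

def merge_csv_checked(paths, out_path):
    """Merge CSV files with the csv module, verifying that headers match.

    Hypothetical helper: raises ValueError if a file's header differs
    from the header of the first non-empty file.
    """
    expected = None
    with open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for path in paths:
            with open(path, newline='') as fin:
                reader = csv.reader(fin)
                headers = next(reader, None)
                if headers is None:
                    continue  # completely empty file: nothing to copy
                if expected is None:
                    expected = headers
                    writer.writerow(headers)  # write the header only once
                elif headers != expected:
                    raise ValueError(
                        "%s: header %r does not match %r" % (path, headers, expected)
                    )
                writer.writerows(reader)  # write all remaining rows
```

This is slower than a raw byte copy because every row is parsed and rewritten, but it fails loudly on a mismatched file instead of silently producing misaligned columns.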
