How to use PowerShell to reorder CSV columns
Here is a solution suitable for millions of records (assuming your data do not contain embedded ';' characters):
```powershell
$reader = [System.IO.File]::OpenText('data1.csv')
$writer = New-Object System.IO.StreamWriter 'data2.csv'
for(;;) {
    $line = $reader.ReadLine()
    if ($null -eq $line) { break }
    $data = $line.Split(";")
    $writer.WriteLine('{0};{1};{2}', $data[0], $data[2], $data[1])
}
$reader.Close()
$writer.Close()
```
```powershell
Import-Csv C:\Path\To\Original.csv | Select-Object Column1, Column3, Column2 | Export-Csv C:\Path\To\Newfile.csv
```
Edit: Benchmarking info below.
I would not use the PowerShell CSV-related cmdlets. I would use either `System.IO.StreamReader` or `Microsoft.VisualBasic.FileIO.TextFieldParser` to read the file line by line, avoiding loading the entire thing into memory, and I would use `System.IO.StreamWriter` to write it back out. The `TextFieldParser` internally uses a `StreamReader`, but handles parsing delimited fields so you don't have to, making it very useful when the CSV format is not straightforward (e.g., has delimiter characters inside quoted fields).
I would also not do this in PowerShell at all, but rather in a .NET application, as it will be much faster than a PowerShell script even when both use the same objects.
Here's a simple C# version, assuming no quoted fields and ASCII encoding:
```csharp
static void Main()
{
    string source = @"D:\test.csv";
    string dest = @"D:\test2.csv";

    using ( var reader = new Microsoft.VisualBasic.FileIO.TextFieldParser( source, Encoding.ASCII ) )
    {
        using ( var writer = new System.IO.StreamWriter( dest, false, Encoding.ASCII ) )
        {
            reader.SetDelimiters( ";" );
            while ( !reader.EndOfData )
            {
                var fields = reader.ReadFields();
                swap( fields, 1, 2 );
                writer.WriteLine( string.Join( ";", fields ) );
            }
        }
    }
}

static void swap( string[] arr, int a, int b )
{
    string t = arr[ a ];
    arr[ a ] = arr[ b ];
    arr[ b ] = t;
}
```
Here's the PowerShell version:
```powershell
[void][Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic")

$source = 'D:\test.csv'
$dest = 'D:\test2.csv'

$reader = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$writer = New-Object System.IO.StreamWriter $dest

function swap($f, $a, $b) {
    $t = $f[$a]
    $f[$a] = $f[$b]
    $f[$b] = $t
}

$reader.SetDelimiters(';')

while ( !$reader.EndOfData ) {
    $fields = $reader.ReadFields()
    swap $fields 1 2
    $writer.WriteLine([string]::Join(';', $fields))
}

$reader.Close()
$writer.Close()
```
I benchmarked both of these against a 3-column CSV file with 10,000,000 rows. The C# version took 171.132 seconds (just under 3 minutes). The PowerShell version took 2,364.995 seconds (39 minutes, 25 seconds).
Edit: Why mine took so darn long.
The swap function is a huge bottleneck in my PowerShell version. Replacing it with `'{0};{1};{2}'`-style output, as in Roman Kuzmin's answer, cut the time to under 9 minutes. Replacing `TextFieldParser` with a plain `StreamReader` more than halved the remainder, to under 4 minutes.
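Combining both of those changes, the faster PowerShell loop would look roughly like the sketch below (the file paths and the three-column, `';'`-delimited layout are assumptions carried over from the benchmark setup):

```powershell
# Sketch of the optimized variant: a plain StreamReader plus
# format-string output, avoiding both TextFieldParser and the
# swap function. Assumes 3 columns, ';' delimiter, no quoted fields.
$reader = [System.IO.File]::OpenText('D:\test.csv')
$writer = New-Object System.IO.StreamWriter 'D:\test2.csv'
while ($null -ne ($line = $reader.ReadLine())) {
    $f = $line.Split(';')
    # Columns 2 and 3 are swapped directly in the format string,
    # so no temporary-variable swap is needed.
    $writer.WriteLine('{0};{1};{2}', $f[0], $f[2], $f[1])
}
$reader.Close()
$writer.Close()
```

Note this gives up `TextFieldParser`'s handling of quoted fields, which is exactly the trade-off that makes it faster.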
However, a .NET console app version of Roman Kuzmin's answer took 20 seconds.