How to use powershell to reorder CSV columns



Here is a solution suitable for millions of records (assuming your data has no embedded ';' characters):

$reader = [System.IO.File]::OpenText('data1.csv')
$writer = New-Object System.IO.StreamWriter 'data2.csv'
for(;;) {
    $line = $reader.ReadLine()
    if ($null -eq $line) {
        break
    }
    $data = $line.Split(";")
    $writer.WriteLine('{0};{1};{2}', $data[0], $data[2], $data[1])
}
$reader.Close()
$writer.Close()


If your file has a header row, the built-in cmdlets make this a one-liner (add -NoTypeInformation to Export-CSV if you don't want the #TYPE header line that older versions of the cmdlet prepend):

Import-CSV C:\Path\To\Original.csv | Select-Object Column1, Column3, Column2 | Export-CSV C:\Path\To\Newfile.csv


Edit: Benchmarking info below.

I would not use the PowerShell CSV-related cmdlets. I would use either System.IO.StreamReader or Microsoft.VisualBasic.FileIO.TextFieldParser to read the file line by line, avoiding loading the entire thing into memory, and System.IO.StreamWriter to write it back out. The TextFieldParser internally uses a StreamReader but handles parsing the delimited fields so you don't have to, which makes it very useful when the CSV format is not straightforward (e.g., has delimiter characters inside quoted fields).
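To see why the quoted-field handling matters, here is a small sketch (the sample line is made up) that feeds one CSV line to a TextFieldParser via a StringReader and compares it with a naive Split:

```powershell
# Sketch: TextFieldParser keeps a quoted field containing the delimiter
# intact, where a naive Split(';') would break it apart.
Add-Type -AssemblyName Microsoft.VisualBasic

$line = '1;"Smith; John";42'
$stringReader = New-Object System.IO.StringReader $line
$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $stringReader
$parser.SetDelimiters(';')
$parser.HasFieldsEnclosedInQuotes = $true   # this is the default

$fields = $parser.ReadFields()
$parser.Close()

$line.Split(';').Count   # 4 pieces -- the quoted field is broken in two
$fields.Count            # 3 fields -- 'Smith; John' survives as one field
```

With a simple, quote-free format like the one in the benchmark below, a plain Split is fine and faster; TextFieldParser earns its keep only when quoting is in play.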

I would also not do this in Powershell at all, but rather in a .NET application, as it will be much faster than a Powershell script even if they use the same objects.

Here's C# for a simple version, assuming no quoted fields and ASCII encoding:

static void Main()
{
    string source = @"D:\test.csv";
    string dest = @"D:\test2.csv";
    using ( var reader = new Microsoft.VisualBasic.FileIO.TextFieldParser( source, Encoding.ASCII ) ) {
        using ( var writer = new System.IO.StreamWriter( dest, false, Encoding.ASCII ) ) {
            reader.SetDelimiters( ";" );
            while ( !reader.EndOfData ) {
                var fields = reader.ReadFields();
                swap( fields, 1, 2 );
                writer.WriteLine( string.Join( ";", fields ) );
            }
        }
    }
}

static void swap( string[] arr, int a, int b )
{
    string t = arr[ a ];
    arr[ a ] = arr[ b ];
    arr[ b ] = t;
}

Here's the Powershell version:

[void][reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic")
$source = 'D:\test.csv'
$dest = 'D:\test2.csv'
$reader = new-object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$writer = new-object System.IO.StreamWriter $dest

function swap($f,$a,$b) { $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t }

$reader.SetDelimiters(';')
while ( !$reader.EndOfData ) {
    $fields = $reader.ReadFields()
    swap $fields 1 2
    $writer.WriteLine([string]::join(';', $fields))
}
$reader.close()
$writer.close()

I benchmarked both of these against a 3-column csv file with 10,000,000 rows. The C# version took 171.132 seconds (just under 3 minutes). The Powershell version took 2,364.995 seconds (39 minutes, 25 seconds).

Edit: Why mine takes so darn long.

The swap function is a huge bottleneck in my Powershell version. Replacing it with '{0};{1};{2}'-style output, as in Roman Kuzmin's answer, cut the time to less than 9 minutes. Replacing the TextFieldParser with a plain StreamReader then more than halved the remaining time, bringing it under 4 minutes.
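The cost of that per-row function call is easy to reproduce with a rough micro-benchmark (a sketch, not the original benchmark; the row count is reduced from 10,000,000 for brevity and the absolute timings will vary by machine):

```powershell
# Sketch: compare a swap-function call per row against inline reordering,
# mirroring the change described above.
function swap($f,$a,$b) { $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t }

$rows = 100000
$data = '1','2','3'

$withCall = Measure-Command {
    for ($i = 0; $i -lt $rows; $i++) {
        swap $data 1 2
        $null = '{0};{1};{2}' -f $data[0], $data[1], $data[2]
    }
}

$inline = Measure-Command {
    for ($i = 0; $i -lt $rows; $i++) {
        $null = '{0};{1};{2}' -f $data[0], $data[2], $data[1]
    }
}

'with swap call : {0:n0} ms' -f $withCall.TotalMilliseconds
'inline reorder : {0:n0} ms' -f $inline.TotalMilliseconds
```

On typical machines the inline version is markedly faster, because PowerShell function invocation has per-call overhead that compiled .NET method calls do not.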

However, a .NET console app version of Roman Kuzmin's answer took 20 seconds.