
Parsing a multi-line fixed format text file


What you need is a lexer. Your record is too big to parse with a single regex, so you have to write one regex for each line, plus a state machine to validate that the lines follow each other in the right order.
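To make the "one regex per line plus a state machine" idea concrete, here is a minimal sketch. The three-line HDR/NAME/AMT layout, the `$steps` table and the `ConvertFrom-RecordLines` function are all made up for illustration, since the real record format isn't shown here; swap in your own patterns.

```powershell
# Hypothetical three-line record layout, invented for this example:
#   HDR <4-digit id>
#   NAME <text>
#   AMT <decimal with two places>
$sample = @(
    'HDR 0001'
    'NAME Alice'
    'AMT 12.50'
    'HDR 0002'
    'NAME Bob'
    'AMT 7.25'
)

# One regex per line type, listed in the order the lines must appear.
$steps = @(
    @{ Name = 'Id';     Pattern = '^HDR (?<v>\d{4})$' }
    @{ Name = 'Name';   Pattern = '^NAME (?<v>.+)$' }
    @{ Name = 'Amount'; Pattern = '^AMT (?<v>\d+\.\d{2})$' }
)

function ConvertFrom-RecordLines {
    param([string[]]$Lines)

    $state   = 0              # index of the line type expected next
    $current = [ordered]@{}   # fields collected for the record in progress
    $lineNo  = 0

    foreach ($line in $Lines) {
        $lineNo++
        $step = $steps[$state]
        if ($line -match $step.Pattern) {
            $current[$step.Name] = $Matches['v']
        }
        else {
            throw "Line ${lineNo}: expected a $($step.Name) line, got '$line'"
        }

        $state++
        if ($state -eq $steps.Count) {
            # Last line of a record seen: emit it and start over.
            [pscustomobject]$current
            $current = [ordered]@{}
            $state   = 0
        }
    }

    if ($state -ne 0) { throw 'Input ended in the middle of a record' }
}

$records = @(ConvertFrom-RecordLines -Lines $sample)
```

Because the state is just an index into `$steps`, a line that is missing or out of order fails immediately with the line number instead of producing a half-filled object.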

Or you can use a general-purpose lexer/parser generator to produce the code for you. Wikipedia has a long list of them. The GOLD parser looks like a good candidate.

I would not try to do the lexing/parsing in PowerShell itself. I would rather write the code in C# or F# and use the assembly from PowerShell.

Edit: I've just looked at the FileHelpers library. You could create a MultiRecordEngine with one .NET type for each kind of line in your source record. All you have to do then is check that the resulting array is in a valid order and create your objects from it.


I've done something similar in PowerShell, and found that a regex written in a here-string is much easier to work with:

http://mjolinor.wordpress.com/2012/01/05/powershell-multiline-regex-matching/
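As a rough sketch of what a here-string regex can look like (not a reproduction of the linked post), the example below matches a whole record at once. The HDR/NAME/AMT layout and the group names are invented for illustration; in practice `$text` would come from something like `Get-Content -Raw`.

```powershell
# Hypothetical three-line record layout, invented for this example,
# held as one string the way Get-Content -Raw would return a file.
$text = "HDR 0001`nNAME Alice`nAMT 12.50`nHDR 0002`nNAME Bob`nAMT 7.25`n"

# Single-quoted here-string, so nothing in the pattern is expanded by PowerShell.
# (?m) makes ^ and $ match at line breaks; (?x) ignores whitespace in the
# pattern and allows comments, which is why literal spaces are written as "\ ".
$pattern = @'
(?mx)
^HDR\ (?<Id>\d{4})\r?\n            # line 1: record id
^NAME\ (?<Name>[^\r\n]+)\r?\n      # line 2: name
^AMT\ (?<Amount>\d+\.\d{2})\r?$    # line 3: amount
'@

# One match per record; named groups become object properties.
$records = @(
    foreach ($m in [regex]::Matches($text, $pattern)) {
        [pscustomobject]@{
            Id     = $m.Groups['Id'].Value
            Name   = $m.Groups['Name'].Value
            Amount = [decimal]$m.Groups['Amount'].Value
        }
    }
)
```

Keep in mind that a pattern like this simply skips text that doesn't match, so a malformed record disappears silently. If you need to reject bad input, compare the number of matches with the number of records you expect.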