Elegant structured text file parsing

Tags: ruby


No, and in fact, for the specific type of task you describe, I doubt there's a "cleaner" way to do it than regular expressions. It looks like your files have embedded line breaks, so the usual approach is to make the line your unit of decomposition and apply per-line regexes. Meanwhile, you build a small state machine and use regex matches to trigger its transitions. This way you know where you are in the file and what kind of character data to expect. Also, consider using named capture groups and loading the regexes from an external file. That way, if the format of your transcript changes, it's a simple matter of tweaking a regex rather than writing new parsing code.
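Here's a rough sketch of that idea in Ruby. I haven't seen your actual files, so the transcript layout, the `PATTERNS` table, the state names and `parse_transcript` are all made up for illustration; treat it as a starting point, not a drop-in solution.

```ruby
# Hypothetical transcript format, invented for this example:
#   == Session 1 ==
#   Alice: Hello there,
#     how are you?
#   Bob: Fine.

# Per-line patterns with named capture groups. They are inlined here, but
# they could just as well be read from an external file and compiled with
# Regexp.new.
PATTERNS = {
  header:       /\A==\s*(?<title>.+?)\s*==\z/,
  speaker:      /\A(?<name>\w+):\s*(?<text>.*)\z/,
  continuation: /\A\s+(?<text>\S.*)\z/
}.freeze

def parse_transcript(input)
  state = :outside # transitions: :outside -> :in_session -> :in_utterance
  sessions = []

  input.each_line(chomp: true) do |line|
    next if line.strip.empty? # blank lines carry no information here

    if (m = PATTERNS[:header].match(line))
      # A header is legal in any state and opens a new session.
      sessions << { title: m[:title], utterances: [] }
      state = :in_session
    elsif state != :outside && (m = PATTERNS[:speaker].match(line))
      # A speaker line is only legal once a session has been opened.
      sessions.last[:utterances] << { speaker: m[:name], text: m[:text] }
      state = :in_utterance
    elsif state == :in_utterance && (m = PATTERNS[:continuation].match(line))
      # An indented line continues the previous utterance.
      sessions.last[:utterances].last[:text] << " " << m[:text]
    else
      raise ArgumentError, "unexpected line in state #{state}: #{line.inspect}"
    end
  end

  sessions
end
```

The state variable is what lets the same line mean different things depending on where you are: an indented line is a continuation only while you're inside an utterance, and anything that doesn't fit the current state fails loudly instead of being silently misparsed.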


With Perl, you can use Parse::RecDescent.

It is simple, and your grammar will be maintainable later on.


You might want to consider a full parser generator.

Regular expressions are good for searching text for small substrings, but they're woefully underpowered if what you really want is to parse the entire file into meaningful data.

They are especially insufficient if the context of the substring is important.

Most people throw regexes at everything because that's what they know. They've never learned any parser-generating tools, so they end up hand-coding a lot of the production rule composition and semantic action handling that you get for free with a parser generator.

Regexes are great and all, but if you need a parser they're no substitute.
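To give a feel for what that hand-coding looks like, here's a small hand-written recursive descent parser in Ruby, using `StringScanner` from the standard library. To be clear, this is not a parser generator; it's the kind of code a generator would produce for you from a grammar. The nested-block format, the grammar in the comments and the `BlockParser` class are all invented for the example.

```ruby
require "strscan"

# Hypothetical grammar, invented for illustration:
#   block := NAME body
#   body  := "{" item* "}"
#   item  := NAME ( "=" VALUE ";" | body )
class BlockParser
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # block := NAME body
  def parse
    name = expect(/\w+/)
    tree = { name => body }
    skip_ws
    raise ArgumentError, "trailing input at offset #{@s.pos}" unless @s.eos?
    tree
  end

  private

  def skip_ws
    @s.skip(/\s+/)
  end

  # Consume a token matching pattern, or fail with the current offset.
  def expect(pattern)
    skip_ws
    @s.scan(pattern) || raise(ArgumentError, "expected #{pattern.inspect} at offset #{@s.pos}")
  end

  # body := "{" item* "}"
  # Semantic action: collect the items into a Hash.
  def body
    expect(/\{/)
    children = {}
    loop do
      skip_ws
      break if @s.scan(/\}/)
      key, value = item
      children[key] = value
    end
    children
  end

  # item := NAME ( "=" VALUE ";" | body )
  # Semantic action: return a [key, value] pair, where value is either a
  # String or a nested Hash.
  def item
    name = expect(/\w+/)
    skip_ws
    if @s.scan(/=/)
      value = expect(/[^;]+/)
      expect(/;/)
      [name, value.strip]
    else
      [name, body]
    end
  end
end
```

With this sketch, `BlockParser.new("server { host = localhost; tls { enabled = yes; } }").parse` gives you a nested Hash keyed by `"server"`. Every method is one production rule, and arbitrary nesting comes from `item` calling back into `body`, which is the part a single regex can't express.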