Regular expression - Ruby vs Perl
regex = Regexp.new(/(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/)
f = File.open( ARGV.shift ).each do |line|
  if regex.match(line)
    puts "#{$1}: #{$2}"
  end
end
Or
regex = Regexp.new(/(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/)
f = File.open( ARGV.shift )
f.each_line do |line|
  if regex.match(line)
    puts "#{$1}: #{$2}"
  end
end
One possible difference is the amount of backtracking being performed. Perl might do a better job of pruning the search tree when backtracking (i.e. noticing when part of a pattern can't possibly match). Its regex engine is highly optimised.
First, adding a leading «^» could make a huge difference. If the pattern doesn't match starting at position 0, it's not going to match starting at position 1 either! So don't try to match at position 1.
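To see what the anchor changes, here is a minimal Ruby sketch. The two sample lines are invented for illustration, and the timings will depend on your Ruby version and regex engine, so treat it as a way to measure rather than as a result.

```ruby
require 'benchmark'

# The original pattern, and the same pattern with a leading ^.
UNANCHORED = /(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/
ANCHORED   = /^(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/

# Hypothetical log lines, made up for this sketch.
HIT  = "12:00:01 | INFO | SENDING REQUEST foo TID=42, done"
MISS = "12:00:02 | INFO | " + ("x" * 500)

# Both patterns capture the same fields and reject the same line.
p UNANCHORED.match(HIT).captures
p ANCHORED.match(HIT).captures
p UNANCHORED.match(MISS)
p ANCHORED.match(MISS)

# Time the failing case: without ^ the engine may retry at every offset.
Benchmark.bm(10) do |bm|
  bm.report("unanchored") { 50.times { UNANCHORED.match(MISS) } }
  bm.report("anchored")   { 50.times { ANCHORED.match(MISS) } }
end
```

Note that in Ruby «^» matches at the start of every line inside the string, which is fine here because the input is processed one line at a time.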
Along the same lines, «.*?» isn't as limiting as you might think, and replacing each instance of it with a more limiting pattern could prevent a lot of backtracking.
Why don't you try:
/ ^ (.*?) [ ]\| (?:(?!SENDING[ ]REQUEST).)* SENDING[ ]REQUEST (?:(?!TID=).)* TID= ([^,]*) ,/x
(Not sure if it was safe to replace the first «.*?» with «[^|]*», so I didn't.)
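For the Ruby side of the comparison, the suggested pattern can be dropped in more or less unchanged. The sketch below is only an illustration: `REQUEST_RE`, `extract`, and the sample line are names and data I made up, and the real log format may differ.

```ruby
# The suggested pattern, spread over several lines thanks to the x flag.
REQUEST_RE = /
  ^ (.*?) [ ]\|                  # first field: everything before the first " |"
  (?:(?!SENDING[ ]REQUEST).)*    # any text that does not begin "SENDING REQUEST"
  SENDING[ ]REQUEST
  (?:(?!TID=).)*                 # any text that does not begin "TID="
  TID= ([^,]*) ,                 # the TID value, up to the next comma
/x

# Returns "field: tid" for a matching line, or nil otherwise.
def extract(line)
  m = REQUEST_RE.match(line)
  m && "#{m[1]}: #{m[2]}"
end

# Hypothetical log line, invented for this sketch.
SAMPLE = "2011-05-04 10:15:30 | DEBUG | SENDING REQUEST to host TID=abc123, size=10"
puts extract(SAMPLE)
```

Using the match object (`m[1]`, `m[2]`) instead of `$1` and `$2` is a style choice here, not something the speed-up depends on.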
(At least for patterns that match a single string, (?:(?!PAT).) is to PAT as [^CHAR] is to CHAR.)
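A small Ruby sketch of that analogy, using made-up strings: the negated character class stops before a single character, and the lookahead form stops before a whole string.

```ruby
# [^,]* consumes characters until the single character "," comes next.
CHAR_STOP = /\A[^,]*/
# (?:(?!END).)* consumes characters until the string "END" comes next.
STR_STOP = /\A(?:(?!END).)*/

p "alpha,beta"[CHAR_STOP]      # => "alpha"
p "alpha END beta"[STR_STOP]   # => "alpha "
```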
Using /s could possibly speed things up if «.» is allowed to match newlines, but I think it's pretty minor. (In Ruby, the modifier that lets «.» match newlines is /m.)
Using «\space» instead of «[space]» to match a space under /x might be slightly faster in Ruby. (They're the same in recent versions of Perl.) I used the latter because it's far more readable.
From perlretut, chapter "Using regular expressions in Perl", section "Search and replace":
(Even though the regular expression appears in a loop, Perl is smart enough to compile it only once.)
I don't know Ruby very well, but I suspect that it compiles the regex on every iteration.
(Try the code from LaGrandMere's answer to verify it.)
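One way to check that suspicion directly is to see whether Ruby hands back the same Regexp object each time a literal is evaluated. The sketch below is my own; the constant names are made up, and the behaviour described is what I'd expect from CRuby, so run it on your own interpreter. Also note that the code in the question builds the regex once, before the loop, so recompilation should not be an issue there either way.

```ruby
WORD = "REQUEST"  # hypothetical value, only used to build an interpolated pattern

# A plain regex literal, evaluated three times.
LITERAL = 3.times.map { /SENDING REQUEST/ }
# A literal with interpolation, also evaluated three times.
INTERPOLATED = 3.times.map { /SENDING #{WORD}/ }

# Count how many distinct objects each loop produced.
puts "literal objects:      #{LITERAL.map(&:object_id).uniq.size}"
puts "interpolated objects: #{INTERPOLATED.map(&:object_id).uniq.size}"
```

If the first count is 1, the plain literal was compiled once and reused. The interpolated form is the case to watch for inside hot loops, since it is built again on each pass.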