Regexp to filter a table
This may not be as clean as it could be, but it works for this example :)
Ruby:
    @text = <<END
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    +------------+-------------+-------+-------------+------------+--------------+----------+
    | TOTALS AGENTS:21 | total| total| total| total| total|
    +------------+-------------+-------+-------------+------------+--------------+----------+
    END

    # Capture the inside of every row that starts and ends with a pipe.
    # The +---+ border lines do not start with a pipe, so they are skipped.
    s = @text.scan(/^[|]\W(.*)[|]$/)
    puts s

    arr = []
    s.each do |o|
      # Each match is a one-element array holding the captured row text.
      # Split it into cells and trim the padding around each cell
      # (spaces inside a cell, as in "HEADING 1", are kept).
      arr << o.first.split('|').map(&:strip)
    end

    arr.each do |i|
      puts i
    end
Check this out (PHP):
    $table = '+------------+-------------+-------+-------------+------------+---------------+----------+
    | HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    +------------+-------------+-------+-------------+------------+--------------+----------+
    | TOTALS AGENTS:21 | total| total| total| total| total|
    +------------+-------------+-------+-------------+------------+--------------+----------+';

    // Split the table into lines, whatever the line-ending style.
    $lines = preg_split('/\r\n|\r|\n/', $table);
    $array = array();
    foreach ($lines as $line) {
        // Skip the +---+ border lines.
        if (!preg_match('/\+-+\+/', $line)) {
            // Drop the outer pipes and padding, then split on the inner pipes.
            $array[] = preg_split('/\s*\|\s*/', trim($line, '| '));
        }
    }
    print_r($array);
Output:
    Array
    (
        [0] => Array
            (
                [0] => HEADING 1
                [1] => HEADING 2
                [2] => ETC
                [3] => ANOTHER
                [4] => HEADING3
                [5] => HEADING4
                [6] => SML
            )

        [1] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [2] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [3] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [4] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [5] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [6] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [7] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [8] => Array
            (
                [0] => content
                [1] => more content
                [2] => cont
                [3] => More more
                [4] => content
                [5] => content 2.0
                [6] => litl
            )

        [9] => Array
            (
                [0] => TOTALS AGENTS:21
                [1] => total
                [2] => total
                [3] => total
                [4] => total
                [5] => total
            )

    )
Hope this was helpful :)
Here's a complete solution in Ruby. You need to manually add a | to the last line, though.
    require 'builder'

    table = '+------------+-------------+-------+-------------+------------+---------------+----------+
    | HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    | content | more content | cont | More more | content | content 2.0 | litl |
    +------------+-------------+-------+-------------+------------+--------------+----------+
    | TOTALS AGENTS:21 | total| total| total| total| total|
    +------------+-------------+-------+-------------+------------+--------------+----------+'

    # Turn the ASCII table into an array of rows, each an array of cell strings.
    def parse_table(table)
      rows = []
      table.each_line do |line|
        # Skip the +---+ border lines.
        next if line.match(/^\+/)
        # Split on pipes (with surrounding whitespace) and drop the empty
        # field that comes before the leading pipe.
        rows << line.split(/\s*\|\s*/).reject(&:empty?)
      end
      rows
    end

    # Emit one <tr> with a <td> per column.
    def html_row(xml, columns)
      xml.tr do
        columns.each do |column|
          xml.td column
        end
      end
    end

    # The first row goes into <thead>, the remaining rows into <tbody>.
    def html_table(rows)
      head_row = rows.first
      body_rows = rows[1..-1]
      xml = Builder::XmlMarkup.new :indent => 2
      xml.table do
        xml.thead do
          html_row xml, head_row
        end
        xml.tbody do
          body_rows.each do |body_row|
            html_row xml, body_row
          end
        end
      end.to_s
    end

    rows = parse_table(table)
    html = html_table(rows)
    puts html
Output:
    <table>
      <thead>
        <tr>
          <td>HEADING 1</td>
          <td>HEADING 2</td>
          <td>ETC</td>
          <td>ANOTHER</td>
          <td>HEADING3</td>
          <td>HEADING4</td>
          <td>SML</td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>TOTALS AGENTS:21</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
        </tr>
      </tbody>
    </table>
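On the note about manually adding a | to the last line: if editing the input is not an option, the row splitting could probably be made tolerant of a missing closing pipe instead. The following is only a rough sketch; `parse_table_lenient` and the small sample table are made up for illustration and are not part of the solution above.

```ruby
# Hypothetical variant of the row parser: the method name and the sample
# data are illustrative assumptions, not taken from the original answer.
def parse_table_lenient(table)
  table.each_line
       .map(&:strip)
       # Drop blank lines and the +---+ border lines.
       .reject { |line| line.empty? || line.start_with?('+') }
       .map do |line|
         # Remove the outer pipes only if they are present, then split on
         # the inner pipes and trim the padding around each cell.
         inner = line.sub(/\A\|/, '').sub(/\|\z/, '')
         inner.split('|', -1).map(&:strip)
       end
end

sample = <<~TABLE
  +-------+-------+
  | A     | B     |
  +-------+-------+
  | one   | two   |
  +-------+-------+
  | TOTALS | total
  +-------+-------+
TABLE

p parse_table_lenient(sample)
# prints [["A", "B"], ["one", "two"], ["TOTALS", "total"]]
```

Because the outer pipes are stripped before splitting, a row with or without the closing pipe yields the same cells, and an empty cell in the middle of a row comes back as an empty string rather than being dropped.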