
Parsing complex HTML tables


You could make use of a browser's rendering/layout engine here.

Use PhantomJS (http://phantomjs.org/) to get access to a headless browser that lets you execute JavaScript against a web page's DOM.

A dash of jQuery would make the following pseudocode easy to implement:

foreach (td.t as dateElement) {
    // parse date from element text
    // use pixel position + dimensions to calc pixel coord of center
    // save this center in a list along with the date
}
foreach (td.v as calendarEntryElement) {
    // parse time + other stuff from element text
    // use pixel position to find the closest date element in that list
    // (it must be the closest one above)
}

I feel positional information would be very reliable here, because everything is a nested rectangle and it's all laid out with tables.
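
The matching step could look roughly like the sketch below. It assumes the date cells (td.t) and entry cells (td.v) have already been measured in the browser, for example with jQuery's .offset(), .outerWidth() and .outerHeight(), and turned into plain objects. The object shape ({text, rect}) and the function names centerOf and assignDates are made up for illustration, and comparing centers (rather than, say, top-left corners) is just one reasonable choice.

```javascript
// Sketch only: the {text, rect} shape and the function names are hypothetical.
// rect is {left, top, width, height} in page pixels.

// Center point of a rectangle.
function centerOf(rect) {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 };
}

// For each entry cell, pick the nearest date cell whose center lies
// above the entry's center. Entries with no date above them get null.
function assignDates(dateCells, entryCells) {
  const dates = dateCells.map(d => ({ date: d.text, center: centerOf(d.rect) }));
  return entryCells.map(e => {
    const c = centerOf(e.rect);
    let best = null;
    let bestDist = Infinity;
    for (const d of dates) {
      if (d.center.y > c.y) continue; // date cell is below the entry, skip it
      const dist = Math.hypot(d.center.x - c.x, d.center.y - c.y);
      if (dist < bestDist) {
        best = d;
        bestDist = dist;
      }
    }
    return { text: e.text, date: best ? best.date : null };
  });
}
```

Parsing the actual date and time out of each cell's text is left out here, since that depends on how the timetable formats them.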

You don't need to use PhantomJS; you could just as easily launch a regular browser yourself and have the page send a request to a local server to collect the results.

Some shell command roughly like:

firefox file://foo123.html

where foo123.html is a saved copy of one of their web pages with a custom <script> appended to the end.
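
The reporting part of such an appended script might be as small as the sketch below. The endpoint URL and the function name reportResults are assumptions for illustration; nothing here is specific to any particular server. Since the page is loaded from file://, the local server would most likely need to allow cross-origin requests for a JSON POST to go through.

```javascript
// Sketch only: the URL is a hypothetical local endpoint.
// fetchImpl defaults to the browser's global fetch and can be
// swapped out, e.g. for testing outside a browser.
function reportResults(results, url = "http://localhost:8000/results", fetchImpl = fetch) {
  return fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(results), // results: whatever the page script extracted
  });
}
```

In the saved page, the script would extract the entries from the DOM and then call reportResults with the extracted list once it is done.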



I study at the same university, and a few weeks ago I faced the same problem: parsing this timetable and converting it to an ICS file. I eventually found my own solution and generalized the code so that students from other universities who use the Sked software and have much more complex timetables can import theirs too.
I also created a website where students can sign up and configure the URLs of the timetables they want to subscribe to. A cron job runs in the background to ensure that the subscribed calendars are always up to date. You can find the result of the project on my website:
http://calendar.pineappledeveloper.com/
(it is only available in German).