
Scraping data from all asp.net pages with AJAX pagination implemented


In general, to make an ASP.NET web site believe that you actually pressed a button (in more general terms, performed a postback), you need to do the following:

  1. Get the value of every single INPUT and SELECT element on the page. This might not be required in every scenario, but you should always at least collect the values of all hidden fields whose names start with "__" (such as __VIEWSTATE). You don't need to know what they contain - only that their values must be sent back to the server unchanged.

  2. Create a POST request to the server. Use a classic POST, not an AJAX request. With some browser plugins (in Firefox or Chrome) it may be possible to disable XMLHttpRequest so you can intercept the resulting non-AJAX request with a tool like Fiddler.

  3. Add every value from #1 to that POST request. There are only two values you need to overwrite: __EVENTTARGET and __EVENTARGUMENT. Leave them empty unless the link or button you are imitating has an onclick handler like <a href="javascript:__doPostBack('ctl00$login','')">. If it does, parse the values from that link - the first is the event target (it usually matches the ID of some element on the page), the second is the event argument.

  4. If you executed the request correctly, you should get back a full HTML page. If you get a partial response instead, check that you didn't send the HTTP header that asks for an async (partial-postback) result.
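The steps above can be sketched roughly like this in Python, using only the standard library. This is just an illustration of steps 1 and 3 (collecting form state and overwriting __EVENTTARGET/__EVENTARGUMENT); the sample HTML, field values, and control IDs are invented, and a real scraper would feed the resulting body into an HTTP POST:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormStateParser(HTMLParser):
    """Step 1: collect the value of every INPUT and SELECT on the page."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._select_name = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")
        elif tag == "select" and "name" in a:
            self._select_name = a["name"]
            self.fields[a["name"]] = ""
        elif tag == "option" and self._select_name and "selected" in a:
            self.fields[self._select_name] = a.get("value", "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select_name = None

def build_postback_body(html, link_href):
    """Step 3: build a urlencoded POST body that imitates clicking
    a __doPostBack link, overwriting only the two event fields."""
    parser = FormStateParser()
    parser.feed(html)
    fields = parser.fields
    m = re.search(r"__doPostBack\('([^']*)',\s*'([^']*)'\)", link_href)
    if m:
        fields["__EVENTTARGET"] = m.group(1)    # usually an element ID
        fields["__EVENTARGUMENT"] = m.group(2)  # e.g. the page to go to
    return urlencode(fields)

# Invented sample page and pager link, for illustration only:
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="dDwxMjM=" />
  <input type="hidden" name="__EVENTVALIDATION" value="AbCd" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
</form>
"""
body = build_postback_body(sample, "javascript:__doPostBack('ctl00$grid','Page$2')")
```

Sending `body` as a classic POST (step 2) with the usual form-encoded Content-Type is then up to whatever HTTP client you use.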


My best advice is to use iMacros: https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/

With iMacros:

  1. Record your flow of page downloading. http://wiki.imacros.net/First_Steps
  2. Save each web page to a local directory. http://wiki.imacros.net/SAVEAS
  3. Scrape emails, addresses, etc. from the saved pages using a PHP script.

It doesn't matter whether the pagination is AJAX or whether the site is .aspx, .jsp or .php.
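Step 3 (scraping data out of the saved pages) can be sketched as follows. The answer above suggests a PHP script, but the idea is identical in any language; this is a minimal Python sketch, and the sample page text is invented:

```python
import re

# A simple pattern for common email address shapes; it is not a full
# RFC 5322 validator, just enough for scraping saved pages.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(html_text):
    """Return the unique email addresses found in one saved page, sorted."""
    return sorted(set(EMAIL_RE.findall(html_text)))

# Invented sample content standing in for a page saved by iMacros:
page = "Contact <a href='mailto:sales@example.com'>sales@example.com</a> or ops@example.org"
emails = extract_emails(page)
```

In practice you would loop over every file iMacros saved into the local directory and merge the results.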


I would recommend branching out into Ruby and trying Capybara, which is a sane way of using Selenium. It lets you visit a page and then examine the actual DOM. You can click on elements, wait for events, etc. It uses a real browser.

  visit "http://www.google.com"
  page.find("button[name=btnK]")