Using R to fill in a field on an online form and scrape the resulting JavaScript-created table
I'm not sure what the T&Cs of the VOA website say about scraping, but this code will do the job:
library("httr")library("rvest")post_code <- "B1 1"resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0", encode = "form", body = list(btnPush = 1, txtPageNum = 0, txtPostCode = post_code, txtRedirectTo = "InitS.asp", txtStartKey = 0))resp_cont <- read_html(resp)council_table <- resp_cont %>% html_node(".scl_complex table") %>% html_table
Firebug has an excellent 'Net' panel where the POST headers can be seen. Most modern browsers also have something similar built in.
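If the server rejects bare requests, you can replay the headers you see in that panel via httr's add_headers(); the header values below are illustrative assumptions, not ones the VOA site is known to require:

resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",
             add_headers(Referer = "http://cti.voa.gov.uk/cti/InitS.asp",  # assumed value
                         `User-Agent` = "Mozilla/5.0"),                    # assumed value
             encode = "form",
             body = list(btnPush = 1,
                         txtPageNum = 0,
                         txtPostCode = post_code,
                         txtRedirectTo = "InitS.asp",
                         txtStartKey = 0))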
I used RSelenium to scrape the council tax list for an Exeter postcode:
library(RSelenium)
library(XML)   # provides htmlParse(), xpathSApply() and readHTMLTable()

input <- "EX4 2NU"
appURL <- "http://cti.voa.gov.uk/cti/"

# Start a local Selenium server and open a browser session
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
Sys.sleep(5)

# Load the page, type the postcode into the search box and submit
remDr$navigate(appURL)
search.form <- remDr$findElement(using = "xpath", "//*[@id='txtPostCode']")
search.form$sendKeysToElement(list(input, key = "enter"))

# Parse the first page of results
doc <- remDr$getPageSource()
tbl <- xpathSApply(htmlParse(doc[[1]]), "//tbody")
temp1 <- readHTMLTable(tbl[[1]], header = FALSE)

# While a 'next' link exists, click through and append each page's table
v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
while (v != 0) {
  nextpage <- remDr$findElement(using = "xpath", "//*[@class = 'next']")
  nextpage$clickElement()
  doc <- remDr$getPageSource()
  tbl <- xpathSApply(htmlParse(doc[[1]]), "//tbody")
  temp2 <- readHTMLTable(tbl[[1]], header = FALSE)
  temp1 <- rbind(temp1, temp2)
  v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
}
finaltable <- temp1
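Once the loop finishes, it's good practice to release the browser session and persist the combined table; a minimal sketch using base R's write.csv() (the file name is just an example):

remDr$close()  # end the Selenium browser session
write.csv(finaltable, "council_tax_EX4_2NU.csv", row.names = FALSE)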
Hope you find it helpful. With this approach you can scrape data spread across multiple pages.