Using R to add field to online form and scrape resulting javascript created table

I'm not sure what the T&C of the VOA website have to say about scraping, but this code will do the job:

library("httr")library("rvest")post_code <- "B1 1"resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",             encode = "form",             body = list(btnPush = 1,                         txtPageNum = 0,                         txtPostCode = post_code,                         txtRedirectTo = "InitS.asp",                         txtStartKey = 0))resp_cont <- read_html(resp)council_table <- resp_cont %>%  html_node(".scl_complex table") %>%  html_table

Firebug has an excellent 'Net' panel where the POST headers can be seen. Most modern browsers also have something similar built in.
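You can also see much the same information from R itself, assuming resp is the response object from the snippet above:

# Inspect what httr actually sent and what came back
resp$request          # the outgoing request, including form fields
headers(resp)         # response headers
status_code(resp)     # should be 200 if the POST succeeded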


I use RSelenium to scrape the council tax list for an Exeter postcode:

library(RSelenium)
library(RCurl)
library(XML)   # needed for htmlParse(), xpathSApply() and readHTMLTable()

input <- 'EX4 2NU'
appURL <- "http://cti.voa.gov.uk/cti/"

# Start a local Selenium server and open a browser session
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
Sys.sleep(5)

remDr$navigate(appURL)

# Type the postcode into the search box and submit with Enter
search.form <- remDr$findElement(using = "xpath", "//*[@id='txtPostCode']")
search.form$sendKeysToElement(list(input, key = "enter"))

# Read the table from the first page of results
doc <- remDr$getPageSource()
tbl <- xpathSApply(htmlParse(doc[[1]]), '//tbody')
temp1 <- readHTMLTable(tbl[[1]], header = FALSE)

# While a 'next' link exists, click it and append that page's table
v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
while (v != 0) {
    nextpage <- remDr$findElement(using = "xpath", "//*[@class = 'next']")
    nextpage$clickElement()
    doc <- remDr$getPageSource()
    tbl <- xpathSApply(htmlParse(doc[[1]]), '//tbody')
    temp2 <- readHTMLTable(tbl[[1]], header = FALSE)
    temp1 <- rbind(temp1, temp2)
    v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
}

finaltable <- temp1
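Once the loop finishes it is worth closing the browser session and, if you want to keep the data, writing it out. A minimal sketch; the CSV file name is only an example:

# Tidy up the Selenium session and save the combined table
remDr$close()
write.csv(finaltable, "council_tax_EX4_2NU.csv", row.names = FALSE)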

Hope you find it helpful. With this approach you can scrape results that are spread across multiple pages.