Automating the login to the UK Data Service website in R with RCurl or httr
The form fields that actually get submitted are `action` and `origin`, not `combobox`. Give `action` the value `selection` and `origin` the value from the relevant entry in `combobox`:
    y <- GET(z$url, query = list(action = "selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp"))
    > y$url
    [1] "https://shib.data-archive.ac.uk:443/idp/Authn/UserPassword"
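To check which field names a form submits, and which value `origin` should take, you can parse the page and list the `name` attributes and the option values. This is only a sketch: the HTML below is a made-up stand-in for the organisation-selection page (the real markup, the option labels and the example.org URL are assumptions), so run the same calls on the page you actually fetched.

```r
library(rvest)

# Hypothetical stand-in for the organisation-selection page; the real markup may differ.
page <- read_html('
  <form action="/wayf" method="get">
    <input type="hidden" name="action" value="selection">
    <input type="text" name="combobox">
    <select name="origin">
      <option value="https://idp.example.org/shibboleth">Example University</option>
      <option value="https://shib.data-archive.ac.uk/shibboleth-idp">UK Data Archive</option>
    </select>
  </form>')

# Names of every input/select inside the form: these are the candidate query parameters.
fields <- html_nodes(page, "form input, form select")
field_names <- html_attr(fields, "name")

# Map each visible option label to the value that would be sent as `origin`.
opts <- html_nodes(page, "select[name='origin'] option")
origin_values <- setNames(html_attr(opts, "value"), html_text(opts))
origin <- origin_values[["UK Data Archive"]]
```

Reading the value from the `value` attribute avoids hard-coding the identity provider URL.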
Edit
It looks as though the handle pool isn't keeping your session alive correctly, so you need to pass the handles explicitly rather than letting httr pick them automatically. For the `POST` command you also need to set `multipart=FALSE`, because HTML forms are URL-encoded by default. The R command has a different default because it is mainly designed for uploading files. So:
    y <- GET(handle = z$handle, query = list(action = "selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp"))
    POST(body = values, multipart = FALSE, handle = y$handle)
    Response [https://www.esds.ac.uk/]
      Status: 200
      Content-type: text/html
    ...snipped...
        <title> Introduction to ESDS </title>
        <meta name="description" content="Introduction to the ESDS, home page" />
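For anyone on a more recent httr: as far as I know, the `multipart` argument has been superseded by `encode`, and `encode = "form"` is the equivalent of `multipart=FALSE`. Below is a minimal sketch of the same flow with one explicit handle shared by every request. The `login()` helper and its arguments are hypothetical names of my own, `values` is the named list of login fields you already built, and I have not run this against the live site.

```r
library(httr)

# One explicit handle, reused for every request, so cookies persist across steps.
# Assumption: sharing a single handle across the hosts involved is acceptable here.
h <- handle("http://esds.ac.uk")

# Hypothetical helper: start_url, idp and values must be supplied by the caller.
login <- function(start_url, idp, values, h) {
  # Step 1: open the login page; redirects lead to the organisation chooser.
  z <- GET(start_url, handle = h)
  # Step 2: pick the identity provider via the action/origin query parameters.
  y <- GET(z$url, handle = h,
           query = list(action = "selection", origin = idp))
  # Step 3: submit the credentials as application/x-www-form-urlencoded.
  POST(y$url, handle = h, body = values, encode = "form")
}
```

You would call it as `login(signin_url, "https://shib.data-archive.ac.uk/shibboleth-idp", values, h)`, where `signin_url` is whatever page you start from.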
I think one way to address the "enter your organization" page goes like this:
    library(tidyverse)
    library(rvest)
    library(stringr)
    library(httr)  # provides handle_reset() and set_cookies()

    org <- "your_organization"
    user <- "your_username"
    password <- "your_password"
    signin <- "http://esds.ac.uk/newRegistration/newLogin.asp"
    handle_reset(signin)

    # get to org page and enter org
    p0 <- html_session(signin) %>%
      follow_link("Login")
    org_link <- html_nodes(p0, "option") %>%
      str_subset(org) %>%
      str_match('(?<=\\")[^"]*') %>%
      as.character()

    f0 <- html_form(p0) %>%
      first() %>%
      set_values(origin = org_link)

    # the form has no usable submit button, so add a fake one
    fake_submit_button <- list(name = "submit-btn",
                               type = "submit",
                               value = "Continue",
                               checked = NULL,
                               disabled = NULL,
                               readonly = NULL,
                               required = FALSE)
    attr(fake_submit_button, "class") <- "btn-enabled"
    f0[["fields"]][["submit"]] <- fake_submit_button

    # pass the session cookies along explicitly when submitting
    c0 <- cookies(p0)$value
    names(c0) <- cookies(p0)$name
    p1 <- submit_form(session = p0, form = f0, config = set_cookies(.cookies = c0))
Unfortunately, that doesn't solve the whole problem; (2) is harder than it looks. I've posted more of what I think is a solution here: "R: use rvest (or httr) to log in to a site requiring cookies". Hopefully someone will help us get the rest of the way.