
Automating the login to the UK Data Service website in R with RCurl or httr


The relevant data variables returned by the form are action and origin, not combobox. Give action the value "selection" and give origin the value of the relevant entry in combobox:

```r
> y <- GET(z$url,
+          query = list(action = "selection",
+                       origin = "https://shib.data-archive.ac.uk/shibboleth-idp"))
> y$url
[1] "https://shib.data-archive.ac.uk:443/idp/Authn/UserPassword"
```
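Rather than copying the origin URL out of the page source by hand, you could pick it out of the combobox by its visible label. Below is a minimal base-R sketch of that idea. The HTML snippet, the second organisation, the "UK Data Archive" label and the `option_value()` helper are all hypothetical stand-ins, and on a real page an HTML parser such as rvest/xml2 is more robust than regular expressions.

```r
# Hypothetical, simplified stand-in for the WAYF page's combobox.
wayf_html <- paste0(
  '<select name="origin" id="combobox">',
  '<option value="https://idp.example.ac.uk/idp/shibboleth">Example University</option>',
  '<option value="https://shib.data-archive.ac.uk/shibboleth-idp">UK Data Archive</option>',
  '</select>'
)

# Hypothetical helper: return the value attribute of the <option>
# whose visible label exactly matches `label`.
option_value <- function(html, label) {
  opts <- regmatches(html, gregexpr("<option[^>]*>[^<]*</option>", html))[[1]]
  values <- sub('.*value="([^"]*)".*', "\\1", opts)   # text inside value="..."
  labels <- sub(".*>([^<]*)</option>", "\\1", opts)   # text between the tags
  values[labels == label]
}

origin <- option_value(wayf_html, "UK Data Archive")

# The two fields the form actually submits: action and origin.
wayf_query <- list(action = "selection", origin = origin)
str(wayf_query)
```

The resulting list can then be passed straight to httr as `GET(z$url, query = wayf_query)`.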

Edit

It looks as though the handle pool isn't keeping your session alive correctly, so you need to pass the handles explicitly rather than relying on httr to reuse them automatically. Also, for the POST command you need to set multipart=FALSE: ordinary HTML forms are submitted URL-encoded (application/x-www-form-urlencoded), whereas httr's POST() defaults to multipart encoding because it is mainly designed for uploading files. So:

```r
y <- GET(handle = z$handle,
         query = list(action = "selection",
                      origin = "https://shib.data-archive.ac.uk/shibboleth-idp"))
POST(body = values, multipart = FALSE, handle = y$handle)
```

```
Response [https://www.esds.ac.uk/]
  Status: 200
  Content-type: text/html
...snipped...
<title>
    Introduction to ESDS
</title>
<meta name="description" content="Introduction to the ESDS, home page" />
```
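To see why multipart=FALSE matters, here is a minimal base-R sketch that builds the same login fields both ways: URL-encoded, as an ordinary browser form submits them, and as multipart/form-data, which is httr's default encoding for a list body. The field names and values, the boundary string and both helper functions are illustrative assumptions, and httr's own encoder may escape some characters slightly differently. In later httr releases the same choice is made with `encode = "form"`, and the `multipart` argument is deprecated.

```r
# Hypothetical placeholders for the login form's `values` list.
values <- list(username = "your_username", password = "p@ssw0rd")

# application/x-www-form-urlencoded: name=value pairs joined by "&",
# with reserved characters percent-encoded. Roughly what
# POST(..., multipart = FALSE) sends.
urlencode_body <- function(fields) {
  enc <- vapply(fields, function(v) URLencode(v, reserved = TRUE), character(1))
  paste0(names(fields), "=", enc, collapse = "&")
}

# multipart/form-data: one part per field, separated by a boundary line.
# A fixed boundary is used here so the output is reproducible.
multipart_body <- function(fields, boundary = "----RBoundary") {
  parts <- vapply(names(fields), function(n) {
    paste0("--", boundary, "\r\n",
           'Content-Disposition: form-data; name="', n, '"\r\n\r\n',
           fields[[n]], "\r\n")
  }, character(1))
  paste0(paste0(parts, collapse = ""), "--", boundary, "--\r\n")
}

cat(urlencode_body(values), "\n")
cat(multipart_body(values))
```

A login handler that only parses URL-encoded bodies will not find the username and password in the multipart version, which is consistent with the login only succeeding once multipart=FALSE is set.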


I think one way to get past the "enter your organization" page is the following:

```r
library(tidyverse)
library(rvest)
library(stringr)
library(httr)  # handle_reset(), cookies() and set_cookies() come from httr

org <- "your_organization"
user <- "your_username"
password <- "your_password"
signin <- "http://esds.ac.uk/newRegistration/newLogin.asp"

handle_reset(signin)

# Get to the organisation page and enter the organisation
p0 <- html_session(signin) %>%
    follow_link("Login")

# Take the value attribute of the <option> whose HTML mentions `org`
org_link <- html_nodes(p0, "option") %>%
    as.character() %>%
    str_subset(org) %>%
    str_match('(?<=\\")[^"]*') %>%
    as.character()

f0 <- html_form(p0) %>%
    first() %>%
    set_values(origin = org_link)

# Add a fake submit button so that submit_form() has one to use
fake_submit_button <- list(name = "submit-btn",
                           type = "submit",
                           value = "Continue",
                           checked = NULL,
                           disabled = NULL,
                           readonly = NULL,
                           required = FALSE)
attr(fake_submit_button, "class") <- "btn-enabled"
f0[["fields"]][["submit"]] <- fake_submit_button

# Pass the session cookies explicitly as a named character vector
c0 <- cookies(p0)$value
names(c0) <- cookies(p0)$name

p1 <- submit_form(session = p0, form = f0, config = set_cookies(.cookies = c0))
```
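Two of the less obvious steps above can be tried offline: the lookbehind regex that pulls out org_link, and turning the cookie table into the named vector that set_cookies(.cookies = ...) expects. Here is a minimal base-R sketch on hypothetical data. The option HTML, cookie names and cookie values are made up, and the regex is the same pattern as above, run through regexpr(perl = TRUE) instead of stringr.

```r
# 1. The org_link step: everything after the first double quote up to the
#    next one. Hypothetical <option> markup.
opt <- '<option value="https://idp.example.ac.uk/idp/shibboleth">your_organization</option>'
org_link <- regmatches(opt, regexpr('(?<=")[^"]*', opt, perl = TRUE))
print(org_link)

# 2. The cookie step: a table with name/value columns (the shape of
#    httr::cookies() output) becomes a named character vector.
#    Hypothetical cookie names and values.
cookie_tbl <- data.frame(
  name  = c("JSESSIONID", "_idp_session"),
  value = c("abc123", "xyz789"),
  stringsAsFactors = FALSE
)
c0 <- setNames(cookie_tbl$value, cookie_tbl$name)
print(c0)
```

Because the pattern simply takes the text after the first double quote, it returns the URL only when value is the first quoted attribute of the `<option>` tag; for markup such as `<option class="x" value="...">` it would return "x" instead.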

Unfortunately, that doesn't solve the whole problem: step (2) is harder than it looks. I've posted more of what I think is a solution in another question, "R: use rvest (or httr) to log in to a site requiring cookies". Hopefully someone will help us get the rest of the way.