How to log in and then download a file from ASPX web pages with R
Besides storing the cookie after authentication (see my comment above), there was another problem in your solution: the ASP.NET site sets a __VIEWSTATE
key-value pair as a hidden form field in the page, and it has to be preserved in your queries. If you check, you could not even log in with your example (the result of the POST
command holds info about how to log in, just check it out).
An outline of a possible solution:
Load the RCurl package:
> library(RCurl)
Set some handy curl options:
> curl = getCurlHandle()
> curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE, autoreferer = TRUE, curl = curl)
Load the page for the first time to capture VIEWSTATE:
> html <- getURL('http://simba.isr.umich.edu/u/Login.aspx', curl = curl)
Extract VIEWSTATE with a regular expression or any other tool:
> viewstate <- as.character(sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html))
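One caveat with sub() is that it returns the whole page unchanged when the pattern does not match, so a failed extraction can go unnoticed. The sketch below is one possible alternative in base R, not the only way to do it: extract_hidden_field is a hypothetical helper name, the sample markup is made up, and it assumes the id attribute precedes value inside the tag. Some ASP.NET pages also carry an __EVENTVALIDATION hidden field that may need to be sent back in the same way; whether this particular site requires it is not something the example establishes.

```r
# Hypothetical helper (not part of RCurl): pull the value of a hidden
# <input> field out of an HTML string using base R regular expressions.
extract_hidden_field <- function(html, field) {
  # Assumption: the id attribute comes before value inside the tag.
  pattern <- paste0('id="', field, '"[^>]*value="([^"]*)"')
  m <- regmatches(html, regexec(pattern, html))[[1]]
  # m[1] is the full match, m[2] the captured value; NA if nothing matched.
  if (length(m) < 2) NA_character_ else m[2]
}

# Made-up sample markup standing in for the real login page.
sample_html <- paste0(
  '<form><input type="hidden" name="__VIEWSTATE" ',
  'id="__VIEWSTATE" value="dDwtMTA4+/=" /></form>'
)

viewstate <- extract_hidden_field(sample_html, '__VIEWSTATE')
missing   <- extract_hidden_field(sample_html, '__EVENTVALIDATION')
```

Here viewstate holds the field value, while missing is NA because the sample page has no such field, which makes a failed extraction easy to detect before posting the form.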
Set the parameters: your username, your password and the VIEWSTATE:
> params <- list(
+     'ctl00$ContentPlaceHolder3$Login1$UserName'    = '<USERNAME>',
+     'ctl00$ContentPlaceHolder3$Login1$Password'    = '<PASSWORD>',
+     'ctl00$ContentPlaceHolder3$Login1$LoginButton' = 'Log In',
+     '__VIEWSTATE'                                  = viewstate
+ )
Log in at last:
> html = postForm('http://simba.isr.umich.edu/u/Login.aspx', .params = params, curl = curl)
Congrats, now you are logged in and curl holds the cookie verifying that! Verify that you are logged in:
> grepl('Logout', html)
[1] TRUE
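In a script it can help to stop early when that check fails, instead of downloading a login page by mistake. A minimal sketch, assuming the response of a successful login contains the text 'Logout' as above; check_login is a hypothetical helper name, not an RCurl function:

```r
# Hypothetical helper: stop with an error unless the marker text that only
# appears for logged-in users is present in the returned HTML.
check_login <- function(html, marker = 'Logout') {
  ok <- grepl(marker, html, fixed = TRUE)[1]
  if (!isTRUE(ok)) {
    stop('Login appears to have failed: "', marker, '" not found in the response')
  }
  invisible(TRUE)
}

# Made-up responses standing in for the real pages.
logged_in_page  <- '<a href="Logout.aspx">Logout</a>'
logged_out_page <- '<input type="submit" value="Log In" />'

check_login(logged_in_page)                                # passes silently
failed <- try(check_login(logged_out_page), silent = TRUE) # captures the error
```

With the real session you would call check_login(html) right after postForm().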
So you can go ahead and download any file - just be sure to pass curl = curl in all your queries.
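For the download itself, something along these lines should work. It is a sketch under assumptions: download_with_handle is a hypothetical wrapper name, and '<FILE_URL>' is a placeholder for the address of the file you actually need, which is not given here. getBinaryURL() is RCurl's function for fetching a response as raw bytes, and writeBin() from base R saves them to disk.

```r
# Hypothetical wrapper: fetch a file through an already authenticated
# curl handle and save it to disk. Requires the RCurl package.
download_with_handle <- function(url, dest, curl) {
  # Reusing the same handle sends the session cookie set during login.
  bin <- RCurl::getBinaryURL(url, curl = curl)
  writeBin(bin, dest)
  invisible(dest)
}

# Usage, assuming `curl` is the logged-in handle from the steps above and
# '<FILE_URL>' is replaced with a real address:
# download_with_handle('<FILE_URL>', 'data.zip', curl)
```

Fetching the content as raw bytes avoids corrupting binary files such as zip archives, which can happen when the response is read as text.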