How to login and then download a file from aspx web pages with R


Besides storing the cookie after authentication (see my comment above), there was another problem with your solution: the ASP.NET site puts a VIEWSTATE key-value pair into the login page (as a hidden form field, not in the cookie), and it has to be preserved and sent back in your queries. If you check, you could not even log in with your example: the result of the POST command only holds info about how to log in, just check it out.
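
To make the VIEWSTATE point concrete, here is a minimal offline sketch of pulling a hidden field out of a page. The helper name get_hidden_field and the sample HTML are made up for illustration, and the pattern assumes the id attribute comes before value, as in the login page discussed here. Some ASP.NET pages also expect other hidden fields such as __EVENTVALIDATION; whether this one does is an assumption you would have to check.

```r
# Hypothetical helper: return the value of a hidden <input> identified by its id,
# or NA if the page does not contain such a field.
get_hidden_field <- function(html, id) {
  # Assumes the id attribute precedes the value attribute inside the same tag
  pattern <- sprintf('id="%s"[^>]*value="([^"]*)"', id)
  m <- regmatches(html, regexec(pattern, html))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Made-up snippet standing in for the real login page
sample_html <- '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4Nzcz+/=" />'

get_hidden_field(sample_html, "__VIEWSTATE")        # "dDwtMTA4Nzcz+/="
get_hidden_field(sample_html, "__EVENTVALIDATION")  # NA, the field is absent here
```

A lookup that returns NA for a missing field makes it easy to add only those hidden fields to the POST parameters that the page actually contains.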

An outline of a possible solution:

  1. Load RCurl package:

    > library(RCurl)
  2. Set some handy curl options:

    > curl = getCurlHandle()
    > curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE, autoreferer = TRUE, curl = curl)
  3. Load the page for the first time to capture VIEWSTATE:

    > html <- getURL('http://simba.isr.umich.edu/u/Login.aspx', curl = curl)
  4. Extract VIEWSTATE with a regular expression or any other tool:

    > viewstate <- as.character(sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html))
  5. Set the parameters as your username, password and the VIEWSTATE:

    > params <- list(
        'ctl00$ContentPlaceHolder3$Login1$UserName'    = '<USERNAME>',
        'ctl00$ContentPlaceHolder3$Login1$Password'    = '<PASSWORD>',
        'ctl00$ContentPlaceHolder3$Login1$LoginButton' = 'Log In',
        '__VIEWSTATE'                                  = viewstate
        )
  6. Log in at last:

    > html = postForm('http://simba.isr.umich.edu/u/Login.aspx', .params = params, curl = curl)

    Congrats, now you are logged in and curl holds the cookie verifying that!

  7. Verify if you are logged in:

    > grepl('Logout', html)
    [1] TRUE
  8. Now you can go ahead and download any file. Just be sure to pass curl = curl in all your queries, so that the session cookie is sent along with each request.
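
The last step has no code of its own, so here is a minimal sketch of what the download could look like. The helper name download_with_session, the file path placeholder and the destination file name are hypothetical; getBinaryURL is part of RCurl and is used on the assumption that the file should be saved byte for byte.

```r
# Hypothetical helper: fetch `url` through an already authenticated curl handle
# and write the raw bytes to `destfile`.
download_with_session <- function(url, destfile, curl) {
  # Reusing the handle sends the session cookie obtained at login
  bin <- RCurl::getBinaryURL(url, curl = curl)
  writeBin(bin, destfile)
  invisible(destfile)
}

# Usage (placeholder path and made-up file name; needs the logged-in handle from step 6):
# download_with_session('http://simba.isr.umich.edu/<PATH-TO-FILE>', 'data.zip', curl)
```

If the server answers with the login page instead of the file, the handle is most likely not authenticated, so it is worth repeating the check from step 7 first.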