How to request pages from website that uses OpenID?


Well I myself don't know much about OpenID but your post (and the bounty!!) got me interested.

This link describes the exact flow of the OpenID authentication sequence (at least for v1.0; the newer version is 2.0). From what I could make out, the steps would be something like:

  1. You fetch the StackOverflow login page, which also provides an option to log in using OpenID (as a form field).
  2. You send your OpenID, which is actually a URI and NOT a username/email (for a Google profile it is your profile ID).
  3. StackOverflow then connects to your ID provider (in this case Google) and sends you a redirect to the Google login page, along with another link to which you should be redirected later (let's say a).
  4. You log in on the Google-provided page conventionally (using the POST method from Python).
  5. Google provides a cryptographic token (not quite sure about this step) in return for your login request.
  6. You send a new request to a with this token.
  7. StackOverflow contacts Google with this token. If authenticity is established, it returns a session ID.
  8. Later requests to StackOverflow should carry this session ID.
  9. No idea about logging out!!
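
To make step 3 a bit more concrete, here is a minimal Python 3 sketch of the kind of redirect URL a site builds when it hands you over to the provider. The parameter names come from the OpenID 2.0 specification; the endpoint URL, the return address and the helper name build_auth_redirect are made up for illustration and are not what StackOverflow or Google actually use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

OPENID2_NS = "http://specs.openid.net/auth/2.0"
# Special identifier meaning "let the provider pick the identity" (OpenID 2.0).
IDENTIFIER_SELECT = OPENID2_NS + "/identifier_select"


def build_auth_redirect(provider_endpoint, return_to, realm):
    """Return the URL the site would redirect the browser to (hypothetical helper)."""
    params = {
        "openid.ns": OPENID2_NS,
        "openid.mode": "checkid_setup",   # interactive login request
        "openid.claimed_id": IDENTIFIER_SELECT,
        "openid.identity": IDENTIFIER_SELECT,
        "openid.return_to": return_to,    # the link called "a" in the steps above
        "openid.realm": realm,
    }
    separator = "&" if "?" in provider_endpoint else "?"
    return provider_endpoint + separator + urlencode(params)


# Placeholder URLs, assumed for the example only.
url = build_auth_redirect(
    "https://provider.example/openid/endpoint",
    "https://site.example/users/authenticate/?s=some_value",
    "https://site.example/",
)
query = parse_qs(urlsplit(url).query)
print(query["openid.mode"][0])
print(query["openid.return_to"][0])
```

Parsing the URL back with parse_qs is also a handy way to inspect the real redirect you get from the site while you are writing your client.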

This link describes the various responses in OpenID and what they mean, so it may come in handy when you code your client.

Links from the wiki page OpenID Explained

EDIT: Using the Tamper Data add-on for Firefox, the following sequence of events can be reconstructed.

  1. The user sends a request to the SO login page. On entering the OpenID in the form field, the resulting page sends a 302 redirecting to a Google page. The redirect URL has a lot of OpenID parameters (which are meant for the Google server). One of them is return_to=https://stackoverflow.com/users/authenticate/?s=some_value.
  2. The user is presented with the Google login page. On login there are a few 302s which redirect the user around within the Google realm.
  3. Finally a 302 is received which redirects the user to the StackOverflow page specified in 'return_to' earlier.
  4. During this entire series of operations a lot of cookies are generated, which must be stored correctly.
  5. On accessing the SO page (the one Google 302'd you to), the SO server processes your request and, in the response headers, sends "Set-Cookie" fields that set cookies named gauth and usr, along with another 302 to stackoverflow.com. This step completes your login.
  6. Your client simply stores the cookie usr.
  7. You are logged in as long as you remember to send the cookie usr with every request to SO.
  8. You can now request your inbox; just remember to send the usr cookie with the request.
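
Steps 5 to 7 boil down to picking one cookie out of the response headers and sending it back later. A rough Python 3 sketch of that part, using only the standard library, could look like this. The header values below are invented stand-ins, and the helper name usr_cookie_header is just for this example; the real cookies will have different values and attributes.

```python
from http.cookies import SimpleCookie


def usr_cookie_header(set_cookie_values):
    """Pick the usr cookie out of raw Set-Cookie values and format a Cookie header value."""
    for raw in set_cookie_values:
        cookie = SimpleCookie()
        cookie.load(raw)
        if "usr" in cookie:
            return "usr=" + cookie["usr"].value
    return None  # no usr cookie seen, so the login did not complete


# Made-up header values standing in for the real response headers.
sample = [
    "gauth=SAMPLEGAUTH; path=/; HttpOnly",
    "usr=SAMPLEVALUE; domain=.stackoverflow.com; path=/; HttpOnly",
]
print(usr_cookie_header(sample))
```

In a real client a cookie jar would do this bookkeeping for you; the sketch only shows what "store the cookie usr" amounts to.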

I suggest you start coding your Python client and study the responses carefully. In most cases it will be a series of 302s with minimal user intervention (except for filling in your Google username and password and allowing the site when Google asks).

However, to make it easier, you could just log in to SO using your browser, copy all the cookie values and make a request using urllib2 with the cookie values set.

Of course, if you log out in the browser, you will have to log in again and change the cookie value in your Python program.
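
A minimal sketch of that shortcut, written for Python 3 (where urllib2 became urllib.request), might look like the following. The cookie value is a placeholder you would replace with the one copied from your browser, and build_request is a name chosen for this example only.

```python
import urllib.request

# Value copied by hand from the browser's cookie store (placeholder here).
USR_COOKIE = "paste-the-usr-cookie-value-here"


def build_request(url, usr_cookie):
    """Build a request that carries the copied usr cookie."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", "usr=" + usr_cookie)
    return request


request = build_request("https://stackoverflow.com/inbox/genuwine", USR_COOKIE)
print(request.get_header("Cookie"))

# To actually send it (needs network access and a valid cookie):
# html = urllib.request.urlopen(request).read()
```

The network call is left commented out so the snippet can be run without a valid session.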


I know this is close to archaeology, digging up a post that's two years old, but I just wrote a new, enhanced version of the code from the accepted answer, so I thought it would be cool to share it here, as this question and its answers were a great help to me in implementing it.

So, here's what's different:

  • it uses the requests library, which is an improvement over urllib2;
  • it supports authenticating with both Google's and Stack Exchange's OpenID providers;
  • it is much shorter and simpler to read, though it prints less output.

here's the code:

    #!/usr/bin/env python
    # Python 2 code (print statements, raw_input, urllib.unquote).

    import sys
    import urllib
    from getpass import getpass

    import requests
    from BeautifulSoup import BeautifulSoup


    def get_google_auth_session(username, password):
        session = requests.Session()
        google_accounts_url = 'http://accounts.google.com'
        authentication_url = 'https://accounts.google.com/ServiceLoginAuth'
        stack_overflow_url = 'http://stackoverflow.com/users/authenticate'

        # Fetch the login page to pick up the hidden form values and cookies
        r = session.get(google_accounts_url)
        dsh = BeautifulSoup(r.text).findAll(attrs={'name': 'dsh'})[0].get('value').encode()
        auto = r.headers['X-Auto-Login']
        follow_up = urllib.unquote(urllib.unquote(auto)).split('continue=')[-1]
        galx = r.cookies['GALX']

        payload = {'continue': follow_up,
                   'followup': follow_up,
                   'dsh': dsh,
                   'GALX': galx,
                   'pstMsg': 1,
                   'dnConn': 'https://accounts.youtube.com',
                   'checkConnection': '',
                   'checkedDomains': '',
                   'timeStmp': '',
                   'secTok': '',
                   'Email': username,
                   'Passwd': password,
                   'signIn': 'Sign in',
                   'PersistentCookie': 'yes',
                   'rmShown': 1}
        r = session.post(authentication_url, data=payload)

        # Being redirected away from the login URL is taken as success
        if r.url != authentication_url:  # XXX
            print "Logged in"
        else:
            print "login failed"
            sys.exit(1)

        # Log in to Stack Overflow, reusing the Google cookies held by the session
        payload = {'oauth_version': '',
                   'oauth_server': '',
                   'openid_username': '',
                   'openid_identifier': ''}
        r = session.post(stack_overflow_url, data=payload)
        return session


    def get_so_auth_session(email, password):
        session = requests.Session()

        # The login form carries a hidden fkey field that must be posted back
        r = session.get('http://stackoverflow.com/users/login')
        fkey = BeautifulSoup(r.text).findAll(attrs={'name': 'fkey'})[0]['value']

        payload = {'openid_identifier': 'https://openid.stackexchange.com',
                   'openid_username': '',
                   'oauth_version': '',
                   'oauth_server': '',
                   'fkey': fkey,
                   }
        r = session.post('http://stackoverflow.com/users/authenticate',
                         allow_redirects=True, data=payload)

        # The provider's login page has its own fkey plus a session field
        fkey = BeautifulSoup(r.text).findAll(attrs={'name': 'fkey'})[0]['value']
        session_name = BeautifulSoup(r.text).findAll(attrs={'name': 'session'})[0]['value']

        payload = {'email': email,
                   'password': password,
                   'fkey': fkey,
                   'session': session_name}
        r = session.post('https://openid.stackexchange.com/account/login/submit', data=payload)

        # An element with class "error" in the response means the login failed
        error = BeautifulSoup(r.text).findAll(attrs={'class': 'error'})
        if len(error) != 0:
            print "ERROR:", error[0].text
            sys.exit(1)
        return session


    if __name__ == "__main__":
        prov = raw_input('Choose your openid provider [1 for StackOverflow, 2 for Google]: ')
        name = raw_input('Enter your OpenID address: ')
        pswd = getpass('Enter your password: ')
        if '1' in prov:
            so = get_so_auth_session(name, pswd)
        elif '2' in prov:
            so = get_google_auth_session(name, pswd)
        else:
            print "Error no openid provider given"
            sys.exit(1)

        r = so.get('http://stackoverflow.com/inbox/genuwine')
        print r.json()

the code is also available as a github gist
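
If you want to try the hidden-field lookup (the fkey part) without hitting the site, here is a small offline sketch. It is written for Python 3 and swaps BeautifulSoup for the standard library's html.parser, so it is not the same code as above; the HTML snippet and the class name HiddenFieldParser are invented for the example, and the real login page may look different.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            if "name" in attributes:
                # Inputs without a value attribute are stored as empty strings.
                self.fields[attributes["name"]] = attributes.get("value", "")


# Made-up markup; the real login page may look different.
sample_html = """
<form action="/users/authenticate" method="post">
  <input type="hidden" name="fkey" value="0123456789abcdef">
  <input type="text" name="openid_identifier">
</form>
"""
parser = HiddenFieldParser()
parser.feed(sample_html)
print(parser.fields["fkey"])
```

The resulting dictionary can be used as the starting point for the POST payload, filling in the fields the form leaves empty.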

HTH


This answer sums up what others have said below, especially RedBaron, plus adding a method I used to get to the StackOverflow Inbox using Google Accounts.

Using the Tamper Data add-on for Firefox while logging on to StackOverflow, one can see that OpenID works this way:

  1. StackOverflow requests authentication from a given service (here Google), defined in the posted data;
  2. Google Accounts takes over and checks for an already existing cookie as proof of authentication;
  3. If no cookie is found, Google requests authentication and sets a cookie;
  4. Once the cookie is set, StackOverflow acknowledges authentication of the user.

The above sums up the process, which in reality is more complicated, since many redirects and cookie exchanges take place.

Because reproducing the same process programmatically proved somewhat difficult (and that might just be my illiteracy), especially hunting down the URLs to call with all their locale specifics, I opted for logging on to Google Accounts first, getting a well-deserved cookie, and then logging on to StackOverflow, which would use the cookie for authentication.

This is done simply using the following Python modules: urllib, urllib2, cookielib and BeautifulSoup.

Here is the (simplified) code, it's not perfect, but it does the trick. The extended version can be found on Github.

    #!/usr/bin/env python
    # Python 2 code (urllib2, cookielib, print statements).

    import urllib
    import urllib2
    import cookielib
    from BeautifulSoup import BeautifulSoup
    from getpass import getpass

    # Define URLs
    google_accounts_url = 'http://accounts.google.com'
    authentication_url = 'https://accounts.google.com/ServiceLoginAuth'
    stack_overflow_url = 'https://stackoverflow.com/users/authenticate'
    genuwine_url = 'https://stackoverflow.com/inbox/genuwine'

    # Build an opener that keeps cookies across requests
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))


    def request_url(request):
        '''
        Requests given URL.
        '''
        return opener.open(request)


    def authenticate(username='', password=''):
        '''
        Authenticates to Google Accounts using user-provided username and password,
        then authenticates to StackOverflow.
        '''
        # Build up headers
        user_agent = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
        headers = {'User-Agent': user_agent}

        # Set data to None (plain GET)
        data = None

        # Build up URL request with headers and data
        request = urllib2.Request(google_accounts_url, data, headers)
        response = request_url(request)

        # Build up POST data for authentication
        html = response.read()
        dsh = BeautifulSoup(html).findAll(attrs={'name': 'dsh'})[0].get('value').encode()
        auto = response.headers.getheader('X-Auto-Login')
        follow_up = urllib.unquote(urllib.unquote(auto)).split('continue=')[-1]
        # Reads the GALX cookie straight from the jar's private storage
        galx = jar._cookies['accounts.google.com']['/']['GALX'].value

        values = {'continue': follow_up,
                  'followup': follow_up,
                  'dsh': dsh,
                  'GALX': galx,
                  'pstMsg': 1,
                  'dnConn': 'https://accounts.youtube.com',
                  'checkConnection': '',
                  'checkedDomains': '',
                  'timeStmp': '',
                  'secTok': '',
                  'Email': username,
                  'Passwd': password,
                  'signIn': 'Sign in',
                  'PersistentCookie': 'yes',
                  'rmShown': 1}
        data = urllib.urlencode(values)

        # Build up URL for authentication
        request = urllib2.Request(authentication_url, data, headers)
        response = request_url(request)

        # Check if logged in: a successful login redirects away from the request URL
        if response.url != request.get_full_url():
            print '\n Logged in :)\n'
        else:
            print '\n Log in failed :(\n'

        # Build OpenID data
        values = {'oauth_version': '',
                  'oauth_server': '',
                  'openid_username': '',
                  'openid_identifier': 'https://www.google.com/accounts/o8/id'}
        data = urllib.urlencode(values)

        # Build up URL for OpenID authentication
        request = urllib2.Request(stack_overflow_url, data, headers)
        response = request_url(request)

        # Retrieve Genuwine
        data = None
        request = urllib2.Request(genuwine_url, data, headers)
        response = request_url(request)
        print response.read()


    if __name__ == '__main__':
        username = raw_input('Enter your Gmail address: ')
        password = getpass('Enter your password: ')
        authenticate(username, password)
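
For anyone reading this on Python 3: urllib2 became urllib.request, cookielib became http.cookiejar, and urllib.unquote moved to urllib.parse.unquote. Below is a rough sketch of the two building blocks from the listing above under those names, namely the cookie-aware opener and the double decoding of the X-Auto-Login header. The sample header value is invented purely to show the decoding, and extract_follow_up is a name picked for this example.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Python 3 counterparts of cookielib.CookieJar and urllib2.build_opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def extract_follow_up(auto_login_header):
    """Undo the double URL-encoding and keep what follows the last 'continue='."""
    decoded = urllib.parse.unquote(urllib.parse.unquote(auto_login_header))
    return decoded.split("continue=")[-1]


# Made-up header value, only meant to show the double decoding.
sample = "realm=com.google&args=continue%3Dhttps%253A%252F%252Faccounts.example%252FManage"
print(extract_follow_up(sample))
```

Every request made through opener.open() then stores and resends cookies via jar, which is what lets the Google login carry over to the StackOverflow request.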