Python - urllib2 & cookielib
This is not a problem with urllib2; that site does something unusual. You need to request a couple of its stylesheets, passing the session ID in the URL, before it will validate your session:
```python
import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# the default User-Agent ('Python-urllib/2.6') will *not* work
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11'),
]

stylesheets = [
    'https://www.idcourts.us/repository/css/id_style.css',
    'https://www.idcourts.us/repository/css/id_print.css',
]

# the start page sets the JSESSIONID cookie
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
sessid = cj._cookies['www.idcourts.us']['/repository']['JSESSIONID'].value

# note the +=: keep the User-Agent and add a Referer
opener.addheaders += [
    ('Referer', 'https://www.idcourts.us/repository/start.do'),
]

for st in stylesheets:
    # the trick: request each stylesheet with the session ID as a path parameter
    opener.open(st + ';jsessionid=' + sessid)

search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
# perhaps the Referer needs to keep being updated...
```
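The code above reaches into the private `cj._cookies` dict for the session ID. A minimal sketch of the same steps using only public APIs, written for Python 3 (where `cookielib` became `http.cookiejar` and `urllib2` became `urllib.request`): it finds the cookie by iterating the jar, appends `;jsessionid=` to each stylesheet URL, and, following the closing comment about the referer, sets `Referer` per request instead of on the opener. A header set on the `Request` takes precedence, because urllib only adds `opener.addheaders` entries the request does not already carry. The helper names `find_cookie`, `with_jsessionid` and `build_request` are hypothetical, and the cookie is built by hand so the sketch runs without contacting the site.

```python
import http.cookiejar
import urllib.request


def find_cookie(jar, name, domain=None):
    """Return the value of the first cookie named `name` (optionally on `domain`), or None."""
    for cookie in jar:  # iterating a CookieJar yields http.cookiejar.Cookie objects
        if cookie.name == name and (domain is None or cookie.domain == domain):
            return cookie.value
    return None


def with_jsessionid(url, sessid):
    """Append the session ID as a ;jsessionid= path parameter, as the answer does."""
    return url + ';jsessionid=' + sessid


def build_request(url, referer=None):
    """Hypothetical helper: build a Request that carries its own Referer header."""
    headers = {'Referer': referer} if referer else {}
    return urllib.request.Request(url, headers=headers)


# Offline stand-in for the JSESSIONID cookie that start.do would set.
jar = http.cookiejar.CookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='JSESSIONID', value='ABC123', port=None, port_specified=False,
    domain='www.idcourts.us', domain_specified=False, domain_initial_dot=False,
    path='/repository', path_specified=True, secure=False, expires=None,
    discard=True, comment=None, comment_url=None, rest={}))

page = 'https://www.idcourts.us/repository/start.do'
stylesheets = [
    'https://www.idcourts.us/repository/css/id_style.css',
    'https://www.idcourts.us/repository/css/id_print.css',
]
sessid = find_cookie(jar, 'JSESSIONID', 'www.idcourts.us')

# The stylesheets are loaded "from" the start page, so each names it as Referer;
# the next page request names the page it was reached from as well.
reqs = [build_request(with_jsessionid(st, sessid), referer=page) for st in stylesheets]
reqs.append(build_request('https://www.idcourts.us/repository/partySearch.do', referer=page))
# With a live opener you would then call opener.open(req) for each req in order.
```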
Not an actual answer (but far too long for a comment); possibly useful to anyone else trying to answer this.
Despite my best attempts, I can't figure this out.
In Firebug, the cookie appears to stay the same across requests in Firefox (i.e. it works properly there).
I added `urllib2.HTTPSHandler(debuglevel=1)` to debug which headers Python is sending, and it does appear to resend the cookie.
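Another way to confirm that the jar would resend the session cookie, without any network traffic, is to let the jar fill in the `Cookie` header on a bare `Request`: `CookieJar.add_cookie_header` is the call `HTTPCookieProcessor` makes before each request, so it applies the same domain and path matching. A minimal Python 3 sketch; the `cookie_header_for` name and the cookie value are hypothetical.

```python
import http.cookiejar
import urllib.request


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would attach to a request for `url`, or None."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)  # same call HTTPCookieProcessor makes before sending
    return req.get_header('Cookie')


jar = http.cookiejar.CookieJar()
# Hand-made stand-in for a JSESSIONID cookie scoped to the /repository path.
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='JSESSIONID', value='ABC123', port=None, port_specified=False,
    domain='www.idcourts.us', domain_specified=False, domain_initial_dot=False,
    path='/repository', path_specified=True, secure=False, expires=None,
    discard=True, comment=None, comment_url=None, rest={}))

print(cookie_header_for(jar, 'https://www.idcourts.us/repository/partySearch.do'))
# → JSESSIONID=ABC123
print(cookie_header_for(jar, 'https://www.idcourts.us/other/page.do'))
# → None (outside the cookie's /repository path)
```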
I also added all the Firefox request headers to see if that would help (it didn't):
```python
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
    ..
]
```
My test code:
```python
import cookielib, urllib2

cj = cookielib.CookieJar()
# debuglevel=1 makes the HTTPS handler print the raw request and the response headers
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), urllib2.HTTPSHandler(debuglevel=1))
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-gb,en;q=0.5'),
    ('Accept-Encoding', 'gzip,deflate'),
    ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
    ('Keep-Alive', '115'),
    ('Connection', 'keep-alive'),
    ('Cache-Control', 'max-age=0'),
    ('Referer', 'https://www.idcourts.us/repository/partySearch.do'),
]

home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
```
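The test code compares the jar by printing it before and after the second request. A minimal Python 3 sketch of making that comparison explicit, so a replaced JSESSIONID stands out; `make_session_cookie`, `snapshot` and `diff` are hypothetical helper names, and the second cookie is set by hand to simulate the server issuing a new session ID rather than taken from the real site.

```python
import http.cookiejar


def make_session_cookie(value):
    """Hand-made JSESSIONID cookie for /repository on www.idcourts.us (offline stand-in)."""
    return http.cookiejar.Cookie(
        version=0, name='JSESSIONID', value=value, port=None, port_specified=False,
        domain='www.idcourts.us', domain_specified=False, domain_initial_dot=False,
        path='/repository', path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={})


def snapshot(jar):
    """Map (domain, path, name) -> value for every cookie currently in the jar."""
    return {(c.domain, c.path, c.name): c.value for c in jar}


def diff(before, after):
    """Return {key: (old, new)} for cookies added, removed or changed between snapshots."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in sorted(keys) if before.get(k) != after.get(k)}


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_session_cookie('ABC123'))
before = snapshot(jar)
# Simulate the server answering partySearch.do with a fresh session ID
# (same domain, path and name, so set_cookie replaces the old cookie).
jar.set_cookie(make_session_cookie('XYZ789'))
changes = diff(before, snapshot(jar))
print(changes)
# → {('www.idcourts.us', '/repository', 'JSESSIONID'): ('ABC123', 'XYZ789')}
```

Against the live site, you would take `before = snapshot(cj)` before calling `opener.open(...)` and then inspect `diff(before, snapshot(cj))`; an empty dict means the session cookie survived the request unchanged.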
I feel like I'm missing something obvious.