Python urllib2 Response header
Try making the request the way Firefox does. You can see the request headers in Firebug, so add them to your request object:
import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')
There's also HTTPCookieProcessor, which can make the request look more like a browser's by storing and resending cookies, but I don't think you'll need it in most cases. Have a look at Python's urllib2 documentation.
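If you do end up needing cookies, a minimal sketch could look like the following. The helper name `make_cookie_opener` is made up for illustration, and the try/except import is only there so the same code runs on Python 2 (urllib2) and Python 3 (urllib.request). No request is actually sent here.

```python
try:
    import urllib2 as urlrequest          # Python 2
    import cookielib as cookiejar
except ImportError:
    import urllib.request as urlrequest   # Python 3
    import http.cookiejar as cookiejar


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = cookiejar.CookieJar()
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    # Headers listed here are sent with every request made through the opener.
    opener.addheaders = [('User-Agent', 'some fake agent string')]
    return opener, jar


opener, jar = make_cookie_opener()
# To use it, open the URL through the opener instead of urlopen():
# response = opener.open('http://your.tld/...')
```

Cookies set by the server should then land in `jar` and be sent back on later requests made through the same opener.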
Content-Type text/html
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
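To illustrate the idea (not what urllib literally does internally), here is a rough sketch using the standard library's `email` parser as a stand-in for the HTTP header parser, with `mimetypes` standing in for the filename-based guess. The function name `effective_content_type` and the example.com URL are made up for this example.

```python
import email
import mimetypes


def effective_content_type(raw_headers, url):
    """Return the declared Content-Type, or a guess based on the URL's extension."""
    msg = email.message_from_string(raw_headers)
    declared = msg['Content-Type']
    if declared is not None:
        return declared
    # No usable Content-Type header: fall back to guessing from the filename.
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed


url = 'http://example.com/style.css'  # hypothetical URL
# Well-formed header: the declared type is used.
print(effective_content_type('Content-Type: text/html\n\n', url))
# Missing colon: the line is not parsed as a header, so the guess kicks in.
print(effective_content_type('Content-Type text/html\n\n', url))
```

With the colon, the declared type should win; without it, the parser finds no Content-Type header at all and the extension decides.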
This peculiar discrepancy might be explained by the two requests sending different headers (maybe ones of the Accept kind); can you check that? Or, if JavaScript is running in Firefox (which I assume you're using, since you're running Firebug), and it's definitely NOT running in the Python case, then "all bets are off", as they say ;-).
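One way to check is to build both requests and compare the headers you set on them before sending anything. This is only a sketch: `header_diff` is a hypothetical helper, the URL and header values are placeholders, and the import shim just lets it run on both Python 2 and Python 3.

```python
try:
    import urllib2 as urlrequest          # Python 2
except ImportError:
    import urllib.request as urlrequest   # Python 3


def header_diff(req_a, req_b):
    """Map each differing header name (lowercased) to its (req_a, req_b) values."""
    a = dict((k.lower(), v) for k, v in req_a.header_items())
    b = dict((k.lower(), v) for k, v in req_b.header_items())
    return dict((name, (a.get(name), b.get(name)))
                for name in set(a) | set(b)
                if a.get(name) != b.get(name))


# Request that imitates the browser (values copied from Firebug would go here).
browser_like = urlrequest.Request('http://example.com/')
browser_like.add_header('User-Agent', 'some fake agent string')
browser_like.add_header('Accept', 'text/html')

# Plain request with no extra headers.
plain = urlrequest.Request('http://example.com/')

for name, (ours, theirs) in sorted(header_diff(browser_like, plain).items()):
    print('%s: %r vs %r' % (name, ours, theirs))
```

Note this only compares headers set explicitly on the Request objects; the library adds a few of its own when the request is sent. To see what actually goes over the wire, building an opener with `HTTPHandler(debuglevel=1)` should print the outgoing request.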