Python requests isn't giving me the same HTML as my browser is


I had a similar issue:

  • Identical headers with Python and through the browser
  • JavaScript definitely ruled out as a cause

To resolve the issue, I ended up swapping out the requests library for urllib.request.

Basically, I replaced:

    import requests

    session = requests.Session()
    r = session.get(URL)

with:

    import urllib.request

    r = urllib.request.urlopen(URL)

and then it worked.

Maybe one of those libraries is doing something different behind the scenes? I'm not sure whether switching libraries is an option for you.
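To check whether two clients really send identical headers, one option is to point them at a tiny local server that echoes back the headers it received. This is a minimal sketch using only the standard library; the helper names `EchoHeadersHandler`, `echo_server` and `fetch_echoed_headers` are my own, not part of any library. It only shows what urllib sends; with requests installed, you could call it against the same local URL and compare.

```python
import contextlib
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Replies to GET with the request headers it received, encoded as JSON."""

    def do_GET(self):
        body = json.dumps(dict(self.headers.items())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


@contextlib.contextmanager
def echo_server():
    """Run a throwaway echo server on a free localhost port; yields its URL."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}/"
    finally:
        server.shutdown()
        server.server_close()


def fetch_echoed_headers(url, headers=None):
    """Request `url` with urllib and return the headers the server saw."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    with echo_server() as url:
        for name, value in fetch_echoed_headers(url).items():
            print(f"{name}: {value}")
        # If requests is installed: print(requests.get(url).json())
```

Run as-is, it prints urllib's defaults (a `Python-urllib/3.x` User-Agent, `Accept-Encoding: identity`, and so on), which you can line up against what your browser's developer tools show for the real page.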


I suspect you're not sending the proper headers (or are sending them incorrectly) with your request, and that's why you are getting different content. Here is an example of an HTTP request with a header:

    import urllib.request  # Python 3 home of urllib2's Request/urlopen

    url = 'https://www.google.co.il/search?q=eminem+twitter'
    user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36')

    # header variable
    headers = {'User-Agent': user_agent}

    # creating request
    req = urllib.request.Request(url, None, headers)

    # getting html (read() returns bytes)
    html = urllib.request.urlopen(req).read()

If you are sure you are sending the right headers but are still getting different HTML, you can try Selenium. It lets you control a real browser directly (or a headless browser if your machine doesn't have a GUI; PhantomJS was the usual choice for this, but it is no longer maintained, so headless Chrome or Firefox is the current option). With Selenium you can grab the HTML straight from the browser.


Many of the differences I see show that the content is still there; it's just rendered in a different order, sometimes with different spacing.

You could be receiving different content for several reasons:

  • Your headers
  • Your user agent
  • The time!
  • The order in which the web application decides to render elements on the page, which may vary (including attribute order) if the elements are pulled from an unsorted data source.
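To check whether two saved copies differ only in spacing or attribute order, one option is to canonicalize both with the standard library's `html.parser` before diffing. This is a minimal sketch; the helper names (`canonical_tokens`, `html_diff`, `same_content_ignoring_order`) are hypothetical. It smooths over whitespace, attribute order and `<br>` vs `<br/>`, and drops comments, so elements that are genuinely reordered still appear in the diff; the `Counter` check answers "same pieces, different order" separately.

```python
import difflib
from collections import Counter
from html.parser import HTMLParser


class _Canonicalizer(HTMLParser):
    """Turn HTML into tokens with sorted attributes and collapsed whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []
        self._pending_text = []

    def flush_text(self):
        # Collapse each whitespace run to one space; drop whitespace-only text.
        text = " ".join("".join(self._pending_text).split())
        self._pending_text.clear()
        if text:
            self.tokens.append(text)

    def handle_starttag(self, tag, attrs):
        self.flush_text()
        # Sort attributes so <a class="x" href="y"> and <a href="y" class="x"> match.
        pairs = sorted((name, value or "") for name, value in attrs)
        attr_text = "".join(f' {name}="{value}"' for name, value in pairs)
        self.tokens.append(f"<{tag}{attr_text}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> so self-closing style is not reported as a change.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.flush_text()
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        self._pending_text.append(data)


def canonical_tokens(html):
    """Comments and doctypes are skipped (HTMLParser's default handlers ignore them)."""
    parser = _Canonicalizer()
    parser.feed(html)
    parser.close()
    parser.flush_text()
    return parser.tokens


def html_diff(html_a, html_b):
    """Unified diff of the canonical token streams; an empty list means same markup."""
    return list(difflib.unified_diff(
        canonical_tokens(html_a), canonical_tokens(html_b),
        fromfile="requests", tofile="browser", lineterm=""))


def same_content_ignoring_order(html_a, html_b):
    """True if both documents contain the same tokens, possibly in a different order."""
    return Counter(canonical_tokens(html_a)) == Counter(canonical_tokens(html_b))
```

For example, `html_diff('<div class="x"  id="y">Hello   world</div>', '<div id="y" class="x">\n  Hello world\n</div>')` returns an empty list, while two lists with their `<li>` items swapped produce a non-empty diff but still pass `same_content_ignoring_order`.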

If you could include all of your headers at the top of that diff, we may be able to make more sense of it.

I suspect the application chose not to render certain images because they aren't optimized for what it thinks is some kind of robot or mobile device (i.e. Python Requests).
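One way to test that suspicion is to collect the `<img src>` values from each saved copy and compare the sets. This is a minimal standard-library sketch; `img_sources` and `missing_images` are hypothetical helper names. It only sees images present in the raw HTML, so images added later by JavaScript, or lazy-loaded through attributes such as `data-src`, will not be counted.

```python
from html.parser import HTMLParser


class _ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (including <img ... />)."""

    def __init__(self):
        super().__init__()
        self.sources = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with a missing or empty src
                self.sources.add(src)


def img_sources(html):
    collector = _ImgCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def missing_images(html_requests, html_browser):
    """Image URLs present in the browser copy but absent from the requests copy."""
    return sorted(img_sources(html_browser) - img_sources(html_requests))
```

An empty result would suggest the images are all there and only the formatting differs.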

On closer inspection of the diff, it appears that everything was loaded in both requests, just with different formatting.