
Viewing "Page Source" shows different HTML than cURL


Your browser can execute JavaScript, which can in turn change the document. curl just gives you the original response exactly as the server sent it and nothing else.

If you turn off JavaScript in the browser and refresh the page, you will see that it looks different.
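To make that concrete, here is a small Python sketch of what a tool without a JavaScript engine "sees". The HTML sample, the `SourceSplitter` class, and the `split_source` helper are all made up for illustration; the point is only that the text a script would insert exists in the raw source as code, not as page content.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible text only appears after the script runs.
RAW_HTML = """<html><body>
<div id="content"></div>
<script>document.getElementById("content").textContent = "Hello from JavaScript";</script>
</body></html>"""


class SourceSplitter(HTMLParser):
    """Separates text a reader would see from text that sits inside <script>."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.visible = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)
        elif data.strip():
            self.visible.append(data.strip())


def split_source(html):
    """Return (visible_text_chunks, script_chunks) for the raw HTML."""
    parser = SourceSplitter()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.scripts


if __name__ == "__main__":
    visible, scripts = split_source(RAW_HTML)
    print("visible text in raw source:", visible)
    print("script code in raw source :", scripts)
```

A browser would run the script and show the greeting inside the div. Anything that only downloads the source, curl included, gets the empty div plus the script text.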


In addition to executing JavaScript, as explained in the other answer, your browser sends a lot more information with its request than you may realize, and the server may be responding differently because of it.

  • Open Chrome, press F12, and go to the "Network" tab.
  • Load the page you want to inspect.
  • Look for the very first request (it should have a document icon with the URL below it; you can also sort by 'Timeline' to find it).
  • Right-click the item and choose 'Copy as cURL'.

Paste this into Notepad and compare what your browser sent to fetch the page with the simple curl command you ran.

curl "http://stackoverflow.com/questions/25333342/viewing-page-source-shows-different-html-than-curl" -H "Accept-Encoding: gzip,deflate,sdch" -H "Accept-Language: en-US,en;q=0.8" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Referer: http://stackoverflow.com/questions?page=2&sort=newest" -H "Cookie: <cookies redacted because lulz>" -H "Connection: keep-alive" -H "Cache-Control: max-age=0" --compressed

Things like the Accept-Language header, the User-Agent (more or less which browser and OS you are on), and in some cases even whether the response was requested compressed can all cause a server to generate the page differently. This can be an ordinary reaction (like serving browser-specific HTML only to that browser, cough*IE and Opera*) or part of higher-level A/B testing of new designs or functionality. Chances are that the content you see at a URL is different for someone else, or even for you when using a different browser or tool.
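If you want to see this effect without depending on any real site, here is a minimal Python sketch. It starts a throwaway local server that returns different HTML depending on the User-Agent header, then fetches the same URL twice. The handler, the `fetch` helper, and the User-Agent strings are assumptions chosen for the demo; real servers apply whatever rules they like.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class VaryingHandler(BaseHTTPRequestHandler):
    """Toy server: the HTML body depends on the User-Agent header."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if "Mozilla" in agent:
            body = b"<html><body>full browser page</body></html>"
        else:
            body = b"<html><body>plain page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, user_agent):
    """Fetch url with the given User-Agent and return the body as text."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def demo_user_agent():
    """Request the same URL as a curl-like and a browser-like client."""
    server = HTTPServer(("127.0.0.1", 0), VaryingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        as_curl = fetch(url, "curl/7.37.0")
        as_browser = fetch(url, "Mozilla/5.0 (illustrative)")
    finally:
        server.shutdown()
        server.server_close()
    return as_curl, as_browser


if __name__ == "__main__":
    as_curl, as_browser = demo_user_agent()
    print("curl-like   :", as_curl)
    print("browser-like:", as_browser)
```

Same URL, same server, two different documents. The only thing that changed between the two requests is one header.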

I also have to point out that what you SEE on the page isn't what comes up in View Source. The source is what was sent to your browser to render. What you actually see on the page is the result after rendering and JavaScript have run. Most browsers support some sort of "Inspect" function in the right-click menu. I suggest you look at pages through that and compare with what View Source shows; it will change your perspective on how the web works.


I don't know whether you have found your answer yet, but I have a suggestion. The difference could be due to the server responding with a redirect such as a 301. The code below is plain C using libcurl, so adapt it as needed.

curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); // Show the progress meter
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L); // To see what's happening
curl_easy_setopt(curl, CURLOPT_USERAGENT, curlversion); // curlversion is a variable holding the User-Agent string
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // Optional/toggle: follow redirects

Test with and without the last option to see how closely curl's output matches the browser's.
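For anyone who wants to see the redirect effect in isolation, here is a rough Python sketch rather than more C. It runs a toy local server where `/old` answers 301 and points at `/new`, then fetches `/old` once without following redirects (`http.client`, which never follows them) and once with following (`urllib.request`, which follows them by default). The server, the paths, and the helper names are invented for the demo.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy server: /old answers 301 and points at /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>real content</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_without_redirects(host, port, path):
    """Single request, no redirect handling (like a redirect-unaware client)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()


def fetch_following_redirects(url):
    """urllib follows 301/302 by default (like a browser does)."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.status, response.read().decode("utf-8")


def demo_redirect():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_port
    try:
        plain = fetch_without_redirects("127.0.0.1", port, "/old")
        followed = fetch_following_redirects("http://127.0.0.1:%d/old" % port)
    finally:
        server.shutdown()
        server.server_close()
    return plain, followed


if __name__ == "__main__":
    plain, followed = demo_redirect()
    print("not following:", plain)
    print("following    :", followed)
```

The first fetch stops at the 301 with an empty body, while the second ends up with the page a browser would show. That is the same difference the follow-location option toggles.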

Also, look at the verbose output by running curl directly from the shell:

:~$ curl -v http://myurl > page.html

See the difference. It should help.