Python Web Crawlers and "getting" html source code


Use Python 2.7; it has more 3rd-party libraries at the moment. (Edit: see below.)

I recommend using the stdlib module urllib2; it will let you fetch web resources comfortably. Example:

import urllib2

response = urllib2.urlopen("http://google.de")
page_source = response.read()

For parsing the code, have a look at BeautifulSoup.
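A minimal sketch of parsing with BeautifulSoup (installed via `pip install beautifulsoup4`); the HTML snippet here is made up for illustration:

```python
from bs4 import BeautifulSoup

# a tiny stand-in for a downloaded page
html = '<html><head><title>Example</title></head><body><a href="/a">A</a></body></html>'

soup = BeautifulSoup(html, "html.parser")

# pull out the title text and all link targets
title = soup.title.string
links = [a["href"] for a in soup.find_all("a")]
```

In real use you would pass `page_source` from the download step instead of the hard-coded string.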

BTW, what exactly do you want to do?

Just for background, I need to download a page and replace any img with ones I have
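For that specific task, a hedged sketch with BeautifulSoup: rewrite every `img` tag's `src` to point at a local copy. The `local/` prefix and the sample HTML are assumptions for illustration, not part of the original question:

```python
from bs4 import BeautifulSoup

# stand-in for a downloaded page containing a remote image
html = '<p><img src="http://example.com/remote.png"> some text</p>'

soup = BeautifulSoup(html, "html.parser")

# point each img at a hypothetical local file with the same basename
for img in soup.find_all("img"):
    img["src"] = "local/" + img["src"].rsplit("/", 1)[-1]

result = str(soup)
```

You would then write `result` back out as the modified page.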

Edit: It's 2014 now, most of the important libraries have been ported, and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.


An example with Python 3 and the requests library, as mentioned by @leoluk:

pip install requests

Script req.py:

import requests

url = 'http://localhost'

# in case you need a session
cd = {'sessionid': '123..'}
r = requests.get(url, cookies=cd)
# or without a session: r = requests.get(url)

print(r.content)

Now, execute it and you will get the HTML source of localhost:

python3 req.py


If you are using Python 3.x you don't need to install any libraries; this is built into the standard library. The old urllib2 module has been split into urllib.request and urllib.error:

from urllib import request

response = request.urlopen("https://www.google.com")
# set the correct charset below
page_source = response.read().decode('utf-8')
print(page_source)