Which is best in Python: urllib2, PycURL or mechanize?


I think this talk (from PyCon 2009) has the answers you're looking for; Asheesh Laroia has lots of experience in the matter, and he points out the good and the bad of most of the tools on your list.

From the PyCon 2009 schedule:

Do you find yourself faced with websites that have data you need to extract? Would your life be simpler if you could programmatically input data into web applications, even those tuned to resist interaction by bots?

We'll discuss the basics of web scraping, and then dive into the details of different methods and where they are most applicable.

You'll leave with an understanding of when to apply different tools, and learn about a "heavy hammer" for screen scraping that I picked up at a project for the Electronic Frontier Foundation.

Attendees should bring a laptop, if possible, to try the examples we discuss and optionally take notes.

Update: Asheesh Laroia has updated his presentation for PyCon 2010:

  • PyCon 2010: Scrape the Web: Strategies for programming websites that don't expect it

    * My motto: "The website is the API."
    * Choosing a parser: BeautifulSoup, lxml, HTMLParser, and html5lib.
    * Extracting information, even in the face of bad HTML: regular expressions, BeautifulSoup, SAX, and XPath (sketched just below).
    * Automatic template reverse-engineering tools.
    * Submitting to forms.
    * Playing with XML-RPC.
    * DO NOT BECOME AN EVIL COMMENT SPAMMER.
    * Countermeasures, and circumventing them:
      o IP address limits
      o Hidden form fields
      o User-agent detection
      o JavaScript
      o CAPTCHAs
    * Plenty of full source code to working examples:
      o Submitting to forms for text-to-speech.
      o Downloading music from web stores.
      o Automating Firefox with Selenium RC to navigate a pure-JavaScript service.
    * Q&A and workshopping
    * Use your power for good, not evil.
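As a taste of the "even in the face of bad HTML" point above, here is a minimal BeautifulSoup sketch; the URL and class name are hypothetical, and it assumes the beautifulsoup4 package is installed:

    import urllib2
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    html = urllib2.urlopen("http://example.com/listings").read()  # hypothetical URL
    soup = BeautifulSoup(html)  # tolerant of sloppy, unclosed markup
    # Collect the text of every price cell; the class name is made up for the example.
    prices = [td.get_text() for td in soup.find_all("td", {"class": "price"})]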

Update 2:

PyCon US 2012 - Web scraping: Reliably and efficiently pull data from pages that don't expect it

Exciting information is trapped in web pages and behind HTML forms. In this tutorial, you'll learn how to parse those pages and when to apply advanced techniques that make scraping faster and more stable. We'll cover parallel downloading with Twisted, gevent, and others; analyzing sites behind SSL; driving JavaScript-y sites with Selenium; and evading common anti-scraping techniques.
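Both talks lean on Selenium for pages that build their content with JavaScript. A minimal sketch of that approach, assuming a local Firefox plus its driver are installed and using a hypothetical URL:

    from selenium import webdriver

    driver = webdriver.Firefox()  # assumes Firefox and its driver are set up locally
    try:
        driver.get("http://example.com/js-app")  # hypothetical JavaScript-heavy page
        # page_source reflects the DOM after scripts have run, unlike the
        # raw bytes a plain urllib2 or PycURL fetch would return.
        html = driver.page_source
    finally:
        driver.quit()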


Python requests is also a good candidate for HTTP work. It has a nicer API, IMHO. An example HTTP request from its official documentation:

    >>> r = requests.get('https://api.github.com', auth=('user', 'pass'))
    >>> r.status_code
    204
    >>> r.headers['content-type']
    'application/json'
    >>> r.content
    ...
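requests is a third-party package (pip install requests), and it handles form submission just as tersely. A sketch with a hypothetical URL and field names:

    import requests

    # POST form fields as a dict; requests URL-encodes them for us.
    r = requests.post("http://example.com/login",
                      data={"user": "alice", "pw": "s3cret"})
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    html = r.text         # decoded response body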


  • urllib2 is found in every Python install everywhere, so it is a good base upon which to start.
  • PycURL is useful for people already used to using libcurl, exposes more of the low-level details of HTTP, plus it gains any fixes or improvements applied to libcurl.
  • mechanize is used to persistently drive a connection much like a browser would.

It's not a matter of one being better than the other; it's a matter of choosing the appropriate tool for the job.
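For instance, a simple GET looks roughly like this in each of the three; a sketch assuming Python 2 and a hypothetical URL:

    # urllib2: in the standard library, nothing to install.
    import urllib2
    html = urllib2.urlopen("http://example.com/").read()

    # PycURL: a thin wrapper over libcurl; you manage the transfer details.
    import pycurl
    from StringIO import StringIO
    buf = StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "http://example.com/")
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    c.perform()
    c.close()
    html = buf.getvalue()

    # mechanize: stateful and browser-like; keeps cookies across requests.
    import mechanize
    br = mechanize.Browser()
    html = br.open("http://example.com/").read()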