Scraping and parsing Google search results using Python
There is a twill library for emulating a browser. I used it when I needed to log in with a Google account. While it's a great tool with a great idea, it's pretty old and seems to lack support nowadays (at the time of writing, the latest version was released in 2007). It can be useful if you want to retrieve results that require cookie handling or authentication; twill is probably one of the best choices for that purpose. BTW, it's based on mechanize.
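If all you need is cookie persistence across requests (not twill's form filling or scripted logins), the Python 3 standard library can do a similar job. The following is a minimal sketch using `http.cookiejar` and `urllib.request` as a stand-in for twill/mechanize; it is not twill's API. `make_session` and `search_url` are hypothetical helper names, and the default User-Agent string is an arbitrary assumption.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Header sent with every request made through this opener
    # (the value is an assumption, not something Google requires).
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def search_url(query, base="https://www.google.com/search"):
    """Build a search URL with the query string escaped (spaces become '+')."""
    return base + "?" + urllib.parse.urlencode({"q": query})


opener, jar = make_session()
url = search_url("python web scraping")
print(url)
# With network access, any cookies the server sets end up in `jar`
# and are sent back automatically on later opener.open() calls:
# html = opener.open(url).read().decode("utf-8", errors="replace")
```

The design choice here is to keep one opener per "session", which is roughly what a browser emulator does for you behind the scenes; logging in through an HTML form would still require posting the form fields yourself.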
As for parsing, you are right: BeautifulSoup and Scrapy are great. One of the cool things about BeautifulSoup is that it can handle invalid HTML (unlike Genshi, for example).
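BeautifulSoup's tolerance for invalid HTML can be seen in a small sketch, assuming BeautifulSoup 4 (the `bs4` package) is installed. The markup is made-up sample data with an unclosed `<b>`, unclosed `<div>`s, unquoted attribute values and a stray `</span>`; it is not real Google result markup, whose structure changes over time.

```python
from bs4 import BeautifulSoup

# Deliberately malformed markup: unclosed <b> and <div> tags,
# unquoted attribute values, and a </span> with no matching opening tag.
broken_html = """
<div class=result><a href="https://example.com/one">First <b>result</a>
<div class=result><a href="https://example.com/two">Second result</a></span>
"""

# "html.parser" is the parser bundled with Python; BeautifulSoup builds a
# usable tree from it instead of raising an error on the broken tags.
soup = BeautifulSoup(broken_html, "html.parser")

# Collect (href, visible text) pairs; the " " separator keeps "First" and
# the text inside <b> from being glued together when strip=True is used.
links = [(a["href"], a.get_text(" ", strip=True)) for a in soup.find_all("a", href=True)]

for href, text in links:
    print(href, "-", text)
```

Running it should print both links with their text, even though the input would be rejected by a strict XML-style parser.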
Also have a look at this awesome urllib wrapper for web scraping: https://github.com/mattseh/python-web/blob/master/web.py