Best way for a beginner to learn screen scraping by Python [closed]


I agree that the Scrapy docs give off that impression. But I believe, as I found for myself, that if you are patient with Scrapy, go through the tutorials first, and then bury yourself in the rest of the documentation, you will not only understand the different parts of Scrapy better, but also appreciate why it does things the way it does. It is a framework for writing spiders and screen scrapers in the real sense of the word. You will still have to learn XPath, but I find that it is best to learn it regardless. After all, you do intend to scrape websites, and an understanding of what XPath is and how it works is only going to make things easier for you.
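
To give a feel for what an XPath expression looks like, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The markup and the class names in it are made up for illustration, and real pages are rarely well-formed XML, so in practice you would run similar expressions through Scrapy's selectors or lxml instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="post">
      <h2>First post</h2>
      <a href="/posts/1">Read more</a>
    </div>
    <div class="post">
      <h2>Second post</h2>
      <a href="/posts/2">Read more</a>
    </div>
    <div class="sidebar">
      <a href="/about">About</a>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# ".//div[@class='post']" selects every <div>, at any depth,
# whose class attribute is exactly "post" (the sidebar is skipped).
posts = root.findall(".//div[@class='post']")

# Inside each matched <div>, "h2" and "a" select direct children by tag name.
results = [(post.find("h2").text, post.find("a").get("href")) for post in posts]

print(results)
```

The same idea (describe where the data lives in the tree, then pull out text or attributes) carries over to full XPath, which adds functions, axes, and richer predicates.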

Once you have understood, for example, the concept of pipelines in Scrapy, you will appreciate how easy it is to do all sorts of things with scraped items, including storing them in a database.
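
As a rough sketch of what such a pipeline might look like: an item pipeline is a plain Python class with a `process_item` method, plus optional `open_spider` and `close_spider` hooks. The class name, the `pages` table, the `url` and `title` fields, and the `scraped.db` file name below are all assumptions made for illustration; check the Scrapy documentation for the exact interface in your version.

```python
import sqlite3


class SQLitePipeline:
    """Sketch of an item pipeline that writes each scraped item to SQLite."""

    def __init__(self, db_path="scraped.db"):
        # db_path is a hypothetical default, not something Scrapy defines.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Runs once when the spider starts: open the database, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # Runs for every scraped item. Assumes the item has "url" and "title".
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
            (item["url"], item["title"]),
        )
        self.conn.commit()
        # Return the item so that any later pipeline stage still receives it.
        return item

    def close_spider(self, spider):
        # Runs once when the spider finishes.
        self.conn.close()
```

A pipeline like this is switched on through the `ITEM_PIPELINES` setting of the project, and the spider itself never needs to know where its items end up.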

BeautifulSoup is a wonderful Python library that can be used to scrape websites. But, in contrast to Scrapy, it is not a framework by any means. For smaller projects, where you don't want to invest time in writing a proper spider and don't have to scrape a large amount of data, you can get by with BeautifulSoup. For anything beyond that, you will begin to appreciate the sort of things Scrapy provides.
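
For comparison, a small BeautifulSoup script might look like the sketch below. It assumes the `beautifulsoup4` package is installed and parses an inline, made-up HTML string, so it needs no network access; for a real site you would download the page first and pass its text in.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
HTML = """
<html><body>
  <h1>Example listing</h1>
  <ul>
    <li><a href="/item/1">First item</a></li>
    <li><a href="/item/2">Second item</a></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser bundled with Python; lxml can be used here
# instead if it is installed.
soup = BeautifulSoup(HTML, "html.parser")

# Collect the text and target of every <a> tag that has an href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(soup.h1.get_text(strip=True))
for text, href in links:
    print(text, href)
```

There is no spider, scheduler, or pipeline here, only parsing, which is exactly why it is quick to pick up for small jobs.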


It looks like Scrapy uses XPath for DOM traversal, which is a language in itself and may feel somewhat cryptic for a while. I think BeautifulSoup will give you a faster start. With lxml you'll have to invest more time learning, but it is generally considered (not only by me) a better alternative to BeautifulSoup.

For the database, I would suggest starting with SQLite and using it until you hit a wall and need something more scalable (which may never happen, depending on how far you want to go with this), at which point you'll know what kind of storage you need. MongoDB is definitely overkill at this point, but getting comfortable with SQL is a very useful skill.
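
To show how little setup SQLite needs, here is a minimal sketch using the `sqlite3` module from the standard library. The `listings` table, its columns, and the sample rows are invented for illustration, and the in-memory database would be replaced by a file path in a real scraper.

```python
import sqlite3

# ":memory:" keeps the example self-contained; use a file name to persist data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (site TEXT, title TEXT, price REAL)")

# Made-up rows standing in for scraped results.
rows = [
    ("site-a", "Widget", 9.99),
    ("site-a", "Gadget", 19.50),
    ("site-b", "Widget", 8.75),
]
# Placeholders (?) let sqlite3 handle quoting of scraped strings safely.
conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", rows)
conn.commit()

# Plain SQL is enough to ask questions of the data: lowest price per title.
cheapest = conn.execute(
    "SELECT title, MIN(price) FROM listings GROUP BY title ORDER BY title"
).fetchall()

print(cheapest)
conn.close()
```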

Here is a five-line example I gave some time ago to illustrate how BeautifulSoup can be used: "Which is the best programming language to write a web bot?"


I really like BeautifulSoup. I'm fairly new to Python, but I found it fairly easy to start screen scraping with it. I wrote a brief tutorial on screen scraping with BeautifulSoup. I hope it helps.