Getting all posts from a blog (WordPress or Blogger)
What you're looking for is a sitemap.
First of all, you're writing a bot so it's good manners to check the blog's robots.txt file. And lo and behold, you'll often find a sitemap mentioned there. Here's an example from the Google blog:
    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search
    Allow: /

    Sitemap: http://googleblog.blogspot.com/feeds/posts/default?orderby=UPDATED
In this case, you can visit the Sitemap URL to get an XML sitemap.
For WordPress, the same applies, but sitemaps aren't built in as standard, so not all blogs will have one. Have a look at this plugin, which is the most popular way to create these sitemaps in WordPress. For example, my blog uses it, and you can find the sitemap at /sitemap.xml (the standard location).
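Once you have the sitemap, pulling the post URLs out of it is mostly a matter of reading the <loc> elements. Here is a minimal sketch, assuming the file follows the usual sitemaps.org layout; the function name urls_from_sitemap is just an illustrative choice. Note that a sitemap index file also uses <loc>, but there each entry points to another sitemap rather than to a post.

```python
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare the local name
        # only, so the code does not depend on the exact namespace URI.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

You would pass it the body of the sitemap you downloaded, then fetch each returned URL (or each child sitemap, if what you got was an index).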
In short:
- Check robots.txt
- Follow the Sitemap URL if it's present
- Otherwise, check for /sitemap.xml
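The three steps above could be sketched roughly like this, using only the Python standard library. The names fetch_text and find_sitemap and the user-agent string are made up for illustration, and the error handling is deliberately crude (any network failure is treated as "not found"), so treat it as a starting point rather than a finished crawler.

```python
import urllib.request
from urllib.parse import urljoin

USER_AGENT = "example-blog-bot/0.1"  # hypothetical name; identify your own bot


def fetch_text(url):
    """Return the body of url as text, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def find_sitemap(blog_url, fetch=fetch_text):
    """Return a sitemap URL for the blog at blog_url, or None if none is found."""
    # Step 1: check robots.txt at the site root.
    robots_txt = fetch(urljoin(blog_url, "/robots.txt"))
    if robots_txt:
        for line in robots_txt.splitlines():
            # Split on the first colon only, so "http://" stays intact.
            field, sep, value = line.partition(":")
            # Step 2: follow the Sitemap URL if one is declared.
            if sep and field.strip().lower() == "sitemap" and value.strip():
                return value.strip()
    # Step 3: otherwise, fall back to the conventional /sitemap.xml.
    fallback = urljoin(blog_url, "/sitemap.xml")
    return fallback if fetch(fallback) is not None else None
```

The fetch parameter is only there so you can swap in a stub when trying the logic out without hitting a real site; calling find_sitemap("http://googleblog.blogspot.com/") with the default would do the real requests.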
Also: be a good Internet citizen! If you're going to write a bot, make sure it obeys the robots.txt file (like where Blogspot tells you explicitly not to use /search!)
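If your bot is in Python, the standard library's urllib.robotparser can do the rule checking for you. A small sketch, where allowed_urls and the user-agent name are hypothetical and the robots.txt body is assumed to have been downloaded already:

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

With the Blogspot rules shown earlier, a URL under /search would be filtered out while an ordinary post URL would be kept.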