Scrapy and response status code: how to check against it?


http://readthedocs.org/docs/scrapy/en/latest/topics/spider-middleware.html#module-scrapy.contrib.spidermiddleware.httperror

Assuming the default spider middleware is enabled, responses with status codes outside the 200-300 range are filtered out by HttpErrorMiddleware before they reach your callback. You can tell the middleware that you want to handle 404s yourself by setting the handle_httpstatus_list attribute on your spider.

class TothegoSitemapHomesSpider(SitemapSpider):
    handle_httpstatus_list = [404]


For completeness, to handle a 302 in your callback rather than have the redirect followed:

  • Set handle_httpstatus_list = [302] on the spider;

  • On the request, set dont_redirect to True in meta.

For example: Request(URL, meta={'dont_redirect': True})