
How to pass a user defined argument in scrapy spider


Spider arguments are passed in the crawl command using the -a option. For example:

scrapy crawl myspider -a category=electronics -a domain=system

Spiders can access arguments as attributes:

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, category='', **kwargs):
        self.start_urls = [f'http://www.example.com/{category}']  # py36
        super().__init__(**kwargs)  # python3

    def parse(self, response):
        self.log(self.domain)  # system

Taken from the Scrapy doc: http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments
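If you start the crawl from a script rather than the shell, the same arguments can be passed as keyword arguments to CrawlerProcess.crawl, which forwards them to the spider's constructor. A minimal sketch, assuming the MySpider class defined above:

import scrapy
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
# keyword arguments here end up as spider arguments,
# exactly like -a category=electronics -a domain=system on the command line
process.crawl(MySpider, category='electronics', domain='system')
process.start()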

Update 2013: Add second argument

Update 2015: Adjust wording

Update 2016: Use newer base class and add super, thanks @Birla

Update 2017: Use Python3 super

# previously
super(MySpider, self).__init__(**kwargs)  # python2

Update 2018: As @eLRuLL points out, spiders can access arguments as attributes


Previous answers are correct, but you don't have to declare the constructor (__init__) every time you write a Scrapy spider; you can just pass the parameters as before:

scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

and in your spider code you can just use them as spider arguments:

class MySpider(Spider):
    name = 'myspider'
    ...

    def parse(self, response):
        ...
        if self.parameter1 == 'value1':
            ...  # this is True

        # or also, with a default in case the argument was not passed
        if getattr(self, 'parameter2', None) == 'value2':
            ...  # this is also True

And it just works.
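One caveat worth knowing: arguments passed with -a always arrive as strings, so convert them yourself when you need another type. A minimal sketch, where the limit argument name is just an illustration:

from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'

    def parse(self, response):
        # -a limit=20 arrives as the string '20', not the integer 20
        limit = int(getattr(self, 'limit', '10'))
        ...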


To pass arguments with the crawl command:

scrapy crawl myspider -a category='mycategory' -a domain='example.com'

To pass arguments when running on scrapyd, replace -a with -d:

curl http://your.ip.address.here:port/schedule.json -d project=myproject -d spider=myspider -d category='mycategory' -d domain='example.com'
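The same request can be made from Python; a minimal sketch using the requests library, where the host, port, and project name are placeholders:

import requests

# schedule the spider on scrapyd; the extra fields become spider arguments
response = requests.post(
    'http://your.ip.address.here:port/schedule.json',  # placeholder host:port
    data={
        'project': 'myproject',  # placeholder project name
        'spider': 'myspider',
        'category': 'mycategory',
        'domain': 'example.com',
    },
)
print(response.json())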

The spider will receive arguments in its constructor.

class MySpider(Spider):
    name = "myspider"

    def __init__(self, category='', domain='', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.category = category
        self.domain = domain

Scrapy sets all the arguments as spider attributes, so you can skip the __init__ method completely. Beware: use the getattr method for reading those attributes, so your code does not break when an argument is missing.

class MySpider(Spider):
    name = "myspider"
    start_urls = ('https://httpbin.org/ip',)

    def parse(self, response):
        print(getattr(self, 'category', ''))
        print(getattr(self, 'domain', ''))
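With those getattr defaults in place, the arguments are optional:

scrapy crawl myspider                      # prints two empty strings
scrapy crawl myspider -a category=books    # prints 'books' and an empty string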