
How to pass a user defined argument in scrapy spider


Spider arguments are passed in the crawl command using the -a option. For example:

scrapy crawl myspider -a category=electronics -a domain=system

Spiders can access arguments as attributes:

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, category='', **kwargs):
        self.start_urls = [f'http://www.example.com/{category}']  # py36
        super().__init__(**kwargs)  # python3

    def parse(self, response):
        self.log(self.domain)  # system

Taken from the Scrapy doc: http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments
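If you start the crawl from a script rather than the shell, the same arguments can be passed as keyword arguments to CrawlerProcess.crawl, which forwards them to the spider's constructor. A minimal sketch, assuming the MySpider class defined above:

import scrapy
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
# keyword arguments here end up as spider arguments,
# exactly like -a category=electronics -a domain=system on the command line
process.crawl(MySpider, category='electronics', domain='system')
process.start()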

Update 2013: Add second argument

Update 2015: Adjust wording

Update 2016: Use newer base class and add super, thanks @Birla

Update 2017: Use Python3 super

# previously
super(MySpider, self).__init__(**kwargs)  # python2

Update 2018: As @eLRuLL points out, spiders can access arguments as attributes


Previous answers are correct, but you don't have to declare the constructor (__init__) every time you write a Scrapy spider; you can just pass the parameters as before:

scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

and in your spider code you can just use them as spider arguments:

class MySpider(Spider):
    name = 'myspider'
    ...

    def parse(self, response):
        ...
        if self.parameter1 == 'value1':
            ...  # this is True

        # or also, with a default in case the argument was not passed
        if getattr(self, 'parameter2', None) == 'value2':
            ...  # this is also True

And it just works.
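One caveat worth knowing: arguments passed with -a always arrive as strings, so convert them yourself when you need another type. A minimal sketch, where the limit argument name is just an illustration:

from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'

    def parse(self, response):
        # -a limit=20 arrives as the string '20', not the integer 20
        limit = int(getattr(self, 'limit', '10'))
        ...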


To pass arguments with the crawl command:

scrapy crawl myspider -a category='mycategory' -a domain='example.com'

To pass arguments when running on scrapyd, replace -a with -d:

curl http://your.ip.address.here:port/schedule.json -d project=myproject -d spider=myspider -d category='mycategory' -d domain='example.com'
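The same request can be made from Python; a minimal sketch using the requests library, where the host, port, and project name are placeholders:

import requests

# schedule the spider on scrapyd; the extra fields become spider arguments
response = requests.post(
    'http://your.ip.address.here:port/schedule.json',  # placeholder host:port
    data={
        'project': 'myproject',  # placeholder project name
        'spider': 'myspider',
        'category': 'mycategory',
        'domain': 'example.com',
    },
)
print(response.json())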

The spider will receive arguments in its constructor.

class MySpider(Spider):
    name = "myspider"

    def __init__(self, category='', domain='', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.category = category
        self.domain = domain

Scrapy sets all the arguments as spider attributes, so you can skip the __init__ method completely. Beware: use the getattr method for reading those attributes, so your code does not break when an argument is missing.

class MySpider(Spider):
    name = "myspider"
    start_urls = ('https://httpbin.org/ip',)

    def parse(self, response):
        print(getattr(self, 'category', ''))
        print(getattr(self, 'domain', ''))
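With those getattr defaults in place, the arguments are optional:

scrapy crawl myspider                      # prints two empty strings
scrapy crawl myspider -a category=books    # prints 'books' and an empty string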