How to access scrapy settings from item Pipeline


UPDATE (2021-05-04)
Please note that this answer is now ~7 years old, so its validity can no longer be ensured. In addition, it uses Python 2.

The way to access your Scrapy settings (as defined in settings.py) from within your_spider.py is simple; all the other answers are far more complicated than necessary. The reason for the confusion is the poor state of the Scrapy documentation, combined with many recent updates and changes: neither the "How to access settings" part of the "Settings" documentation nor the "Settings API" gives a workable example. Here's an example of how to get your current USER_AGENT string.

Just add the following lines to your_spider.py:

```python
# To get your settings from (settings.py):
from scrapy.utils.project import get_project_settings
...

class YourSpider(BaseSpider):
    ...
    def parse(self, response):
        ...
        settings = get_project_settings()
        print "Your USER_AGENT is:\n%s" % (settings.get('USER_AGENT'))
        ...
```

As you can see, there's no need to use @classmethod or re-define the from_crawler() or __init__() functions. Hope this helps.

PS. I'm still not sure why using from scrapy.settings import Settings doesn't work the same way, since it would be the more obvious choice of import?
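My understanding (an assumption on my part, based on how Scrapy's settings machinery works) is that instantiating scrapy.settings.Settings directly gives you only the built-in defaults, whereas get_project_settings() also locates your project's settings.py and layers it on top. A minimal stand-in, with no Scrapy required, to illustrate the difference in behaviour:

```python
# Stand-in defaults, analogous to Scrapy's built-in default settings.
DEFAULTS = {"USER_AGENT": "Scrapy/VERSION", "ROBOTSTXT_OBEY": True}

def bare_settings():
    # Analogous to scrapy.settings.Settings(): built-in defaults only.
    return dict(DEFAULTS)

def project_settings(project_overrides):
    # Analogous to get_project_settings(): defaults plus the
    # overrides from your project's settings.py.
    merged = dict(DEFAULTS)
    merged.update(project_overrides)
    return merged

settings = project_settings({"USER_AGENT": "my-bot/1.0"})
print(settings["USER_AGENT"])  # prints my-bot/1.0, not the default
```

So the plain import "works", but the object it gives you simply doesn't know about your project's settings module.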


Ok, so the documentation at http://doc.scrapy.org/en/latest/topics/extensions.html says that

The main entry point for a Scrapy extension (this also includes middlewares and pipelines) is the from_crawler class method which receives a Crawler instance which is the main object controlling the Scrapy crawler. Through that object you can access settings, signals, stats, and also control the crawler behaviour, if your extension needs such a thing.

So then you can have a function to get the settings.

```python
@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings
    my_setting = settings.get("MY_SETTING")
    return cls(my_setting)
```

The crawler engine then calls the pipeline's __init__() method with my_setting, like so:

```python
def __init__(self, my_setting):
    self.my_setting = my_setting
```

And other functions can access it with self.my_setting, as expected.

Alternatively, in the from_crawler() function you can pass the crawler.settings object to __init__(), and then access settings from the pipeline as needed instead of pulling them all out in the constructor.
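Putting the two pieces together, here is a fuller sketch of the pattern. The Crawler stub and the plain-dict settings below are stand-ins of my own so the snippet runs outside a Scrapy project; in real code Scrapy constructs the crawler and calls from_crawler() for you, and crawler.settings exposes the same .get() interface:

```python
class Crawler:
    """Stand-in for scrapy.crawler.Crawler; only .settings is modelled."""
    def __init__(self, settings):
        self.settings = settings  # a dict here; a Settings object in Scrapy

class MyPipeline:
    def __init__(self, my_setting):
        self.my_setting = my_setting

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this classmethod (not __init__) when building the
        # pipeline, so pull whatever you need out of crawler.settings here.
        return cls(crawler.settings.get("MY_SETTING"))

    def process_item(self, item, spider):
        # Later calls can use the value stored on self.
        item["annotated_with"] = self.my_setting
        return item

crawler = Crawler({"MY_SETTING": "hello"})
pipeline = MyPipeline.from_crawler(crawler)
print(pipeline.process_item({}, spider=None))  # {'annotated_with': 'hello'}
```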


The correct answer is: it depends where in the pipeline you wish to access the settings.

avaleske has answered as if you wanted access to the settings outside of your pipeline's process_item method, but it's very likely that this is where you'll want the setting, and in that case there is a much easier way, since the Spider instance itself gets passed in as an argument.

```python
class PipelineX(object):
    def process_item(self, item, spider):
        wanted_setting = spider.settings.get('WANTED_SETTING')
        ...
        return item
```