Scrapy: How to output items in a specific json format


This is well documented in the Scrapy documentation. Use a JsonItemExporter inside an item pipeline:

    from scrapy.exporters import JsonItemExporter


    class ItemPipeline(object):
        file = None

        def open_spider(self, spider):
            # JsonItemExporter writes bytes, so open the file in binary mode
            self.file = open('item.json', 'wb')
            self.exporter = JsonItemExporter(self.file)
            self.exporter.start_exporting()

        def close_spider(self, spider):
            # Finish the export before closing the file
            self.exporter.finish_exporting()
            self.file.close()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

This will create a JSON file, item.json, containing your items as a single JSON array.
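A pipeline only runs once it is enabled in the project settings. A minimal sketch of that step, assuming the class lives in a hypothetical module `myproject/pipelines.py` (the project name is an assumption, not something given in the question):

```python
# settings.py
# 'myproject.pipelines' is a hypothetical module path; replace it with
# the module that actually contains your pipeline class.
# The number sets the run order among enabled pipelines (lower runs first).
ITEM_PIPELINES = {
    'myproject.pipelines.ItemPipeline': 300,
}
```

The value 300 is arbitrary here; it only matters relative to other enabled pipelines.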


I was trying to export pretty printed JSON and this is what worked for me.

I created a pipeline that looked like this:

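    import json


    class JsonPipeline(object):
        def open_spider(self, spider):
            # Text mode, because json.dumps returns str
            self.file = open('your_file_name.json', 'w')
            self.file.write("[\n")
            self.first_item = True

        def close_spider(self, spider):
            self.file.write("\n]")
            self.file.close()

        def process_item(self, item, spider):
            # Write the separator before every item except the first,
            # so no trailing comma is left before the closing bracket
            if not self.first_item:
                self.file.write(",\n")
            self.first_item = False
            line = json.dumps(
                dict(item),
                sort_keys=True,
                indent=4,
                separators=(',', ': ')
            )
            self.file.write(line)
            return item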

It's similar to the example from the Scrapy docs (https://doc.scrapy.org/en/latest/topics/item-pipeline.html), except that it prints each JSON property indented and on its own line.

See the part about pretty printing here https://docs.python.org/2/library/json.html
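To see what those `json.dumps` arguments do on their own, here is a small standalone sketch that uses only the standard library. The item contents are made up for illustration:

```python
import json

# A made-up item, standing in for dict(item) in the pipeline
item = {"title": "Example", "price": 9.99}

# sort_keys orders the keys alphabetically, indent=4 puts each property
# on its own indented line, and separators avoids trailing whitespace
# after commas on older Python versions
pretty = json.dumps(item, sort_keys=True, indent=4, separators=(',', ': '))
print(pretty)
```

The result is still ordinary JSON, so `json.loads(pretty)` gives back the original dictionary.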