Scrapy: How to output items in a specific json format
This is well documented in the Scrapy docs, in the section on item exporters.
    from scrapy.exporters import JsonItemExporter


    class ItemPipeline(object):

        file = None

        def open_spider(self, spider):
            # JsonItemExporter writes bytes, so open the file in binary mode
            self.file = open('item.json', 'wb')
            self.exporter = JsonItemExporter(self.file)
            self.exporter.start_exporting()

        def close_spider(self, spider):
            self.exporter.finish_exporting()
            self.file.close()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item
This will create a JSON file named item.json containing your items.
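For the pipeline to run, it also has to be enabled in the project's settings.py through the ITEM_PIPELINES setting. A minimal sketch, assuming the class lives in a hypothetical module myproject/pipelines.py; adjust the dotted path to match your own project:

```python
# settings.py
# 'myproject.pipelines' is a hypothetical module path; replace it with your own
ITEM_PIPELINES = {
    'myproject.pipelines.ItemPipeline': 300,
}
```

The number sets the order in which pipelines run (lower values run first); by convention it is chosen in the 0 to 1000 range.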
I was trying to export pretty printed JSON and this is what worked for me.
I created a pipeline that looked like this:
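    import json


    class JsonPipeline(object):

        def open_spider(self, spider):
            # Text mode, because json.dumps returns str
            self.file = open('your_file_name.json', 'w')
            self.file.write("[")
            self.first_item = True

        def close_spider(self, spider):
            self.file.write("]")
            self.file.close()

        def process_item(self, item, spider):
            # Write the separator before every item except the first,
            # so the array has no trailing comma and stays valid JSON
            if not self.first_item:
                self.file.write(",\n")
            self.first_item = False
            line = json.dumps(
                dict(item),
                sort_keys=True,
                indent=4,
                separators=(',', ': ')
            )
            self.file.write(line)
            return item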
It's similar to the example from the Scrapy docs (https://doc.scrapy.org/en/latest/topics/item-pipeline.html), except that it prints each JSON property indented and on a new line.
See the part about pretty printing here https://docs.python.org/2/library/json.html
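The idea of writing a pretty-printed JSON array item by item can be tried outside of Scrapy with only the standard library. This is a minimal sketch: the function name write_pretty_json_array and the sample items are made up for illustration. It writes the separator before each item except the first, so the result parses as valid JSON.

```python
import io
import json


def write_pretty_json_array(items, out):
    """Write items to the text file-like object `out` as one JSON array."""
    out.write("[")
    first = True
    for item in items:
        # Separator goes before every item except the first,
        # which avoids a trailing comma before the closing bracket
        if not first:
            out.write(",\n")
        first = False
        out.write(json.dumps(
            dict(item),
            sort_keys=True,
            indent=4,
            separators=(',', ': ')
        ))
    out.write("]")


# Hypothetical sample items standing in for scraped data
sample_items = [
    {"title": "first", "price": 10},
    {"title": "second", "price": 20},
]

buffer = io.StringIO()
write_pretty_json_array(sample_items, buffer)

# Round-trip through json.loads to confirm the output is valid JSON
parsed = json.loads(buffer.getvalue())
```

Loading the output back with json.loads is a quick way to confirm that a hand-written array like this is well formed.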