Is Scrapy compatible with multiprocessing?


The recommended way to work with Scrapy is NOT to use multiprocessing inside running spiders.

A better alternative is to launch several Scrapy jobs, each with its own separate share of the input.
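As a rough sketch of that idea, a small launcher script can split the input and start one `scrapy crawl` process per share. The spider name `my_spider` and the spider argument `start_urls_csv` are hypothetical; the spider would have to read that argument itself.

```python
import subprocess


def split_inputs(items, n_jobs):
    """Deal the items round-robin into n_jobs separate lists."""
    return [items[i::n_jobs] for i in range(n_jobs)]


def build_command(spider_name, urls):
    # `-a name=value` passes a spider argument on the command line.
    # `start_urls_csv` is a made-up argument name for this sketch.
    return ["scrapy", "crawl", spider_name, "-a", "start_urls_csv=" + ",".join(urls)]


if __name__ == "__main__":
    all_urls = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ]
    # One independent Scrapy process per non-empty share of the input.
    procs = [
        subprocess.Popen(build_command("my_spider", share))
        for share in split_inputs(all_urls, 2)
        if share
    ]
    for proc in procs:
        proc.wait()
```

Each job is a completely separate OS process with its own reactor, so nothing has to be shared between them.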

Scrapy jobs themselves are very fast IMO. Of course, you can always go faster by tuning settings such as the ones you mentioned (CONCURRENT_REQUESTS, CONCURRENT_REQUESTS_PER_DOMAIN, DOWNLOAD_DELAY, etc.). But the speed basically comes from Scrapy being asynchronous: it doesn't wait for a request to complete before moving on to the remaining work (scheduling more requests, parsing responses, etc.).
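To illustrate why a single process can keep many requests going at once, here is a small analogy using only the standard library. It is not how Scrapy is implemented (Scrapy runs on Twisted, not on this code); `fake_fetch` and `crawl` are made-up names, and the sleep stands in for network latency.

```python
import asyncio


async def fake_fetch(i, limiter, stats):
    # The semaphore caps how many "requests" are in flight at once.
    async with limiter:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stands in for waiting on the network
        stats["in_flight"] -= 1
        return i


async def crawl(n_requests, max_concurrent):
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    # All requests are scheduled up front; none blocks the others while waiting.
    results = await asyncio.gather(
        *(fake_fetch(i, limiter, stats) for i in range(n_requests))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(crawl(20, 4))
    print(len(results), "responses, peak requests in flight:", peak)
```

Everything here runs in one process and one thread; the overlap comes from not blocking while waiting, not from extra processes.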

CONCURRENT_REQUESTS has nothing to do with multiprocessing. Because Scrapy is asynchronous, the setting is mostly a way to cap how many requests can be in flight at the same time.
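For reference, these settings are plain module-level names in a project's settings.py. The values below are only illustrative, not recommendations; the right numbers depend on the target site and your bandwidth.

```python
# settings.py (fragment); values are illustrative assumptions

# Upper bound on requests in flight at once across the whole crawl
CONCURRENT_REQUESTS = 32

# Upper bound on requests in flight at once to any single domain
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# Seconds to wait between consecutive requests to the same site
DOWNLOAD_DELAY = 0.25
```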


You can use:

If you need more than that, or you have some heavy processing to do, I suggest moving that part into a separate process.

Scrapy's responsibility is web parsing. You could, for example, send tasks to a queue from an item pipeline and have a separate process consume and process them.
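A minimal sketch of that pipeline-plus-worker idea, under some assumptions: `heavy_processing`, `worker`, `QueuePipeline`, the `url`/`body` item fields and the output file `processed.jl` are all made up for illustration, and the pipeline uses the classic `open_spider` / `process_item` / `close_spider` method signatures. It would still need to be enabled through ITEM_PIPELINES in the project settings.

```python
import json
import multiprocessing as mp


def heavy_processing(item):
    # Placeholder for the CPU-heavy work you want out of the spider process.
    return {"url": item["url"], "word_count": len(item["body"].split())}


def worker(tasks, out_path):
    # Runs in the separate process: consume tasks until the None sentinel.
    with open(out_path, "a", encoding="utf-8") as out:
        while True:
            item = tasks.get()
            if item is None:
                break
            out.write(json.dumps(heavy_processing(item)) + "\n")


class QueuePipeline:
    def open_spider(self, spider):
        self.tasks = mp.Queue()
        self.proc = mp.Process(target=worker, args=(self.tasks, "processed.jl"))
        self.proc.start()

    def process_item(self, item, spider):
        # Hand the item off and return immediately so the crawl is not blocked.
        self.tasks.put(dict(item))
        return item

    def close_spider(self, spider):
        self.tasks.put(None)  # tell the worker to finish
        self.proc.join()
```

For anything beyond a toy setup, an external queue (a message broker or a database table) would let the consumer run and restart independently of the crawl.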


Well, generally speaking, Scrapy doesn't support multiprocessing; see

ReactorNotRestartable error in while loop with scrapy

Within a given process, once you call reactor.run() or process.start(), you cannot call them again, because the reactor cannot be restarted. The reactor stops once the script finishes executing.

But there is a workaround:

    from multiprocessing import Pool
    pool = Pool(processes=pool_size, maxtasksperchild=1)

maxtasksperchild is the number of tasks a worker process can complete before it exits and is replaced with a fresh worker process.

Since maxtasksperchild is set to 1, each subprocess is killed after it finishes its task and a new subprocess is created for the next one, so the reactor never needs to be restarted.

But this causes memory pressure, so make sure you really need it. I think starting multiple separate processes is a better choice.
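To make the maxtasksperchild workaround concrete, here is a sketch that can be run without Scrapy installed. `run_job` and `run_all` are hypothetical names; `run_job` only returns the worker's PID, and the comment marks the place where an actual crawl would be started.

```python
import os
from multiprocessing import Pool


def run_job(job_input):
    # Stand-in for one crawl. In a real script this is where you would
    # build the crawler process and call process.start() exactly once.
    return job_input, os.getpid()


def run_all(job_inputs, pool_size=2):
    with Pool(processes=pool_size, maxtasksperchild=1) as pool:
        # chunksize=1 makes every input its own task; with larger chunks,
        # several inputs could run in the same worker before it is replaced.
        return pool.map(run_job, job_inputs, chunksize=1)


if __name__ == "__main__":
    for job_input, pid in run_all(["a", "b", "c", "d"]):
        print(job_input, "ran in process", pid)
```

Each input should be reported with a different PID, which is what gives every job a fresh, never-started reactor.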


I am new to Scrapy, so if you have any better suggestions, please tell me.