
RabbitMQ: What Does Celery Offer That Pika Doesn't?


What pika provides is just a small piece of what Celery does. Pika is a Python library for interacting with RabbitMQ. RabbitMQ is a message broker; at its core, it just sends messages to and receives messages from queues. It can be used as a task queue, but it could also just be used to pass messages between processes, without actually distributing "work".
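As a rough sketch of that distinction, here is bare message passing using only the standard library, with queue.Queue standing in for a RabbitMQ queue and a thread standing in for a separate consumer process (an analogy, not pika itself). Messages go in and come out, and that is all a broker does:

```python
import queue
import threading

messages = queue.Queue()  # stands in for a RabbitMQ queue

def consumer(out):
    # Receive messages until a sentinel arrives. There are no "task"
    # semantics here, just delivery, which is all a broker provides.
    while True:
        msg = messages.get()
        if msg is None:
            break
        out.append(msg)

received = []
t = threading.Thread(target=consumer, args=(received,))
t.start()

for msg in ["hello", "world"]:
    messages.put(msg)   # publish
messages.put(None)      # sentinel to stop the consumer
t.join()
```

Everything Celery adds (tasks, results, retries, scheduling) lives above this layer.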

Celery implements a distributed task queue, optionally using RabbitMQ as a broker for IPC. Rather than just providing a way of sending messages between processes, it provides a system for distributing actual tasks/jobs between processes. Here's how Celery's site describes it:

Task queues are used as a mechanism to distribute work across threads or machines.

A task queue’s input is a unit of work, called a task, dedicated worker processes then constantly monitor the queue for new work to perform.

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task a client puts a message on the queue, the broker then delivers the message to a worker.

A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.

Celery has a whole bunch of functionality built-in that is outside of pika's scope. You can take a look at the Celery docs to get an idea of the sort of things it can do, but here's an example:

>>> from proj.tasks import add
>>> res = add.chunks(zip(range(100), range(100)), 10)()
>>> res.get()
[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
 [20, 22, 24, 26, 28, 30, 32, 34, 36, 38],
 [40, 42, 44, 46, 48, 50, 52, 54, 56, 58],
 [60, 62, 64, 66, 68, 70, 72, 74, 76, 78],
 [80, 82, 84, 86, 88, 90, 92, 94, 96, 98],
 [100, 102, 104, 106, 108, 110, 112, 114, 116, 118],
 [120, 122, 124, 126, 128, 130, 132, 134, 136, 138],
 [140, 142, 144, 146, 148, 150, 152, 154, 156, 158],
 [160, 162, 164, 166, 168, 170, 172, 174, 176, 178],
 [180, 182, 184, 186, 188, 190, 192, 194, 196, 198]]

This code computes x + y for each pair (x, y) yielded by zip(range(100), range(100)), i.e. 0+0, 1+1, 2+2, and so on. It does this by taking a task called add, which adds two numbers, splitting the work into chunks of 10 pairs, and distributing each chunk to however many Celery workers are available. Each worker runs add on its 10-item chunk until all the work is complete, and the results are then gathered up by the res.get() call. I'm sure you can imagine a way to do this using pika, but I'm sure you can also imagine how much work would be required. With Celery, you get that functionality out of the box.
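For reference, the computation those workers perform can be reproduced serially in plain Python (no Celery, no distribution). This sketch chunks the pairs the same way and produces the same nested list shown in the REPL output above:

```python
from itertools import islice

def add(x, y):
    # Stand-in for the Celery task: plain addition.
    return x + y

def chunks(iterable, n):
    # Split an iterable of argument pairs into lists of n items,
    # mirroring how the work is carved up before dispatch.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

# Apply `add` to every pair, one chunk at a time, serially on one machine.
res = [[add(x, y) for x, y in chunk]
       for chunk in chunks(zip(range(100), range(100)), 10)]
```

Celery's version of this does the same arithmetic, but fans the chunks out to worker processes and collects the results for you.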

You can certainly use pika to implement a distributed task queue if you want, especially if you have a fairly simple use-case. Celery is just providing a "batteries included" solution for task scheduling, management, etc. that you'd otherwise have to implement manually on top of pika.
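To get a feel for what "implement manually" entails, here is a minimal in-process analogue of a task queue built only on the standard library, with threads standing in for workers and queue.Queue standing in for the broker. A real pika-based version would still need connections, acknowledgements, serialization, retries, and result storage on top of this skeleton:

```python
import queue
import threading

def worker(tasks, results):
    # Each worker pulls (func, args) messages off the queue until it
    # receives a poison pill (None), then exits.
    while True:
        item = tasks.get()
        if item is None:
            break
        func, args = item
        results.put(func(*args))
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()

# Enqueue ten small "add" jobs, one pair per message.
for pair in zip(range(10), range(10)):
    tasks.put((lambda x, y: x + y, pair))

tasks.join()                 # wait for all enqueued work to finish
for _ in threads:
    tasks.put(None)          # poison pills to shut the workers down
for t in threads:
    t.join()

total = sum(results.get() for _ in range(10))
```

Even this toy version needs shutdown signaling and result collection; Celery handles that plumbing, plus the cross-machine parts, for you.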


I’m going to add an answer here because this is the second time today someone has recommended Celery when it wasn’t needed, based on this answer I suspect.

The difference between a distributed task queue and a broker is that a broker just passes messages: nothing more, nothing less. Celery recommends RabbitMQ as its default broker for IPC and layers adapters on top of it to manage tasks/queues with daemon processes. While this is useful, especially for distributed tasks where you need something generic very quickly, it’s just a construct for the publisher/consumer process. For actual tasks, where you have a defined workflow that you need to step through and must ensure message durability based on your specific needs, you’d be better off writing your own publisher/consumer than relying on Celery. Obviously you still have to do all of the durability checking etc. yourself.

With most web-related services one doesn’t control the actual “work” units but rather passes them off to a service. Thus a distributed task queue makes little sense, unless you’re hitting some arbitrary API call limit based on IP/geographical region or account number, or something along those lines. So using Celery doesn’t stop you from having to write or deal with state code or workflow management; it simply exposes AMQP in a way that lets you avoid writing the publisher/consumer constructs yourself.

So, in short: if you need a simple task queue to chew through work and you aren’t really concerned about the nuances of performance, the intricacies of durability through your workflow, or the actual publish/consume processes, Celery works. If you are just passing messages to an API or service you don't actually control, sure, you could use Celery, but you could just as easily whip up your own publisher/consumer with pika in a couple of minutes. If you need something robust, or something that adheres to your own durability scenarios, write your own publish/consume code like everyone else.