
Message broker vs. database and monitoring


The scenario described in the question is a system composed of several pieces that work together to perform a function. In this case, you have three processes {A,B,C}, together with a database and an optional message queue. Every system, by its nature, accepts one or more inputs, executes some process, and produces one or more outputs. In your case, one of the outputs you want is the state of the system and its processing, which is a perfectly reasonable thing to want.

Queue or Database?

Now, down to your question. Why use a message queue instead of a database? Both are similar components of a system in that they provide some storage capacity. You might well ask the same question in a refrigerator manufacturing plant: when does it make more sense to use a shelf on the assembly line as opposed to a warehouse?

Databases are like warehouses - they are designed to hold a lot of different things and keep them all relatively straight. A good warehouse allows users to find things in the warehouse quickly, and avoids losing parts and materials. If it goes in, it can easily come back out, but not instantly.

Message queues, on the other hand, are like the shelves located near the operator stations in an assembly line. Parts accumulate there from the previous operation, waiting to be consumed by the person running the station. The shelves are designed to hold a small number of the same thing, just like a message queue in a software system. They are close to the worker, so when the next part is ready to be worked on, it can be retrieved very quickly (as opposed to a trip to the warehouse, which can take several minutes or more). In addition, the worker has immediate visibility into what's on the shelf: if the shelf is empty, the worker might take a break and wait for a part or two to accumulate again.

Finally, if one part of the factory grossly over-produces (we don't like it when this happens, because it indicates waste), then the shelves are going to be overwhelmed, and the overage is going to need to be put into the warehouse. Believe it or not, this happens all the time in factories - sometimes stations go down for brief periods of time and the warehouse acts as a longer-term buffer.

When to use one or the other?

So, back to the question. You use a message queue when you expect that your production of messages will usually match the consumption of messages, and you need speed in retrieval. You don't expect things to stay around in the queue very long. Software queue systems, such as RabbitMQ, also perform some very specific functions, like ensuring that a job only gets handled by one processor, and that it can be picked up by a different processor if the first goes down.
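To make the "one processor per job, redelivered if that processor goes down" behaviour concrete, here is a toy in-memory work queue. It is a hypothetical sketch, not RabbitMQ's API: the `WorkQueue` class, its `publish`/`get`/`ack` methods and the `visibility_timeout` parameter are made up for illustration. A job handed to one consumer is hidden from the others until it is acknowledged; if no ack arrives within the timeout, the job is assumed lost and becomes available again. (RabbitMQ itself redelivers unacknowledged messages when the consumer's channel or connection closes; a timeout is just a simple way to model a crashed worker here.)

```python
import time
from collections import deque


class WorkQueue:
    """Toy work queue (hypothetical): each job goes to one consumer at a time,
    and an unacknowledged job is redelivered after `visibility_timeout`."""

    def __init__(self, visibility_timeout=5.0, clock=time.monotonic):
        self._ready = deque()        # (job_id, payload) waiting for a consumer
        self._in_flight = {}         # job_id -> (payload, ack deadline)
        self._timeout = visibility_timeout
        self._clock = clock
        self._next_id = 0

    def publish(self, payload):
        self._ready.append((self._next_id, payload))
        self._next_id += 1

    def _requeue_expired(self):
        # Jobs whose consumer never acked in time are assumed lost.
        now = self._clock()
        expired = [jid for jid, (_, deadline) in self._in_flight.items() if deadline <= now]
        for jid in expired:
            payload, _ = self._in_flight.pop(jid)
            self._ready.appendleft((jid, payload))   # retry ahead of newer jobs

    def get(self):
        """Hand the next job to exactly one consumer, or return None."""
        self._requeue_expired()
        if not self._ready:
            return None
        jid, payload = self._ready.popleft()
        self._in_flight[jid] = (payload, self._clock() + self._timeout)
        return jid, payload

    def ack(self, job_id):
        """The consumer finished the job; it will never be redelivered."""
        self._in_flight.pop(job_id, None)


if __name__ == "__main__":
    now = [0.0]                                   # fake clock, in seconds
    q = WorkQueue(visibility_timeout=5.0, clock=lambda: now[0])
    q.publish("job-1")
    print(q.get())   # (0, 'job-1'): worker 1 takes the job
    print(q.get())   # None: no other worker can take it meanwhile
    now[0] = 6.0     # worker 1 crashed without acking
    print(q.get())   # (0, 'job-1'): redelivered to worker 2
```

The injectable `clock` exists only so the example can simulate a crashed worker without actually sleeping.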

On the other hand, you would use a database for things which require the persistence of state across multiple processing steps. Your job status is a perfect example of something that should be stored in the database. To continue the factory analogy - think of that as a report that gets sent back to the production planner when each step is completed. The production planner is going to keep it in a database.

You would also want to use a database when there is a likelihood that the queue will get full, or when it's critical that data not get lost between one job step and another. For example, a manufacturing plant will often store its finished products in the warehouse pending shipment to the customer. Use a database for all long-term (more than a few seconds) storage needs in your application.
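The shelf-and-warehouse arrangement from the factory analogy can be sketched as a small in-memory buffer that spills overflow into a database table. This is a minimal illustration, assuming SQLite as the "warehouse"; the `SpillingBuffer` class and the `overflow` table are hypothetical names. Items sitting on the in-memory shelf are not durable, which is exactly why data that must not be lost belongs in the database.

```python
import sqlite3
from collections import deque


class SpillingBuffer:
    """Hypothetical buffer: a small in-memory 'shelf' backed by a
    database 'warehouse' table that absorbs overflow."""

    def __init__(self, conn, shelf_size=3):
        self._shelf = deque()
        self._shelf_size = shelf_size
        self._db = conn
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS overflow ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def _overflow_empty(self):
        return self._db.execute("SELECT 1 FROM overflow LIMIT 1").fetchone() is None

    def put(self, payload):
        # Use the shelf only while it has room and nothing older is waiting
        # in the warehouse; this keeps items in first-in, first-out order.
        if len(self._shelf) < self._shelf_size and self._overflow_empty():
            self._shelf.append(payload)
        else:
            self._db.execute("INSERT INTO overflow (payload) VALUES (?)", (payload,))
            self._db.commit()

    def _restock(self):
        # Move the oldest warehouse rows onto the shelf while there is room.
        while len(self._shelf) < self._shelf_size:
            row = self._db.execute(
                "SELECT id, payload FROM overflow ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                break
            self._db.execute("DELETE FROM overflow WHERE id = ?", (row[0],))
            self._shelf.append(row[1])
        self._db.commit()

    def get(self):
        """Return the oldest item, or None if shelf and warehouse are both empty."""
        self._restock()
        return self._shelf.popleft() if self._shelf else None

    def overflow_count(self):
        return self._db.execute("SELECT COUNT(*) FROM overflow").fetchone()[0]


if __name__ == "__main__":
    buf = SpillingBuffer(sqlite3.connect(":memory:"), shelf_size=2)
    for part in ["p1", "p2", "p3", "p4"]:
        buf.put(part)
    print(buf.overflow_count())            # 2: p3 and p4 went to the warehouse
    print([buf.get() for _ in range(5)])   # ['p1', 'p2', 'p3', 'p4', None]
```

New items go to the warehouse as soon as anything is already waiting there, so the shelf always holds the oldest items and consumers see them in arrival order.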

Bottom Line

Most scalable software systems will have a need for both queues and databases, and the key is knowing when to use each.

Hopefully this makes some degree of sense.


Disclaimer: I'm the author of cluster-tasks-service (CTS), the solution proposed below; the same pattern could also be implemented with other relevant tools.

The thing is that, from a general architecture perspective, the functionality you describe seems to need both types of solution:

  • the part where Tool B executes as an outcome of the work done by Tool A is a classic event-driven flow. Messaging is usually more effective for this, but keep reading...
  • the part where you'd like some state observation / monitoring over the flow definitely requires persisting the state, and not just in local memory (or at minimum in distributed memory), since you'll probably want to run a cluster later on

I'd say that a DB-based queue would be a solution here. A DB-based queue certainly has lower throughput than non-DB approaches, but it gives you some benefits, such as:

  • assured persistence: no tasks/messages are lost unless a real disaster happens, and then the queue is the least of your problems
  • smart task management synchronized over the DB, without any need for complex system topologies, master/slave issues, failovers, etc.
  • it is much easier to get such a queue as an embedded solution, with very low operational cost compared to separate Redis/Rabbit/Kafka installations/services

In terms of CTS, which is a cluster-aware task distribution and management system over a DB (provided by the consuming application and running embedded), your problem would be solved by having Tool A, at the end of its process, enqueue Tool B's run as a task with all the relevant data. Meanwhile, Tool C could use the CTS APIs to check the status of the task(s) and visualize them as needed.
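CTS's actual API isn't shown here, so the following is not CTS code. It is a generic, hypothetical sketch of the DB-based queue pattern described above, assuming SQLite: Tool A inserts a task row for Tool B, a worker claims it with a conditional `UPDATE` that only succeeds while the row is still `pending`, and Tool C reads the same table to report status. The `tasks` schema and all function names are assumptions for illustration.

```python
import json
import sqlite3

# Hypothetical schema for a DB-backed task queue.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    kind    TEXT NOT NULL,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
    worker  TEXT
);
"""


def enqueue(conn, kind, payload):
    """Called by Tool A at the end of its run to schedule Tool B."""
    cur = conn.execute(
        "INSERT INTO tasks (kind, payload) VALUES (?, ?)", (kind, json.dumps(payload))
    )
    conn.commit()
    return cur.lastrowid


def claim(conn, kind, worker):
    """Mark the oldest pending task of `kind` as running for `worker`.

    The UPDATE only matches while the row is still 'pending', so two workers
    racing for the same row cannot both claim it.
    """
    while True:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE kind = ? AND status = 'pending' "
            "ORDER BY id LIMIT 1",
            (kind,),
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE tasks SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0], json.loads(row[1])
        # Another worker claimed this row first; try the next pending one.


def finish(conn, task_id, ok=True):
    """Called by the worker (Tool B) when the task completes or fails."""
    conn.execute(
        "UPDATE tasks SET status = ? WHERE id = ?", ("done" if ok else "failed", task_id)
    )
    conn.commit()


def status_report(conn):
    """What Tool C might read to draw a dashboard: task counts per status."""
    return dict(conn.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    enqueue(conn, "tool_b", {"input": "output-of-A.csv"})
    print(status_report(conn))   # {'pending': 1}
    task_id, payload = claim(conn, "tool_b", "worker-1")
    print(status_report(conn))   # {'running': 1}
    finish(conn, task_id)
    print(status_report(conn))   # {'done': 1}
```

The conditional `UPDATE ... WHERE status = 'pending'` is what keeps two workers from taking the same task; a production-grade queue would also need retries and a way to recover tasks left in `running` after a worker crash.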


You can make the producers and consumers of the queue update a table in a NoSQL database or an RDBMS. This lets you view the status of your requests at any given time, while still taking advantage of the queue pushing messages to consumers without any need for polling.
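A minimal sketch of this combination, assuming Python's in-process `queue.Queue` as a stand-in for the message broker and an in-memory SQLite table as the status store (the `request_status` table, `StatusStore` class and `run_demo` function are hypothetical). The producer records each request before pushing it; the consumer blocks on the queue instead of polling, and writes its progress to the same table, which anyone can query for a status view.

```python
import queue
import sqlite3
import threading


class StatusStore:
    """Status table written by both producer and consumer (hypothetical schema)."""

    def __init__(self):
        # One shared connection, guarded by a lock because the consumer
        # runs in a different thread.
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute(
            "CREATE TABLE request_status (request_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def set(self, request_id, status):
        with self._lock:
            self._db.execute(
                "INSERT OR REPLACE INTO request_status (request_id, status) VALUES (?, ?)",
                (request_id, status),
            )
            self._db.commit()

    def snapshot(self):
        with self._lock:
            return dict(self._db.execute(
                "SELECT request_id, status FROM request_status ORDER BY request_id"
            ))


def consumer(q, store):
    while True:
        request_id = q.get()      # blocks until a message is pushed; no polling loop
        if request_id is None:    # sentinel value: shut down
            break
        store.set(request_id, "processing")
        # ... the real work would happen here ...
        store.set(request_id, "done")


def run_demo():
    q = queue.Queue()
    store = StatusStore()
    worker = threading.Thread(target=consumer, args=(q, store))
    worker.start()
    for request_id in ["req-1", "req-2"]:
        store.set(request_id, "queued")   # producer records the request...
        q.put(request_id)                 # ...then pushes it to the consumer
    q.put(None)
    worker.join()
    return store.snapshot()


if __name__ == "__main__":
    print(run_demo())   # {'req-1': 'done', 'req-2': 'done'}
```

Because the producer writes `queued` before pushing the message, the consumer's later updates always overwrite it, so the table never shows a stale status for a finished request.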