What NoSQL solution is best to store Apache error_log and access_log? Cassandra or MongoDB?


It turns out that neither of the two solutions offers a distinct feature that helps me make a decision, or at least I don't see one.

Honestly, we're going through this test right now with some serious log data (and by "right now" I mean a few of us were up late last night running these tests).

To me, there are two distinguishing features: ease of use and proven scaling.

Ease of use

  • MongoDB was easy. In a couple of hours I went from blank computer to an active Mongo instance with imported data from MySQL and a few completed map-reduces.
  • In the same period of time, team Cassandra sat around recompiling Java files, trying to get Hadoop configured to run over an existing Cassandra installation so that they could even run map-reduces.
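Whichever store you pick, the log lines have to become structured records before you can import or map-reduce them. Here is a minimal sketch of that step, assuming your access_log uses the standard Apache "combined" (or "common") format. The names `parse_access_line` and `count_by_status` are made up for illustration, and the counting function is only a plain-Python stand-in for the kind of map-reduce mentioned above, not either database's API.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the Apache "combined" log format (the trailing referer and
# user-agent fields are optional, so "common" format lines match too).
# Adjust the pattern if your LogFormat directive differs.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_access_line(line):
    """Turn one access_log line into a dict, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])
    # Apache writes "-" when no bytes were sent.
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    doc["time"] = datetime.strptime(doc["time"], "%d/%b/%Y:%H:%M:%S %z")
    return doc


def count_by_status(docs):
    """Count requests per HTTP status code (a map-reduce style aggregate)."""
    return Counter(d["status"] for d in docs)
```

Each dict returned by `parse_access_line` is the sort of record you would hand to your database driver's insert call. Note that the pattern does not handle escaped quotes inside the request line, so treat it as a starting point.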

Proven Scaling

  • MongoDB sharding is still in beta. It's slated for launch in the next few weeks. That's pretty tight.
  • Cassandra sharding is proven on some very large instances.

So I think the answer is really going to be specific to your personal tastes. I honestly think that Cassandra may be a more stable & proven product, but I also know from experience that the learning and setup curve is a lot steeper. So it might be worth trying a little bit of both.


You can check out this article from Cloudkick if you are considering using Cassandra: 4 Months with Cassandra, a love story.

They are using Cassandra to store different metrics for their system, which is somewhat similar to storing log files.

EDIT:

If you haven't yet decided what to use, here's a great solution using MongoDB as a backend:

Graylog2 is an open source syslog implementation that stores your logs in MongoDB. It consists of a server written in Java that accepts your syslog messages via TCP or UDP and stores them in the database. The second part is a Ruby on Rails web interface that allows you to view the log messages.
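To give a rough idea of what "accepts your syslog messages via UDP" means on the sending side, here is a small sketch using only the Python standard library. The host, port 514 (the conventional syslog port), the `apache` tag, and the function names are all assumptions for illustration; check your own server's configuration for the real values. The message layout is a simplified BSD-style syslog line without the timestamp and hostname fields.

```python
import socket


def format_syslog(message, facility=1, severity=6, tag="apache"):
    """Build a simplified BSD-style syslog payload.

    The priority value is facility * 8 + severity; the defaults here
    (facility 1 = user-level, severity 6 = informational) give <14>.
    """
    pri = facility * 8 + severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")


def send_syslog_udp(message, host="127.0.0.1", port=514):
    """Send one message as a single UDP datagram and return the payload.

    host and port are placeholders, not values taken from Graylog2.
    """
    payload = format_syslog(message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

In practice you would more likely point Apache's logging or your system's syslog daemon at the server rather than send datagrams by hand, but the wire format is the same idea.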