What NoSQL solution is best to store Apache error_log and access_log? Cassandra or MongoDB?
Turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I don't see it.
Honestly, we're going through this test right now with some serious log data (and by right now, I mean a few of us were up late last night running these tests).
To me, there are two distinguishing features: ease of use and proven scaling.
Ease of use
- MongoDB was easy. In a couple of hours I went from blank computer to an active Mongo instance with imported data from MySQL and a few completed map-reduces.
- In the same period of time, team Cassandra sat around recompiling Java files, trying to get Hadoop configured to run over an existing Cassandra implementation just so that they could run map-reduces.
Proven Scaling
- MongoDB sharding is still in beta. It's slated for launch in the next few weeks. That's pretty tight.
- Cassandra sharding is proven on some very large instances.
So I think the answer is really going to be specific to your personal tastes. I honestly think that Cassandra may be a more stable & proven product, but I also know from experience that the learning and setup curve is a lot steeper. So it might be worth trying a little bit of both.
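Whichever store you try, the first step is the same: turn each access_log line into a structured record. Here is a minimal sketch of that step, assuming your logs use Apache's Common Log Format. The names `parse_access_line` and `count_by_status` are made up for illustration, and the aggregation is plain Python standing in for the kind of map-reduce you would run inside the database.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict for one access_log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    parts = fields["request"].split()
    return {
        "host": fields["host"],
        "user": None if fields["user"] == "-" else fields["user"],
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": parts[0] if parts else None,
        "path": parts[1] if len(parts) > 1 else None,
        "status": int(fields["status"]),
        # Apache writes "-" when no body was sent
        "size": 0 if fields["size"] == "-" else int(fields["size"]),
    }


def count_by_status(lines):
    """Count requests per HTTP status, skipping lines that fail to parse."""
    records = (parse_access_line(line) for line in lines)
    return Counter(r["status"] for r in records if r is not None)
```

Each dict could then be stored as one document in MongoDB or one row in Cassandra, so the same parser works for a side-by-side trial of both.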
If you are considering using Cassandra, you can check out this article from Cloudkick: 4 Months with Cassandra, a love story.
They are using Cassandra to store different metrics for their system, which is somewhat similar to storing log files.
EDIT:
If you haven't yet decided what to use, here's a great solution using MongoDB as a backend:
Graylog2 is an open source syslog implementation that stores your logs in MongoDB. It consists of two parts. The first is a server written in Java that accepts your syslog messages via TCP or UDP and stores them in the database. The second is a Ruby on Rails web interface that allows you to view the log messages.
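To get a feel for what "accepts syslog messages via UDP" means in practice, here is a rough sketch of sending one message from Python using only the standard library. The helper names, the `apache` tag, and the host and port are placeholders of mine, not anything specific to Graylog2, and the message is a stripped-down BSD-style syslog line with only the priority prefix (no timestamp or hostname).

```python
import socket


def format_syslog(message, facility=1, severity=6, tag="apache"):
    """Build a minimal BSD-style syslog payload.

    The priority value is facility * 8 + severity; the defaults
    (user-level facility, informational severity) give <14>.
    """
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {message}".encode("utf-8")


def send_syslog_udp(message, host, port):
    """Send one syslog message as a single UDP datagram and return the payload."""
    payload = format_syslog(message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

You would call something like `send_syslog_udp("GET /index.html 200", "logs.example.com", 514)`, swapping in whatever address your syslog server listens on. In a real setup you would more likely point Apache or your system syslog daemon at the server rather than send datagrams by hand.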