
MongoDB Approaches for storing large amounts of metrics / analytics data


Updated answer

Hacked together in the mongo shell:

use pagestats

// a little helper function: builds the key for the current page/hour bucket
var pagePerHour = function(pagename) {
    var d = new Date();
    return {
        page : pagename,
        year : d.getUTCFullYear(),
        month: d.getUTCMonth(),   // 0-11
        day  : d.getUTCDate(),    // 1-31
        hour : d.getUTCHours()    // 0-23
    };
};

// a pageview happened
db.pagestats.update(
    pagePerHour('Hello'),
    { $inc : { views : 1 }},
    true ); // we want to upsert

// somebody tweeted our page twice!
db.pagestats.update(
    pagePerHour('Hello'),
    { $inc : { tweets : 2 }},
    true ); // we want to upsert

db.pagestats.find();
// { "_id" : ObjectId("4dafe88a02662f38b4a20193"),
//   "year" : 2011, "day" : 21, "hour" : 8, "month" : 3,
//   "page" : "Hello",
//   "tweets" : 2, "views" : 1 }

// 24 hour summary 'Hello' on 2011-4-21
for (var i = 0; i < 24; i++) {
    // careful: days (1-31), month (0-11) and hours (0-23)
    var stats = db.pagestats.findOne({ page: 'Hello', year: 2011, month: 3, day : 21, hour : i });
    if (stats) {
        // views is missing if the hour only saw tweets so far
        print(i + ': ' + (stats.views || 0) + ' views');
    } else {
        print(i + ': no hits');
    }
}
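The summary loop above issues one query per hour. As a rough sketch of an alternative, the hourly documents for a day could be fetched in one go and the 24 lines built in plain JavaScript. The function name `summarizeDay` and the sample data are made up for illustration; in the shell the input would presumably come from something like `db.pagestats.find({ page: 'Hello', year: 2011, month: 3, day: 21 }).toArray()`.

```javascript
// Hypothetical helper: turns the hourly documents of one page and one day
// into 24 summary lines, filling in hours that have no document.
function summarizeDay(docs) {
    var byHour = {};
    docs.forEach(function (doc) {
        byHour[doc.hour] = doc;
    });

    var lines = [];
    for (var hour = 0; hour < 24; hour++) {
        var stats = byHour[hour];
        if (stats) {
            // views may be absent if only tweets were recorded in that hour
            lines.push(hour + ': ' + (stats.views || 0) + ' views');
        } else {
            lines.push(hour + ': no hits');
        }
    }
    return lines;
}

// Sample input shaped like the document shown in the answer
var sampleDocs = [
    { page: 'Hello', year: 2011, month: 3, day: 21, hour: 8, views: 1, tweets: 2 }
];
var summaryLines = summarizeDay(sampleDocs);
```

An index on the fields used in the query (page, year, month, day, hour) would likely help either variant, since both look documents up by exactly those keys.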

Depending on which aspects you want to track, you might consider adding more collections (e.g. a collection for user-centric tracking). Hope that helps.

See also

Blogpost about Analytics Data


I wouldn't worry too much about space. Mongo scales out well in that regard (for example through sharding), and adding more storage is reasonably cheap.

One thing to be aware of is that if you keep adding data to a document, its size will grow, which means Mongo will eventually need to move it to a new location on disk and update the index entries that point to it. If you have a lot of documents being updated and increasing in size, Mongo will need to copy these documents around a lot, and this can slow things down significantly. Of course, this all depends on how much traffic you're expecting.
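If you do keep updating documents, one commonly described way to limit growth is to create each document with all its counters already present, so that later increments only change existing numeric fields. The following is a minimal sketch of that idea, not something taken from the answers here: the per-day layout, the `hours` field, and the function names `emptyDayDoc` and `incHour` are all assumptions for illustration.

```javascript
// Hypothetical per-day document with every hourly counter pre-allocated at zero.
function emptyDayDoc(pagename, date) {
    var doc = {
        page: pagename,
        year: date.getUTCFullYear(),
        month: date.getUTCMonth(), // 0-11, as returned by the Date API
        day: date.getUTCDate(),    // 1-31
        hours: {}
    };
    for (var h = 0; h < 24; h++) {
        doc.hours[h] = { views: 0, tweets: 0 };
    }
    return doc;
}

// Builds the $inc update for one counter in one hour,
// e.g. { $inc: { 'hours.8.views': 1 } }.
function incHour(hour, counter, amount) {
    var spec = { $inc: {} };
    spec.$inc['hours.' + hour + '.' + counter] = amount;
    return spec;
}

var dayDoc = emptyDayDoc('Hello', new Date(Date.UTC(2011, 3, 21)));
var viewUpdate = incHour(8, 'views', 1);
```

In the shell, `dayDoc` would be inserted once per page and day, and `viewUpdate` passed as the second argument to `update` with `{ page: 'Hello', year: 2011, month: 3, day: 21 }` as the query. The trade-off is that documents are larger from the start, even for hours with no traffic.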

Based on my experience, go with a simple document format in which documents are written once and never updated. It might complicate your querying later on, but you can use map/reduce to get whatever information you want regardless of your document structure (map/reduce is very flexible; given enough experience you can compute almost anything with it).
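To make the insert-only idea concrete, here is a small sketch of a map and a reduce function that roll individual events up into hourly totals. The event fields (`page`, `type`, `ts`) and the key format are assumptions, and `runLocally` is only a local stand-in so the example runs without a database; on the server, MongoDB supplies `emit` itself and the two functions would be passed to something like `db.events.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })`.

```javascript
// One immutable document per event; nothing is updated after insert.
var events = [
    { page: 'Hello', type: 'view',  ts: new Date(Date.UTC(2011, 3, 21, 8, 5)) },
    { page: 'Hello', type: 'view',  ts: new Date(Date.UTC(2011, 3, 21, 8, 40)) },
    { page: 'Hello', type: 'tweet', ts: new Date(Date.UTC(2011, 3, 21, 8, 41)) },
    { page: 'Hello', type: 'view',  ts: new Date(Date.UTC(2011, 3, 21, 9, 10)) }
];

var emit; // assigned by runLocally below; provided by MongoDB on the server

// map: called once per document, with the document bound to `this`
var mapFn = function () {
    var key = this.page + '|' + [
        this.ts.getUTCFullYear(),
        this.ts.getUTCMonth(), // 0-11
        this.ts.getUTCDate(),
        this.ts.getUTCHours()
    ].join('-');
    emit(key, {
        views: this.type === 'view' ? 1 : 0,
        tweets: this.type === 'tweet' ? 1 : 0
    });
};

// reduce: combines all values emitted for one key; the result has the same
// shape as the emitted values, so it can safely be reduced again
var reduceFn = function (key, values) {
    var total = { views: 0, tweets: 0 };
    values.forEach(function (v) {
        total.views += v.views;
        total.tweets += v.tweets;
    });
    return total;
};

// Local stand-in for the server: group emitted values by key, then reduce.
function runLocally(docs, map, reduce) {
    var groups = {};
    emit = function (key, value) {
        (groups[key] = groups[key] || []).push(value);
    };
    docs.forEach(function (doc) { map.call(doc); });
    var out = {};
    Object.keys(groups).forEach(function (key) {
        out[key] = reduce(key, groups[key]);
    });
    return out;
}

var hourly = runLocally(events, mapFn, reduceFn);
// hourly['Hello|2011-3-21-8'] is { views: 2, tweets: 1 }
```

Because every event is a fresh insert, documents never grow, at the cost of storing more documents and running the aggregation whenever a summary is needed.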