
Storing millions of log files - Approx 25 TB a year


Since you don't need querying features, you can use Apache Hadoop.

I believe HDFS and HBase would be a nice fit for this.

You can find a lot of large-scale storage stories on the Hadoop "Powered By" page.


Take a look at Vertica, a columnar database supporting parallel processing and fast queries. Comcast used it to analyze about 15 GB/day of SNMP data, running at an average rate of 46,000 samples per second, using five quad-core HP ProLiant servers. I heard some Comcast operations folks rave about Vertica a few weeks ago; they still really like it. It has some nice data compression techniques and "k-safety redundancy", so they could dispense with a SAN.

Update: One of the main advantages of a scalable analytics database approach is that you can do some pretty sophisticated, quasi-real-time querying of the logs. This might be really valuable for your ops team.
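
To put the Comcast figure next to the question's volume, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and an evenly spread load, none of which the question states; the function name is just illustrative.

```python
# Rough sizing sketch: compare the question's 25 TB/year with the
# 15 GB/day Comcast workload mentioned above.
# Assumptions: decimal units, 365-day year, evenly spread ingest.

TB = 10**12
GB = 10**9


def daily_gb(yearly_tb: float, days_per_year: int = 365) -> float:
    """Convert a yearly volume in TB to an average daily volume in GB."""
    return yearly_tb * TB / days_per_year / GB


if __name__ == "__main__":
    required = daily_gb(25)      # volume from the question title
    comcast = 15.0               # GB/day, figure quoted in this answer
    ratio = required / comcast
    print(f"{required:.1f} GB/day needed, about {ratio:.1f}x the Comcast load")
```

So the target is roughly 68 GB/day, somewhere around four to five times the quoted Comcast workload. That says nothing about how many nodes you would need; it only gives a sense of scale.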


Have you looked at Gluster? It is scalable and provides replication and many other features. It also gives you standard file operations, so there is no need to implement another API layer.

http://www.gluster.org/
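
To illustrate the "standard file operations" point: once a volume is mounted, archiving a log is just an ordinary file copy. The sketch below is a minimal example, not anything Gluster-specific. The mount point, the `archive_log` helper, and the year/month/day directory layout are all my own assumptions; the demo uses a temporary directory as a stand-in for the mounted volume.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def archive_log(src: Path, volume_root: Path) -> Path:
    """Copy one log file into a date-partitioned directory under volume_root.

    volume_root is assumed to be the mount point of the shared volume
    (hypothetical, e.g. /mnt/logs); only plain filesystem calls are used.
    """
    # Partition by the file's modification date (UTC) so that no single
    # directory ends up holding millions of files.
    mtime = datetime.fromtimestamp(src.stat().st_mtime, tz=timezone.utc)
    dest_dir = volume_root / f"{mtime:%Y}" / f"{mtime:%m}" / f"{mtime:%d}"
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copies contents and timestamps
    return dest


if __name__ == "__main__":
    # Demo: a temporary directory stands in for the mounted volume.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        log = root / "app.log"
        log.write_text("example log line\n")
        os.utime(log, (1700000000, 1700000000))  # fixed mtime for the demo
        print(archive_log(log, root / "volume"))
```

Splitting by date is only one possible layout; with millions of files you may want an extra level (host or service name) to keep directories small.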