Google Analytics database [closed]


AFAIK Google Analytics is derived from Urchin. As has been said, now that Analytics is part of the Google family it may well be using MapReduce/BigTable. I would assume that Google has integrated the old Urchin DB format with the newer BigTable/MapReduce infrastructure.
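For anyone unfamiliar with MapReduce, the general idea of turning raw hits into report counts can be sketched in plain Python. This is a minimal single-process illustration of the map / shuffle / reduce pattern, assuming a hypothetical hit format of `(site, page)` tuples. It is not how Google actually implements Analytics; the real framework distributes each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw hit: (site, page). Real hits carry many more fields.
Hit = Tuple[str, str]
Key = Tuple[str, str]


def map_phase(hits: Iterable[Hit]) -> Iterator[Tuple[Key, int]]:
    """Mapper: emit ((site, page), 1) for every hit."""
    for site, page in hits:
        yield (site, page), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key (the framework does this in real MapReduce)."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def pageview_report(hits: Iterable[Hit]) -> Dict[Key, int]:
    return reduce_phase(shuffle(map_phase(hits)))


if __name__ == "__main__":
    hits = [
        ("example.com", "/"),
        ("example.com", "/about"),
        ("example.com", "/"),
        ("example.org", "/"),
    ]
    print(pageview_report(hits))
```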

I found these links, which talk about the Urchin DB. Some of what they describe is probably still in use.

http://www.advanced-web-metrics.com/blog/2007/10/16/what-is-urchin/

This says:

[snip] ...still use a proprietary database to store reporting data, which makes ad-hoc queries a bit more limited, since you have to use Urchin-developed tools rather than the more flexible SQL tools.
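To make the contrast concrete: when hit data lives in an SQL database, an ad-hoc question is just one query. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `hits` table; the schema is an assumption for illustration, not Urchin's or Google's.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical one-row-per-hit schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (day TEXT, page TEXT)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [
            ("2008-01-01", "/"),
            ("2008-01-01", "/about"),
            ("2008-01-01", "/"),
            ("2008-01-02", "/"),
        ],
    )
    return conn


def top_pages(conn: sqlite3.Connection, day: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Ad-hoc report: most-viewed pages on a given day, ties broken by page name."""
    cur = conn.execute(
        """
        SELECT page, COUNT(*) AS views
        FROM hits
        WHERE day = ?
        GROUP BY page
        ORDER BY views DESC, page
        LIMIT ?
        """,
        (day, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(top_pages(build_demo_db(), "2008-01-01"))
```

With a proprietary report store, the same question is only answerable if the vendor's tools already offer that report.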

http://www.urchinexperts.com/software/faq/#ques45

What type of database does Urchin use?

Urchin uses a proprietary flat file database for report data storage. The high-performance database architecture handles very high traffic sites efficiently. Some of the benefits of the data base architecture include:

* Small database footprint: approximately 5-10% of raw logfile size
* Small number of database files required per profile (9 per month of historical reporting)
* Support for parallel processing of load-balanced webserver logs for increased performance
* Databases are standard files that are easy to back up and restore using native operating system utilities
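The FAQ does not document the file format, so the following is only a guess at the general idea: instead of keeping raw log lines, store pre-aggregated counts as fixed-width binary records in ordinary files. The record layout (`day`, `page_id`, `hits` as unsigned 32-bit ints) and all function names are hypothetical, not Urchin's real format. Savings come from collapsing every hit that shares a (day, page) pair into a single 12-byte record.

```python
import io
import struct
from collections import Counter
from typing import BinaryIO, Dict, Iterable, List, Tuple

# Hypothetical fixed-width record (NOT Urchin's real format):
# day as YYYYMMDD, page id, hit count; each an unsigned 32-bit little-endian int.
RECORD = struct.Struct("<III")


def aggregate(log_lines: Iterable[str], page_ids: Dict[str, int]) -> Counter:
    """Collapse raw 'YYYYMMDD /path' lines into (day, page_id) -> hit count."""
    counts: Counter = Counter()
    for line in log_lines:
        day, path = line.split()
        # Give each previously unseen path the next free integer id.
        page_id = page_ids.setdefault(path, len(page_ids))
        counts[(int(day), page_id)] += 1
    return counts


def write_records(out: BinaryIO, counts: Counter) -> None:
    """Write one fixed-width record per (day, page_id), sorted by key."""
    for (day, page_id), hits in sorted(counts.items()):
        out.write(RECORD.pack(day, page_id, hits))


def read_records(inp: BinaryIO) -> List[Tuple[int, int, int]]:
    """Read back all records as (day, page_id, hits) tuples."""
    return list(RECORD.iter_unpack(inp.read()))


if __name__ == "__main__":
    raw = ["20080101 /", "20080101 /about", "20080101 /", "20080102 /"]
    ids: Dict[str, int] = {}
    buf = io.BytesIO()
    write_records(buf, aggregate(raw, ids))
    print(len(buf.getvalue()), "bytes for", len(raw), "hits")
    buf.seek(0)
    print(read_records(buf))
```

Because the output is a plain byte file, backing it up is just copying the file, which matches the last point in the FAQ list.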

More info about Urchin

http://www.google.com/support/urchin45/bin/answer.py?answer=28737

A long time ago I used a tracker, and on its site they discussed data normalization: http://www.2enetworx.com/dev/articles/statisticus5.asp

There you can find some information on how to reduce the amount of data stored in the DB; it may be a good starting point for research.
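A common form of this normalization is dictionary encoding: long strings that repeat in every hit (URLs, user agents) are stored once in a lookup table and each hit keeps only small integer ids. This is a generic sketch with hypothetical names, not the scheme from the linked article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LookupTable:
    """Maps each distinct string to a small integer id, storing the string only once."""
    ids: Dict[str, int] = field(default_factory=dict)
    values: List[str] = field(default_factory=list)

    def id_for(self, value: str) -> int:
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def value_for(self, ident: int) -> str:
        return self.values[ident]


def normalize(hits: List[Tuple[str, str]], pages: LookupTable,
              agents: LookupTable) -> List[Tuple[int, int]]:
    """Replace each (url, user_agent) pair of strings with a (page_id, agent_id) pair."""
    return [(pages.id_for(url), agents.id_for(ua)) for url, ua in hits]


def denormalize(rows: List[Tuple[int, int]], pages: LookupTable,
                agents: LookupTable) -> List[Tuple[str, str]]:
    """Rebuild the original strings from the id pairs."""
    return [(pages.value_for(p), agents.value_for(a)) for p, a in rows]


if __name__ == "__main__":
    hits = [
        ("/index.html", "Mozilla/5.0"),
        ("/about.html", "Mozilla/5.0"),
        ("/index.html", "Opera/9.80"),
    ]
    pages, agents = LookupTable(), LookupTable()
    rows = normalize(hits, pages, agents)
    print(rows)
    print(denormalize(rows, pages, agents) == hits)
```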


BigTable

Google Publication: Chang, Fay, et al. "Bigtable: A Distributed Storage System for Structured Data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008):

Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth.
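The paper describes Bigtable's data model as a sparse, sorted map indexed by (row key, column key, timestamp), with column keys written as `family:qualifier`. A toy in-memory version shows why row-key design matters: rows are kept in lexicographic order, so a key made of the site name plus the session start time keeps one site's sessions contiguous and in chronological order, which is how the paper describes the Analytics raw click table. The `ToyTable` class and the `session_row_key` format are hypothetical illustrations, not Bigtable's API.

```python
import bisect
from typing import Dict, Iterator, List, Optional


class ToyTable:
    """Toy sorted map (row, column, timestamp) -> value, mimicking Bigtable's data model."""

    def __init__(self) -> None:
        self._rows: List[str] = []  # row keys, kept in sorted order
        self._cells: Dict[str, Dict[str, Dict[int, str]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        if row not in self._cells:
            bisect.insort(self._rows, row)
            self._cells[row] = {}
        self._cells[row].setdefault(column, {})[timestamp] = value

    def get_latest(self, row: str, column: str) -> Optional[str]:
        """Return the value with the highest timestamp, or None if the cell is empty."""
        versions = self._cells.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, end: str) -> Iterator[str]:
        """Yield row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._rows, start)
        hi = bisect.bisect_left(self._rows, end)
        yield from self._rows[lo:hi]


def session_row_key(site: str, session_start: int) -> str:
    """Hypothetical row key: site name, '#', zero-padded session start (Unix seconds)."""
    return f"{site}#{session_start:012d}"


if __name__ == "__main__":
    t = ToyTable()
    t.put(session_row_key("example.com", 1200000300), "stats:pages", 1, "3")
    t.put(session_row_key("example.org", 1200000200), "stats:pages", 1, "1")
    t.put(session_row_key("example.com", 1200000100), "stats:pages", 1, "5")
    # '#' sorts just before '$', so this range covers exactly example.com's sessions.
    print(list(t.scan("example.com#", "example.com$")))
```

Zero-padding the timestamp matters: without it, string order and numeric order would disagree once timestamps have different numbers of digits.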