Looking for overall review on Hadoop
This is not a specific question, which may be why nobody has answered it until now. Performance on a 3-600 node cluster is best analyzed with benchmarks.
However, I found some really interesting articles about Hadoop and how it is used in production:
- Hadoop Architecture and its Usage at Facebook
- How Rackspace Now Uses MapReduce And Hadoop To Query Terabytes Of Data
- Some benchmarks can be found in the article Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds
- Also, a really interesting blog related to Hadoop
- Another article related to Facebook and Hadoop is Hive - A Petabyte Scale Data Warehouse using Hadoop
I hope those links will get you started and give you all the info you need.