What is the best way to run Lucene/Solr on Hadoop?


Take a look at ElasticSearch. You can bulk-load an index into ElasticSearch from a Hadoop job. Infochimps has open-sourced an ElasticSearch bulk indexer called Wonderdog that can serve as a proof of concept.

https://github.com/infochimps/wonderdog
http://www.elasticsearch.com

ElasticSearch is cloud-friendly (see the cloud-aws plugin for discovery) and can scale up or down by adding or removing nodes that hold the index.
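
To give a rough idea of what the bulk-loading step involves, here is a minimal sketch of building request bodies in the newline-delimited format the ElasticSearch bulk API accepts: one action line followed by one document line per record. This is not how Wonderdog itself is implemented. The function names, the `id` field, and the index name are assumptions made for illustration, and depending on the ElasticSearch version a `_type` may also be required in the action line.

```python
import json


def build_bulk_body(records, index_name):
    """Serialize records into a newline-delimited bulk request body.

    Assumes (for illustration) that every record is a dict with an "id" key.
    """
    lines = []
    for record in records:
        # Action line: tells ElasticSearch to index the document that follows.
        action = {"index": {"_index": index_name, "_id": record["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


def batches(iterable, size):
    """Group records into fixed-size batches so each bulk request stays small."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

In a Hadoop job, each task would typically build bodies like this for its own slice of the data and POST them to the cluster's `_bulk` endpoint, so the indexing load is spread across tasks.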


Is your index sharded? You could shard the index and distribute shards across several instances.
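
As a rough illustration of the sharding idea, documents can be assigned to shards by hashing their IDs, so that each instance only has to build and serve its own part of the index. This is a minimal sketch under stated assumptions: the function names and the CRC32 hash are chosen for illustration and are not what Solr or ElasticSearch use internally for routing.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number with a stable hash."""
    # zlib.crc32 gives the same value across runs and machines,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards):
    """Split document IDs into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

In a Hadoop job the partitioner plays a similar role: if documents are partitioned by shard number, each reducer can build one shard, which can then be copied to the instance that serves it.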