Running solr index on hadoop


You haven't explained where the 10 hours go. Is that the time to extract the data, or just to index it?

If the extraction is what takes so long, then Hadoop can help. Solr also supports bulk updates: in your map function you can accumulate thousands of records and send them to Solr for indexing in one shot, with a single commit for the whole batch. That will improve your performance a lot.
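For illustration, a minimal sketch of such a bulk add with SolrJ (the core URL and field names are placeholders; this assumes the SolrJ 3.x/4.x HttpSolrServer class, which newer releases replace with HttpSolrClient):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexExample {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; point it at your own Solr core
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "record body " + i);
                batch.add(doc);
            }

            // One round trip for the whole batch, one commit at the end
            solr.add(batch);
            solr.commit();
            solr.shutdown();
        }
    }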

Also, what size is your data?

You could collect a large number of records in the reduce function of a map/reduce job. Generate your map output keys so that many records end up in a single reduce call. In your custom reducer class, initialize the Solr client object in the setup/configure method (depending on your Hadoop version) and close it in the cleanup method. Build a document collection object (in SolrNet or SolrJ) and commit all of the documents in one single shot.
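A rough sketch of such a reducer, assuming the newer org.apache.hadoop.mapreduce API and SolrJ; the Solr URL, field names, buffer size, and Text key/value types are all placeholders:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrIndexReducer extends Reducer<Text, Text, Text, Text> {

        private HttpSolrServer solr;
        private List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

        @Override
        protected void setup(Context context) {
            // One Solr connection per reducer (placeholder URL)
            solr = new HttpSolrServer("http://solr-host:8983/solr");
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int i = 0;
            for (Text value : values) {
                SolrInputDocument doc = new SolrInputDocument();
                // Placeholder unique key built from the reduce key plus a counter
                doc.addField("id", key.toString() + "-" + i++);
                doc.addField("text", value.toString());
                buffer.add(doc);
            }
            // Flush in large chunks rather than per record
            if (buffer.size() >= 10000) {
                flush();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            flush();
            try {
                solr.commit();   // single commit at the very end
            } catch (SolrServerException e) {
                throw new IOException(e);
            }
            solr.shutdown();
        }

        private void flush() throws IOException {
            if (buffer.isEmpty()) {
                return;
            }
            try {
                solr.add(buffer);   // one bulk add for the whole buffer
            } catch (SolrServerException e) {
                throw new IOException(e);
            }
            buffer.clear();
        }
    }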

If you are using Hadoop, there is another option called Katta. You could look into it as well.


You can write a map/reduce job over your Hadoop cluster that simply takes each record and sends it to Solr over HTTP for indexing. AFAIK Solr currently doesn't support indexing across a cluster of machines, so it would be worth looking into Elasticsearch if you also want to distribute your index over multiple nodes.
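A bare-bones sketch of that per-record approach, posting Solr's XML update messages to a placeholder /update URL from the map function (real code would batch the posts and XML-escape the field values):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class HttpIndexMapper extends Mapper<LongWritable, Text, Text, Text> {

        // Placeholder URL; commit=true on every document is slow and shown only for simplicity
        private static final String UPDATE_URL =
                "http://solr-host:8983/solr/update?commit=true";

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Placeholder field names; the id here is just the input file offset
            String xml = "<add><doc>"
                    + "<field name=\"id\">" + offset.get() + "</field>"
                    + "<field name=\"text\">" + line.toString() + "</field>"
                    + "</doc></add>";

            HttpURLConnection conn = (HttpURLConnection) new URL(UPDATE_URL).openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.close();

            if (conn.getResponseCode() != 200) {
                throw new IOException("Solr returned HTTP " + conn.getResponseCode());
            }
            conn.disconnect();
        }
    }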


There is a Solr Hadoop output format which creates a new index in each reducer, so you distribute your keys according to the indices you want and then copy the HDFS files into your Solr instance after the fact.

http://www.datasalt.com/2011/10/front-end-view-generation-with-hadoop/
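The key routing in that setup is just a Hadoop Partitioner. A small sketch, where the "shardId\tdocId" key convention is invented purely for illustration:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sends each record to the reducer (and therefore the index) that should hold it,
    // assuming map output keys of the hypothetical form "<shardId>\t<docId>".
    public class ShardPartitioner extends Partitioner<Text, Text> {

        @Override
        public int getPartition(Text key, Text value, int numReduceTasks) {
            String shardId = key.toString().split("\t", 2)[0];
            return (shardId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

You would register it with job.setPartitionerClass(ShardPartitioner.class) and set the number of reduce tasks to the number of indices you want to build.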