How to use Elasticsearch on top of a pre-existing SQL database?
I am using jdbc-river with MySQL, and it is very fast. You can configure it to poll for data continually, or to run a one-time import (the one-shot strategy).
For example:
curl -XPUT 'http://es-server:9200/_river/my_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "poll" : "5s",
    "scale" : 0,
    "autocommit" : false,
    "fetchsize" : 10,
    "max_rows" : 0,
    "max_retries" : 3,
    "max_retries_wait" : "10s",
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mysql-server:3306/mydb",
    "user" : "root",
    "password" : "password*",
    "sql" : "select c.id, c.brandCode, c.companyCode from category c"
  },
  "index" : {
    "index" : "mainindex",
    "type" : "category",
    "bulk_size" : 30,
    "max_bulk_requests" : 100,
    "index_settings" : null,
    "type_mapping" : null,
    "versioning" : false,
    "acknowledge" : false
  }
}'
If you need a more performant and scalable alternative to the polling offered by jdbc-river, I recommend watching this presentation, which explains how to perform incremental syncing from SQL Server into Elasticsearch:
The principles discussed in the video also apply to other RDBMS-to-NoSQL replication scenarios.
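To give a rough idea of what incremental syncing can look like, here is a minimal sketch of one common approach: keep a "last seen" watermark and only pull rows that changed after it. This is not necessarily the method shown in the presentation. The `updated_at` column, the `category_index` name, and all function names are assumptions made for illustration, and SQLite stands in for the real database so the example is self-contained. The output is a newline-delimited JSON body in the shape the Elasticsearch `_bulk` endpoint accepts; actually sending it is left out.

```python
import json
import sqlite3


def fetch_changed_rows(conn, last_seen):
    """Return rows modified after the watermark, oldest first."""
    cur = conn.execute(
        "SELECT id, brandCode, companyCode, updated_at "
        "FROM category WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    return cur.fetchall()


def to_bulk_body(rows, index="category_index"):
    """Build an NDJSON bulk body: one action line plus one document line per row.

    The index name is a hypothetical placeholder.
    """
    lines = []
    for row_id, brand, company, _updated in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row_id)}}))
        lines.append(json.dumps({"brandCode": brand, "companyCode": company}))
    return "\n".join(lines) + "\n" if lines else ""


def sync_once(conn, last_seen):
    """Run one incremental pass; return the bulk body and the new watermark."""
    rows = fetch_changed_rows(conn, last_seen)
    new_mark = rows[-1][3] if rows else last_seen
    return to_bulk_body(rows), new_mark


def demo():
    """Initial load of two rows, then a second pass that picks up one update."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category ("
        "id INTEGER PRIMARY KEY, brandCode TEXT, companyCode TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?, ?)",
        [(1, "B1", "C1", 100), (2, "B2", "C2", 200)],
    )
    body, mark = sync_once(conn, 0)
    conn.execute("UPDATE category SET brandCode = 'B9', updated_at = 300 WHERE id = 1")
    body2, mark2 = sync_once(conn, mark)
    return body, mark, body2, mark2


if __name__ == "__main__":
    first_body, _, second_body, _ = demo()
    print(first_body)
    print(second_body)
```

The second pass only re-indexes the row that changed, which is the whole point compared with re-running the full select on every poll. Note that a plain timestamp watermark does not capture deleted rows, and it can miss rows that share the same timestamp as the watermark, so a real setup would need something more robust for those cases.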